By Gary Angel
The big data revolution is here. Everything is being tracked, analyzed, optimized and personalized at incredible levels of detail to enhance ROI, efficiency and competitive advantage. Except it isn’t.
At most organizations, a mountain of data is collected, but little is analyzed and even less is used productively. It turns out that dealing with big data (like most things) is hard. It takes practical experience and learning. It takes investment. It often takes repeated failure. And, despite all that effort, it sometimes turns out to have little impact.
But big data and analytics are real. They do matter, and they can be done in ways that can help organizations solve their biggest challenges. If you want to do them well, you have to look at how and why organizations have succeeded with big data programs and—perhaps more importantly—how and why they’ve failed.
It turns out that a single factor, more often than not, determines whether an organization succeeds with analytics or fails. And here’s the good news: It’s not some impossibly squishy thing such as a “culture of analytics” or “executive buy-in.”
Before you can analyze anything effectively, you need data and you need technology. People who think about and practice analytics often describe the process this way:
· Marshal the best technologies.
· Manage relevant data sources.
· Perform analytics.
· Gain insights.
· Drive decisions.
If that’s the process, there’s a strong and perfectly reasonable tendency to focus on obtaining high-quality data and putting it on the right technology as your first steps. But if you do that, there’s also a good chance you’ll fail. Spectacularly. Repeatedly.
And therein lies a simple but challenging paradox: Even though effective analytics depends on data and technology, you’ll never marshal them successfully for analysis unless you’ve first decided which business problems to tackle.
Business First, Then Data
Given that your data and technology must be in place before you can tackle analytics, why is it so important to focus on the business problem before you marshal your data and technology?
You may think that making a technology decision on analytics is easy and doesn’t require much consultation with the business, or any predetermination of the analytics problems to be solved. Just get a commercial instance of Hadoop and you’ll be all right.
That’s true: You can solve almost any problem in analytics with a robust Hadoop instance. You just can’t solve it as easily as you can with some other types of systems.
If you have a non-real-time, true big data problem (one that requires analysis of sequence, time or patterns of detail-level data), then Hadoop is a required technology. But for many traditional analytics problems, you’d be much better off parking your data on a warehouse appliance, a columnar database or an in-memory system.
So, unless you know the general class of problems you’re trying to solve, making an intelligent technology choice isn’t possible.
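The selection rule described above can be written down as a small decision function. What follows is only a minimal sketch of that rule, not a definitive selection method: the names ProblemProfile and suggest_platforms and the three yes/no questions are hypothetical, introduced here for illustration. The article does not say what to choose for real-time problems, so the sketch deliberately returns no suggestion in that case.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class ProblemProfile:
    """Hypothetical summary of the class of analytics problem (illustrative only)."""
    needs_real_time: bool
    is_big_data: bool
    # True when the analysis depends on sequence, time or patterns
    # in detail-level data.
    needs_detail_level_patterns: bool


def suggest_platforms(profile: ProblemProfile) -> List[str]:
    """Map a problem profile to the platform types named in the article."""
    # Non-real-time, true big data problem: the article calls Hadoop required.
    if (not profile.needs_real_time
            and profile.is_big_data
            and profile.needs_detail_level_patterns):
        return ["Hadoop"]
    # Assumption: the article gives no guidance for real-time problems,
    # so return no suggestion rather than guess.
    if profile.needs_real_time:
        return []
    # Traditional analytics problems: the alternatives the article lists.
    return ["warehouse appliance", "columnar database", "in-memory system"]


if __name__ == "__main__":
    clickstream = ProblemProfile(needs_real_time=False, is_big_data=True,
                                 needs_detail_level_patterns=True)
    print(suggest_platforms(clickstream))
```

The point of the sketch is that the function cannot be called at all until someone has filled in the profile, which is exactly the business-problem decision that has to come first.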
The Data Mountain
Things get even worse, much worse in fact, when you begin to consider data. Today’s large enterprises generate an enormous volume of data, measured in terabytes and petabytes, and that data is also complex in its breadth (the number of tables and columns).
If you haven’t narrowed down your problem set by deciding on the business problem you’re trying to solve, you simply have no way to decide what data belongs in your repository. If you answer “all data,” not only are you committed to a massive building project, but you’ve also likely committed to a multiyear master data management project to understand everything in the repository and document it sufficiently for use. Good luck with that!
The need for a problem-first approach to analytics doesn’t end with choosing a technology, and selecting and describing subsets of the data. It flows right into the analysis itself.
As with many relatively arcane and mathematical disciplines, outsiders tend to draw a big box around analytics and describe it as “magic happens here.” Therefore, many people are reluctant to tell analysts what problems to solve, assuming that the data should guide the decision. This attitude represents a deep confusion.
When studying a specific problem, we do indeed want the analyst to let the data guide the decision. But data must be studied and analyzed in the context of a specific problem. No explorer in history ever departed randomly without a direction in mind. Why? Because it’s preposterous to believe that moving around randomly will result in a discovery. This is as true for data as it is for geography.