Big Data Governance: 5 Lessons Learned From PRISM
By Jason Bloomberg
In the days since the news broke of the National Security Agency’s secret PRISM spying (oops, surveillance) initiative, there has been no end of consternation in the media and the Twitterverse. Regardless of what you think of the NSA’s efforts to collect information on Americans, one fact is clear: Big data is real, it’s here to stay and it’s dangerous.
As I explain in The Agile Architecture Revolution, the more powerful a technology is, the more importance we must place on governance. And that applies to big data.
If most people agree that finding terrorists and stopping them before they can wreak havoc is a good thing, why are they livid now? The answer is that we’re not angry at the NSA for gathering intelligence on terrorists; we’re angry that it is gathering intelligence on everybody else. And that brings us to big data and some lessons that can help us govern our data more effectively.
Lesson 1: Govern even the data you don’t want.
This is PRISM’s first big data lesson: It’s not just the data you want that’s important; you also have to worry about the data you don’t want. Traditional data governance generally focuses on the data you want: We make sure our data is clean, correct and properly secured. When we have a limited quantity of data and it all has value, issues such as data quality are relatively straightforward (although achieving data quality in practice may still be a major headache).
In the big data scenario, however, we’re miners looking for that nugget of gold hidden in vast quantities of dross. Yes, we must govern that nugget of value, but that’s the easy task, relatively speaking. The first lesson from PRISM is that we must also govern the dross: the data we don’t want, because it opens up a range of governance challenges, such as the privacy issues at the core of the PRISM scandal.
Your big data governance challenge may not be related to privacy, but the fact remains that the more leftover data you have, the harder it is to govern it. After all, just because you don’t find value in that data doesn’t mean that your competition or a hacker won’t.
Lesson 2: Think of metadata as big data, too.
The second lesson: Metadata also may be big data. Data professionals are used to thinking of metadata as having technical value but little worth outside the IT organization. In the case of PRISM, however, the NSA went after call-detail records (metadata), not the calls themselves. This focus on call metadata highlights the fact that the metadata itself may be the most valuable big data you own. Ask yourself: How robust is your metadata governance? If it’s not as rock solid as your everyday data governance, then perhaps you’re not ready for big data.
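To make the point concrete, here is a minimal Python sketch of how much a handful of call-detail records can reveal without any call audio. The record layout (caller, callee, duration) and the names in it are invented for illustration; they are assumptions, not a description of any real system.

```python
from collections import Counter, defaultdict

# Hypothetical call-detail records: (caller, callee, duration_seconds).
# No call audio is included; this is metadata only.
CALL_RECORDS = [
    ("alice", "bob", 120),
    ("alice", "bob", 45),
    ("alice", "carol", 300),
    ("dave", "bob", 15),
    ("alice", "bob", 60),
]


def contact_profile(records):
    """Return, per caller, a Counter of how often each callee was dialed."""
    profile = defaultdict(Counter)
    for caller, callee, _duration in records:
        profile[caller][callee] += 1
    return profile


def top_contact(records, caller):
    """Return the (callee, call_count) pair dialed most often by caller, or None."""
    counts = contact_profile(records).get(caller)
    if not counts:
        return None
    return counts.most_common(1)[0]
```

Calling `top_contact(CALL_RECORDS, "alice")` shows whom one person contacts most often, which is exactly the kind of derived fact that metadata governance has to account for.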
Lesson 3: You need sophisticated analytics apps to cut through the clutter.
PRISM lesson number 3: Big data analytics apps can be data governance tools, particularly when the central challenge is data quality. Terrorists, after all, aren’t going to send tweets like “buying #plasticexplosives now, meet me at the #Boston #Marathon.” We can safely assume that terrorists are actively seeking to obscure their communications, which, from the enterprise perspective, is an example of (in this case intentionally) poor data quality.
The NSA naturally has sophisticated algorithms for cutting through such obfuscation. As your big data sets grow, you’ll need similarly sophisticated tools for cleaning up run-of-the-mill data quality issues. Remember, the bigger the data sets, the more diverse and messy your data quality challenges will become. After all, fixing mailing address formats in your ERP system is dramatically simpler than whipping a vast hodgepodge of structured, semistructured and unstructured information into some kind of order.
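For a sense of scale, the simple end of that spectrum can be sketched in a few lines of Python. The function below is a rough illustration of mailing address cleanup, assuming a tiny, hypothetical abbreviation table; a real ERP cleanup would need far more rules, and messy semistructured or unstructured data would need far more than this.

```python
import re

# Hypothetical abbreviation table; a real system would use a much larger one.
ABBREVIATIONS = {
    "street": "St",
    "st": "St",
    "avenue": "Ave",
    "ave": "Ave",
    "road": "Rd",
    "rd": "Rd",
}


def normalize_address(raw):
    """Collapse whitespace, drop periods and commas, and standardize street suffixes."""
    words = re.sub(r"[.,]", " ", raw).split()
    cleaned = []
    for word in words:
        key = word.lower()
        if key in ABBREVIATIONS:
            cleaned.append(ABBREVIATIONS[key])
        elif word.isalpha():
            # Title-case plain words such as street names.
            cleaned.append(word.capitalize())
        else:
            # Leave house numbers and mixed tokens untouched.
            cleaned.append(word)
    return " ".join(cleaned)
```

With this sketch, differently typed versions of the same address, such as "123 main street" and "123 Main St.", normalize to the same string, which is what makes duplicate detection possible in the structured case.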
Lesson 4: Figure out what to do with all your data.
The fourth lesson involves what we at ZapThink like to call the big data corollary to Parkinson’s Law, which states that the amount of work you have will expand to fill the available time. The big data corollary says that the amount of data you collect will expand to consume your ability to store and process it. In other words, if it’s possible to collect big data, somebody will. The question is not whether to collect big data—it's what to do with it.
Lesson 5: Find value in historical data, not just current data.
Finally, the fifth lesson is actually a lesson from something the NSA is not doing, because for the agency, current data is more valuable than historical data. The NSA’s paramount concern is to mine current intelligence: what terrorists are doing right now.
However, your company might find value in using historical data along with current data to solve problems. If some of your business issues do deal with historical trends, then your data sets have ballooned again, as have your data governance challenges.
The NSA was collecting only phone call metadata because that metadata met its needs. But what about the data itself, that is, the audio portion of the calls? Perhaps the NSA is currently unable to collect such vast quantities of data. But if that’s the case, it’s only a matter of time. The question is, once the NSA is able to collect all call audio, will it? I believe it will. After all, that’s the corollary to Parkinson’s Law in action.
In fact, we might as well assume that somewhere in the federal government, agents are collecting all the data—all the phone calls, emails, tweets, text messages, blog posts, forum comments, log files, everything. Because even if they aren’t able to amass the whole shebang yet, it’s just a matter of time till they can.
While this scenario may seem like a page out of Orwell’s 1984, the most important lesson here is that data governance is now critically important. It’s no longer a question of whether we can collect big data. The question is: What should we do with big data once we have it?
Jason Bloomberg is president of ZapThink, a Dovel Technologies Co. His fourth book, The Agile Architecture Revolution: How Cloud Computing, REST-based SOA, and Mobile Computing Are Changing Enterprise IT (John Wiley & Sons), was published in March 2013.