By Jason Bloomberg
In the days since the news of the National Security Agency’s secret PRISM spying (oops, surveillance) initiative broke, there has been no end of consternation in the media and the Twitterverse. Regardless of what you think of the NSA’s efforts to collect information on Americans, one fact is clear: Big data is real, it’s here to stay and it’s dangerous.
As I explain in The Agile Architecture Revolution, the more powerful a technology is, the more importance we must place on governance. And that applies to big data.
If most people agree that finding terrorists and stopping them before they can wreak havoc is a good thing, why are they livid now? The answer is that we’re not angry at the NSA for gathering intelligence on terrorists; we’re angry that it is gathering intelligence on everybody else. And that brings us to big data and some lessons that can help us govern our data more effectively.
Lesson 1: Govern even the data you don’t want.
This is PRISM’s first big data lesson: It’s not just the data you want that’s important; you also have to worry about the data you don’t want. Traditional data governance generally focuses on the data you want: We make sure our data is clean, correct and properly secured. When we have a limited quantity of data and it all has value, issues such as data quality are relatively straightforward (although achieving data quality in practice may still be a major headache).
In the big data scenario, however, we’re miners looking for that nugget of gold hidden in vast quantities of dross. Yes, we must govern that nugget of value, but that’s the easy task, relatively speaking. The first lesson from PRISM is that we must also govern the dross: the data we don’t want, because it opens up a range of governance challenges, such as the privacy issues at the core of the PRISM scandal.
Your big data governance challenge may not be related to privacy, but the fact remains that the more leftover data you have, the harder it is to govern it. After all, just because you don’t find value in that data doesn’t mean that your competition or a hacker won’t.
Lesson 2: Think of metadata as big data, too.
The second lesson: Metadata may also be big data. Data professionals are used to thinking of metadata as having technical value but little worth outside the IT organization. In the case of PRISM, however, the NSA went after call-detail records (metadata), not the calls themselves. This focus on call metadata highlights the fact that the metadata itself may be the most valuable big data you own. Ask yourself: How robust is your metadata governance? If it’s not as rock solid as your everyday data governance, then perhaps you’re not ready for big data.
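To make the point about metadata concrete, here is a minimal Python sketch of how much a handful of call-detail records can reveal without any call content. The record layout, the names and the two helper functions are all invented for illustration; this is not a description of how the NSA or any real system analyzes such records.

```python
from collections import Counter

# Hypothetical call-detail records: (caller, callee, duration in seconds).
# No call content is included; every value here is invented for illustration.
RECORDS = [
    ("alice", "bob", 120),
    ("alice", "carol", 45),
    ("bob", "alice", 300),
    ("dave", "alice", 15),
    ("alice", "bob", 60),
]


def contact_counts(records):
    """Count calls between each unordered pair of parties."""
    counts = Counter()
    for caller, callee, _duration in records:
        pair = tuple(sorted((caller, callee)))
        counts[pair] += 1
    return counts


def most_connected(records):
    """Return the party who has been in contact with the most distinct people."""
    neighbors = {}
    for caller, callee, _duration in records:
        neighbors.setdefault(caller, set()).add(callee)
        neighbors.setdefault(callee, set()).add(caller)
    return max(neighbors, key=lambda name: len(neighbors[name]))


if __name__ == "__main__":
    print(contact_counts(RECORDS))
    print(most_connected(RECORDS))
```

Even in this toy data set, the metadata alone shows who talks to whom, how often, and who sits at the center of the network. That is exactly the kind of information that needs the same governance as the underlying data.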
Lesson 3: You need sophisticated analytics apps to cut through the clutter.
PRISM lesson number 3: Big data analytics apps can be data governance tools, particularly when the central challenge is data quality. Terrorists, after all, aren’t going to send tweets like “buying #plasticexplosives now, meet me at the #Boston #Marathon.” We can safely assume that terrorists are actively seeking to obscure their communications, which, from the enterprise perspective, is an example of (in this case intentionally) poor data quality.
The NSA naturally has sophisticated algorithms for cutting through such obfuscation. As your big data sets grow, you’ll need similarly sophisticated tools for cleaning up run-of-the-mill data quality issues. Remember, the bigger the data sets, the more diverse and messy your data quality challenges will become. After all, fixing mailing address formats in your ERP system is dramatically simpler than whipping a vast hodgepodge of structured, semistructured and unstructured information into some kind of order.