How PRISM Validates Big Data
Posted 2013-06-27
Lost behind the concerns about the U.S. government’s intelligence-gathering program is a validation of the concepts and technologies that underpin big data.
By Dan Rosanova
No matter how you feel about its implications, the government intelligence-gathering program now known as PRISM represents a tremendous revelation on many levels for both our friends and our adversaries. Lost behind the headlines and the privacy and geopolitical concerns about the U.S. government’s massive intelligence-gathering program is a validation of the core concepts and technologies that underpin the big data movement.
Widespread acceptance and use of advanced technologies in the government intelligence arena is nothing new, and the actions of many of these same agencies also foster advanced technology developments in the U.S. private sector. (See the Financial Times article “Silicon Valley rooted in backing from U.S. military.”) Some of the most venerable names in the technology world have had long, deep connections with the military-industrial complex of the United States.
In the 1970s, a CIA program for gathering and processing intelligence grew beyond the confines of the agency and changed the world of technology. This program helped bring the relational database to market, along with the product and company that’s synonymous with databases: Oracle. This concept was radically different from existing approaches and represented a tremendous step forward in data management, processing and representation. And its very name implied something that could divine the future.
The relational database fundamentally changed the IT landscape of the last 30 years. By default, the database has become the hub where all of an organization’s information resides—both in practice and in perception. The concept of the relational database has permeated our society and led to some of the data-growth challenges most organizations face.
A relational database is very good at storing associated items that are highly structured, but due to its ubiquity, it has become the place where everything is stored—regardless of its structural or relational properties. These databases have become the core medium of shared storage, even for seemingly ill-fitting uses such as historical price feeds. However, limitations to this relational database-heavy approach are rapidly becoming apparent to those relying on it.
For one, cost is a major issue. Software, hardware, network, personnel: they’re all there and all expensive. Another problem: This model doesn’t scale infinitely. This is a pain that’s acutely felt by relatively few organizations thus far, but it will soon be felt by many more. This big data problem is growing fast: IBM estimates that 90 percent of the world’s information has been created in the last two years.
It is telling that some of the same agencies that spearheaded the creation of previous waves of technology, most notably the relational database, are at the heart of PRISM. If we thought before the PRISM story made the headlines that big data was solely the domain of social networking companies and Internet giants, we certainly have a different understanding now.
However large the big data challenges faced by technology bellwethers are, PRISM is addressing ones that are far greater. We’re seeing a growing number of real-world uses for big data applications. In the case of PRISM, these uses are both concrete and serious.
Put aside all the noise about PRISM, and it becomes clear that it is a validation of the applications at the platform’s core: namely, cloud and big data. This program uses technologies that the rest of us would recognize as cloud (massively distributed, hardware abstracted, commodity component-based) and big data (Hadoop, machine learning and pattern recognition) on a massive scale.
That’s not to say that PRISM is using Apache Hadoop, but rather something similar: a massively distributed file system that can hold large volumes of unstructured data and process it in a fast and parallel way. This platform must be self-healing, horizontally scalable and built with off-the-shelf components. Like Hadoop, it most likely works by sending the program to the data rather than the more traditional approach of ingesting the data into the program.
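To make the "send the program to the data" idea concrete, here is a minimal Python sketch. It says nothing about how PRISM or Hadoop is actually built: the node names, the sample records and the helper functions are all hypothetical, and threads in a single process stand in for machines in a cluster. The point is only the shape of the approach: a small function travels to each shard, and only small partial results travel back.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

# Hypothetical "nodes": each one holds its own shard of unstructured text.
# In a real cluster these would be separate machines with local disks.
NODES = {
    "node-a": ["error disk full", "login ok"],
    "node-b": ["error timeout", "error disk full"],
    "node-c": ["login ok"],
}


def count_words(records):
    """The 'program' that gets shipped out; it only sees one node's records."""
    counts = Counter()
    for record in records:
        counts.update(record.split())
    return counts


def run_on_node(node_name, program):
    """Stand-in for remote execution: apply the program where the shard lives."""
    return program(NODES[node_name])


def run_everywhere(program):
    """Run the program on every node in parallel, then merge the partial results."""
    with ThreadPoolExecutor() as pool:
        partials = list(pool.map(lambda name: run_on_node(name, program), NODES))
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total


if __name__ == "__main__":
    print(run_everywhere(count_words))
```

The raw records never leave their node in this sketch; only the per-node counts are merged centrally, which is why the pattern scales out by adding nodes rather than by buying a bigger central server.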
This tells us that big data is beyond the research or early adopter stage. Clearly, there is value in both the information and platform being used, or the agencies involved would have changed course over time. Legal and privacy issues aside, this is a significant confirmation of the big data concepts and technologies that have been dominating the press recently.
Perhaps the biggest change from previous intelligence or defense-driven technology cycles is the speed at which these same technologies now find their way into civilian hands. Today, a variety of Hadoop-enabled platforms are available, and the cloud is everywhere. Obviously, the private sector’s goals and objectives are quite different from what PRISM is being used for, but we can be fairly confident that the technology works and is cost-effective.
Many had considered the recently exposed program to be technically unfeasible—until now. “Big” doesn’t even begin to describe the scope of the data challenge PRISM is addressing; it’s reassuring to realize that anything we throw at this technology can be handled.
Dan Rosanova is a senior architect in West Monroe Partners’ technology integration practice.