Data-Sharing: Not Just FBI’s Problem

Outrage and disbelief were the first reactions last month to news that FBI agents, in the weeks before Sept. 11, had been unable to search bureau computers for a word string as simple as “flight schools.” That revelation by Coleen Rowley, the whistle-blowing special agent who testified before Congress, shocked Google users everywhere and (as one newspaper account put it) left politicians “agog.”

Among those who manage data at large enterprises, however, the reaction was more muted. Which is exactly what you’d expect.

After all, it’s not exactly a new predicament: what to do when you have dozens of different databases, in different formats and built with different tools, on incompatible hardware in geographically separate locations. This is the way of things not just at the FBI but at any big organization that has ever been involved in a merger, switched technology vendors, expanded a data center, or made its services available over the Web. “Dealing with all the legacy systems is a very big challenge,” says John Piscitello, a product manager at Google, which recently introduced just the sort of corporate-search tool that might help the FBI search its archives—that is, if the FBI were inclined to convert those archives into HTML documents.

Since the 1990s, some companies have addressed the problem of stovepiped data in a different way—by building data warehouses, which are centrally stored repositories in which information from numerous sources gets pounded into a consistent format and can thus be queried. (There are also virtual data warehouses—a less-involved and often less-successful approach—in which technologists leave the data where it is and run distributed queries across the different databases.) Data warehouses have increased in appeal now that the mining of customer data is seen as yielding competitive advantage, and they are an outright requirement for many customer-relationship management applications.
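For readers who want a feel for what “pounding data into a consistent format” actually involves, here is a minimal sketch in Python. The two source formats, the case numbers, and the record text are all hypothetical, invented purely to illustrate the extract-transform-load step that a real warehouse project performs at vastly larger scale:

```python
import sqlite3

# Hypothetical records from two incompatible legacy systems:
# one stores dicts with its own field names, the other plain tuples.
field_office_rows = [
    {"case": "PX-4431", "summary": "Subject enrolled at flight school in Arizona"},
]
hq_rows = [
    ("2001-07-10", "HQ-0092", "Memo regarding flight training interest"),
]

# The warehouse: one consistent schema, queryable with ordinary SQL.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE records (source TEXT, case_id TEXT, text TEXT)")

# Transform each source's native format into the shared schema, then load.
for row in field_office_rows:
    db.execute("INSERT INTO records VALUES (?, ?, ?)",
               ("field_office", row["case"], row["summary"]))
for date, case_id, text in hq_rows:
    db.execute("INSERT INTO records VALUES (?, ?, ?)",
               ("hq", case_id, text))

# The kind of cross-system search that stovepiped databases cannot answer.
hits = db.execute(
    "SELECT source, case_id FROM records WHERE text LIKE ?",
    ("%flight%",)).fetchall()
print(hits)  # both records match, regardless of their original format
```

A virtual warehouse would instead leave each source in place and fan the query out to both systems at search time, which avoids the costly loading step but makes every query dependent on every legacy system being reachable and fast.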

The good news, both for the FBI and for the countless big companies facing similar technical challenges, is that data warehousing isn’t rocket science. The bad news is there are no shortcuts. “The problem is you have to populate the new system with information that’s been around for years,” says Rino Bergonzi, a former chief information officer at AT&T and UPS who now works as a consultant. “The solution isn’t complex, but physically doing it is very costly.”

Nobody I spoke with would hazard a guess as to what, say, the FBI would need to spend to compress its millions of paper and electronic records into a state-of-the-art data warehouse. Wayne Eckerson of the Data Warehousing Institute noted that the organic nature of a data warehouse—its need to be continually maintained—makes the costs impossible to know in advance. But he said big projects invariably run to the tens of millions of dollars.

“The cost to pull together or re-integrate a large organization or federal agency ain’t cheap,” said Eckerson, the institute’s director of education and research. “The tools aren’t cheap; the databases aren’t cheap. People build these things incrementally these days.” (A very clear depiction of the steps and resources required to build a data warehouse appeared in the June issue of Baseline, pp. 37-38, and can be found online at www.baselinemag.com/021_planner.)

While cost may be a big hurdle for companies’ data-aggregation projects, it isn’t the only one. Turf wars, particularly the belief of some managers that their power lies in the unique knowledge they possess, can obstruct efforts to share data in organizations, as the FBI example makes painfully clear.

Of course, the consequences of hoarding data are much greater in a federal agency fighting the war on terrorism than they could ever be in the private sector. Only money is at stake in most businesses, not lives. Still, rethinking our knee-jerk resistance to sharing data openly with those who are basically on our team seems like an obvious adjustment we should all make post-9/11.