Primer: The Virtual Database

  • What is it? A way to manage many different data sources as though they were all in one place. Sometimes called a “virtual centralized database” or a “federated database.”

  • Who came up with it? Depends on whom you talk to. The desire to manage all of a company’s data with one application is not new, and a few different answers exist today under the catchphrase Enterprise Information Integration (EII). Data warehousing, which began in the early ’90s, is often grouped within EII. One of the first commercial products behind the “virtual database” concept is DataJoiner, middleware that IBM introduced in 1995. New York City-based MetaMatrix says it began work on its EII technology in 1997.

  • What are the benefits? Lower development costs, faster development and less maintenance work are a few of the immediate benefits. Having a single entry point into a company’s data means each application doesn’t need its own private connection to each data source. Programmers can also build queries more quickly when there is only one database to write against. And a single entry point reduces the chance that a new or redundant database will be created, which helps keep maintenance costs under control.

  • How does it work? Like a database, with one difference: where a conventional database consists of software that knows how to manage data plus a physical place to keep that data, a virtual database has only the software. The data it manages can physically reside anywhere on the network, in a variety of formats; to the user, however, it all looks like data in one database. One also might make the analogy that a virtual database is to a set of databases as a view is to a set of physical tables.

    The details vary by vendor, but every virtual database has a generic data model. This model has to be precise enough to be useful (“Color” instead of “Feature1”), but not so precise that it starts to be exclusive (“SizeInMillimeters” instead of “Size”). A virtual database also has a map that shows how it relates to its data sources. Although it doesn’t need to have a place to store the data it is managing, a virtual database does need a place to store its model, mappings and other information.
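To make the model-and-map idea concrete, here is a minimal sketch in Python. It is not how any particular vendor's product works. The two in-memory SQLite databases, their table and column names, and the `query_products` helper are all hypothetical, invented for illustration. The caller sees only the generic fields ("ProductId" and "Color"), and the map translates them into whatever each source actually calls its columns, much as a view hides the physical tables behind it.

```python
import sqlite3

# Two independent "physical" sources with different schemas.
# Both databases and all names below are hypothetical, for illustration only.
warehouse = sqlite3.connect(":memory:")
warehouse.execute("CREATE TABLE parts (part_no TEXT, colour TEXT)")
warehouse.execute("INSERT INTO parts VALUES ('A-1', 'red')")

catalog = sqlite3.connect(":memory:")
catalog.execute("CREATE TABLE items (sku TEXT, feature1 TEXT)")
catalog.execute("INSERT INTO items VALUES ('B-7', 'blue')")

# The generic data model: the only field names a caller ever sees.
GENERIC_FIELDS = ("ProductId", "Color")

# The map: how each generic field relates to a column in each source.
# The virtual database stores this mapping, not the data itself.
MAPPINGS = [
    {"conn": warehouse, "table": "parts",
     "columns": {"ProductId": "part_no", "Color": "colour"}},
    {"conn": catalog, "table": "items",
     "columns": {"ProductId": "sku", "Color": "feature1"}},
]


def query_products(color=None):
    """Return rows from every source, expressed in the generic model."""
    results = []
    for mapping in MAPPINGS:
        cols = mapping["columns"]
        # Rewrite the generic field list into this source's own column names.
        select_list = ", ".join(f"{cols[f]} AS {f}" for f in GENERIC_FIELDS)
        sql = f"SELECT {select_list} FROM {mapping['table']}"
        params = ()
        if color is not None:
            # Translate the generic filter into the source's column name.
            sql += f" WHERE {cols['Color']} = ?"
            params = (color,)
        for row in mapping["conn"].execute(sql, params):
            results.append(dict(zip(GENERIC_FIELDS, row)))
    return results


if __name__ == "__main__":
    for product in query_products():
        print(product)
```

Calling `query_products(color="blue")` returns only the matching row from the second source, even though that source stores the value in a column named `feature1`. A real product would also have to handle differing data types, security and query optimization, none of which this sketch attempts.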

  • Why not use enterprise application integration (EAI)? Because EAI can carry unnecessary overhead. As its name implies, EAI connects applications. Each application has its own database; an integration server takes the data that’s output by one application and modifies it so that it can be read and used by another application. By contrast, a virtual database connects an application directly to the sources of data. Doing so eliminates the need for unique adapters, because the data can be accessed by standard database protocols, and also saves processing time by bypassing the application that sits in front of each database.

  • What are the drawbacks? The biggest problem is the same one that faces every integration effort: building a universal data model. A single schema that satisfies all or even half of a company’s data needs is awfully hard to come by, especially given how inconsistent and messy existing data tends to be.

  • Who’s using it? Most companies are just beginning to consider EII and virtual databases seriously. It’s a pretty difficult idea to sell inside a company that already has spent a lot on EAI, though. The organizations that are furthest along are the ones that have to meet strict regulations, such as the Health Insurance Portability and Accountability Act, or, like the Department of Defense, need to perform complex data analysis without creating a warehouse of data.