Native XML Databases Primer
By Sean Gallagher | Posted 2002-06-17
The industry has yet to agree on some of the specifics of XML databases, but this primer will give you the what, why and how of these XML data containers.
Native XML databases are databases that store and manage eXtensible Markup Language (XML) documents. The industry hasn't yet agreed on the specifics, however. Some store a document in its entirety; others separate the document's contents into different tables; and still others use a combination of methods. The point is that these databases and their management tools are built expressly for handling XML's peculiarities.
More and more of the data you use internally or share with customers or partners is being created in XML, from Microsoft Word files to Internet transactions. A database that handles XML natively can hook directly into the applications, such as Web services, that need that data, without being slowed down by an extra translation layer.
A relational database either stores an XML document as a chunk of unsearchable data (known as a BLOB, or Binary Large Object), or chops it up into separate tables. A BLOB is useless unless you add sufficient metadata describing it, a processor- and space-wasting proposition. Chopping documents up into separate tables and then putting them back together again is extremely processor intensive; it also requires a potentially huge number of linked tables to accommodate the XML structure or structures you are receiving. And finally, when a document is reassembled, it's often in a different form than the one in which it originally arrived, introducing errors and ballooning the size. Storing and indexing XML documents in their native form eliminates these problems and increases the performance of the applications that depend on the data.
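To make the "chop it up into tables" approach concrete, here is a minimal sketch in Python using the standard-library sqlite3 and xml.etree.ElementTree modules. The sample document, the two-table layout (node and attr) and the function names shred and reassemble are all assumptions made for illustration; they do not describe how any particular relational product maps XML.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical sample document (note the single-quoted attributes).
DOC = "<order id='1001'><customer>Acme</customer><item qty='2'>Widget</item></order>"

def shred(conn, xml_text):
    """Store every element as a row that points at its parent row."""
    conn.execute("CREATE TABLE node (id INTEGER PRIMARY KEY, parent INTEGER, tag TEXT, text TEXT)")
    conn.execute("CREATE TABLE attr (node INTEGER, name TEXT, value TEXT)")

    def walk(el, parent_id):
        cur = conn.execute(
            "INSERT INTO node (parent, tag, text) VALUES (?, ?, ?)",
            (parent_id, el.tag, el.text),
        )
        node_id = cur.lastrowid
        for name, value in el.attrib.items():
            conn.execute("INSERT INTO attr VALUES (?, ?, ?)", (node_id, name, value))
        for child in el:
            walk(child, node_id)

    walk(ET.fromstring(xml_text), None)

def reassemble(conn, node_id=1):
    """Rebuild an element tree from the rows, one query per node.

    Limitation of this sketch: text that follows a child element
    (mixed content) is not stored, so it would be lost.
    """
    tag, text = conn.execute(
        "SELECT tag, text FROM node WHERE id = ?", (node_id,)
    ).fetchone()
    el = ET.Element(tag)
    el.text = text
    for name, value in conn.execute(
        "SELECT name, value FROM attr WHERE node = ?", (node_id,)
    ).fetchall():
        el.set(name, value)
    children = conn.execute(
        "SELECT id FROM node WHERE parent = ? ORDER BY id", (node_id,)
    ).fetchall()
    for (child_id,) in children:
        el.append(reassemble(conn, child_id))
    return el

conn = sqlite3.connect(":memory:")
shred(conn, DOC)
rebuilt = ET.tostring(reassemble(conn), encoding="unicode")
node_count = conn.execute("SELECT COUNT(*) FROM node").fetchone()[0]
```

Even this three-element document needs one row per element, one row per attribute and several queries per node to put back together. The rebuilt string carries the same data, but it is not guaranteed to match the original byte for byte (here the attribute quoting changes), which is a small-scale version of the reassembly problem described above.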
XML databases require all the elements that you would expect from a database management system: a query and transaction language; a way to build indices and data structures; and interfaces for other applications. In addition, they need a method of traversing the data within an XML document in a way that follows the document's structure; that is, something more than a string-matching function.
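The difference between string matching and structure-aware traversal can be sketched in a few lines of Python with the standard-library xml.etree.ElementTree module. The sample document and the helper names (string_match, customer_names, item_names) are hypothetical, and ElementTree's limited path syntax stands in here for whatever query language a given XML database provides.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: "Acme Corp" appears both as the customer
# name and inside a free-text shipping note.
DOC = """<order id="1001">
  <customer><name>Acme Corp</name></customer>
  <item><name>Widget</name><note>Ship to Acme Corp dock</note></item>
</order>"""

def string_match(text, needle):
    """Plain substring search: knows nothing about elements."""
    return needle in text

def customer_names(xml_text):
    """Structure-aware lookup: only <name> elements under <customer>."""
    root = ET.fromstring(xml_text)
    return [el.text for el in root.findall("./customer/name")]

def item_names(xml_text):
    """Same tag name, different position in the hierarchy."""
    root = ET.fromstring(xml_text)
    return [el.text for el in root.findall("./item/name")]
```

A substring search for "Acme Corp" succeeds but cannot say whether it hit the customer's name or the shipping note, and it cannot tell a customer name element from an item name element. The path-based lookups use the parent-child structure to answer exactly that question.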
The first commercial product, Tamino, was developed by Software AG; the state of California uses it to handle the electronic filing of sales and use taxes. MCI WorldCom uses the same product for managing report generation. Aventis Pharmaceuticals uses a product from Ipedo to catalog, search and analyze annotated research data, and deliver customized versions of the information to partners and researchers. And Hewlett-Packard chose Coherity's XML database products for a large-scale Web integration project. The Apache Software Foundation developed an open-source database called Xindice. Other companies, including Oracle and Microsoft, are now developing their own native XML databases.
The standards are the main open issue. Not only are the methods for interacting with and managing XML databases still being worked out by various industry groups, but the exact definition of an XML database is also unsettled. And there are those who believe that modeling a database on a hierarchical language is a big step backward: XML's parent-child structure is unwieldy and inefficient, they argue, and doesn't provide anything that couldn't be done better in a relational database.
The XML:DB industry group offers some coherence for this nascent market.