Native XML Databases Primer

  • What are they?

    Databases that store and manage eXtensible Markup Language (XML) documents. The industry hasn’t yet agreed on the specifics, however. Some store a document in its entirety; others separate the document’s contents into different tables; and still others use a combination of methods. The point is that these databases and their management tools are built expressly for handling XML’s peculiarities.

  • Why do I need to store XML documents?

    More and more of the data you use internally or share with customers or partners is being created in XML, from Microsoft Word files to Internet transactions. A database that handles XML natively can hook directly into the applications that need those documents, such as Web services, without being slowed down by an extra translation layer.

  • My relational database can store XML documents already. Why would I need another one?

    A relational database either stores an XML document as a chunk of unsearchable data (known as a BLOB, or Binary Large Object) or chops it up into separate tables. A BLOB is useless for querying unless you add enough metadata to describe it, which wastes both processing time and storage space. Chopping documents into separate tables and then putting them back together is extremely processor-intensive; it also requires a potentially huge number of linked tables to accommodate the XML structure or structures you are receiving. Finally, a reassembled document is often in a different form from the one in which it originally arrived, which can introduce errors and balloon its size. Storing and indexing XML documents in their native form eliminates these problems and increases the performance of the applications that depend on the data.
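
    To make the "chop it up and put it back together" problem concrete, here is a minimal sketch in Python using the standard sqlite3 and xml.etree.ElementTree modules. The sample order document, the table layout and the function names shred and reassemble are all invented for illustration; a real relational mapping would be considerably more elaborate.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical sample document (not from any real system).
ORIGINAL = """<order id="7">
  <!-- rush delivery -->
  <item sku="A1">Widget</item>
  <item sku="B2">Gadget</item>
</order>"""


def shred(conn, doc):
    """Split the document across two linked tables (assumed schema)."""
    root = ET.fromstring(doc)  # the default parser drops comments
    conn.execute("CREATE TABLE orders (id TEXT PRIMARY KEY)")
    conn.execute(
        "CREATE TABLE items (order_id TEXT, pos INTEGER, sku TEXT, name TEXT)"
    )
    conn.execute("INSERT INTO orders VALUES (?)", (root.get("id"),))
    for pos, item in enumerate(root.findall("item")):
        conn.execute(
            "INSERT INTO items VALUES (?, ?, ?, ?)",
            (root.get("id"), pos, item.get("sku"), item.text),
        )


def reassemble(conn, order_id):
    """Rebuild an XML string from the rows stored by shred()."""
    root = ET.Element("order", id=order_id)
    rows = conn.execute(
        "SELECT sku, name FROM items WHERE order_id = ? ORDER BY pos",
        (order_id,),
    )
    for sku, name in rows:
        item = ET.SubElement(root, "item", sku=sku)
        item.text = name
    return ET.tostring(root, encoding="unicode")


conn = sqlite3.connect(":memory:")
shred(conn, ORIGINAL)
rebuilt = reassemble(conn, "7")
print(rebuilt == ORIGINAL)  # False: the comment and line breaks are gone
print(rebuilt)
```

    Even in this tiny case the rebuilt document is not the one that arrived: the comment and the original line breaks never made it into the tables. Every additional element type in the incoming XML would also need its own table or columns, which is where the "huge number of linked tables" comes from.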

  • How do they work?

    XML databases require all the elements that you would expect from a database management system: a query and transaction language; a way to build indices and data structures; and interfaces for other applications. They also need a method of traversing the data within an XML document in a way that follows the document’s structure, which takes more than a string-matching function.
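
    The difference between string matching and structure-aware traversal can be sketched in a few lines of Python with the standard xml.etree.ElementTree module. The sample orders document and both function names are hypothetical, and a real XML database would use its own query language and indexes rather than parsing on every call.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document, invented for illustration.
DOC = """
<orders>
  <order id="1">
    <customer>Acme</customer>
    <note>Ship with the order from Widgetco</note>
  </order>
  <order id="2">
    <customer>Widgetco</customer>
    <note>Rush</note>
  </order>
</orders>
"""


def string_match(doc, term):
    """Naive approach: return every line containing the term, wherever it is."""
    return [line.strip() for line in doc.splitlines() if term in line]


def orders_for_customer(doc, name):
    """Structure-aware approach: ids of orders whose <customer> child is name."""
    root = ET.fromstring(doc)
    return [
        order.get("id")
        for order in root.findall("order")
        if order.findtext("customer") == name
    ]


print(string_match(DOC, "Widgetco"))
print(orders_for_customer(DOC, "Widgetco"))
```

    The string search finds "Widgetco" in two places, including a note attached to Acme’s order, and cannot tell them apart. The structural query returns only order 2, because it asks specifically about the customer element inside each order.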

  • Who’s using them?

    The first commercial product, Tamino, was developed by Software AG; the state of California uses it to handle the electronic filing of sales and use taxes. MCI WorldCom uses the same product for managing report generation. Aventis Pharmaceuticals uses a product from Ipedo to catalog, search and analyze annotated research data, and deliver customized versions of the information to partners and researchers. And Hewlett-Packard chose Coherity’s XML database products for a large-scale Web integration project. The Apache Software Foundation developed an open-source database called Xindice. Other companies—including Oracle and Microsoft—are now developing their own native XML databases.

  • What’s the catch?

    The standards. Not only are the methods for interacting with and managing XML databases still being worked out by various industry groups, but the exact definition of an XML database is also unsettled. And there are those who believe that modeling a database on a hierarchical language is a big step backward: XML’s parent-child structure is unwieldy and inefficient, they argue, and doesn’t provide anything that couldn’t be done better in a relational database.

    Background reading
    The XML:DB industry group offers some coherence for this nascent market.