Mopping Up Dirty Data

Direct marketers have long reckoned with cleaning dirty databases loaded with duplicate names, misspelled addresses and other shortcomings. Today, companies in more diverse industries—from technology to manufacturing—are jumping on the information-quality bandwagon to get the most out of managing customer relations and other projects relying on lots of data.

RSA Security, like countless other companies, found out the hard way about the perils of dirty data. The security-software firm had just installed customer-relationship-management software from Siebel Systems in 2001 to make it easier for its 300-person sales force to sell authentication products to enterprise and electronic-commerce customers.

Problem was, the Siebel application was fed by several incompatible systems, each with its own way of identifying customers. And RSA had no way—short of manually going through every record—of being sure salespeople were being given complete and correct information on all customers. If, for example, the company’s order-entry system reported information on a customer named James Smith, and the Web server had information on a Jim Smith, the Siebel system probably wouldn’t catch on that the two Mr. Smiths were really one customer.
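
To make the James/Jim Smith problem concrete, here is a minimal sketch of nickname-aware matching in Python. It is an illustration only, not how Siebel or any vendor's product works: the nickname table, the function names and the sample records are all hypothetical, and a real tool would also compare addresses and other fields before merging two people who happen to share a name.

```python
# Hypothetical nickname table (an assumption for illustration);
# commercial products rely on far larger dictionaries.
NICKNAMES = {"jim": "james", "jimmy": "james", "bill": "william", "bob": "robert"}


def match_key(full_name):
    """Return a normalized (first, last) key for comparing customer names."""
    parts = full_name.lower().replace(".", "").split()
    first, last = parts[0], parts[-1]
    # Map a known nickname to its formal form; leave other names unchanged.
    return (NICKNAMES.get(first, first), last)


def find_duplicates(records):
    """Group (source_system, name) pairs that share the same match key."""
    groups = {}
    for source, name in records:
        groups.setdefault(match_key(name), []).append((source, name))
    # Keep only keys that more than one record maps to.
    return {key: hits for key, hits in groups.items() if len(hits) > 1}


# Made-up records standing in for two incompatible source systems.
records = [
    ("order-entry", "James Smith"),
    ("web-server", "Jim Smith"),
    ("order-entry", "Mary Jones"),
]
duplicates = find_duplicates(records)
```

Here `duplicates` holds a single group containing both Smith records, while Mary Jones is left alone. Matching on name alone would wrongly merge two different James Smiths, which is why this is only a starting point.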

“We had no way of filtering all the nonsense data out, so we ended up with a lot of duplicate and just-plain-wrong data in the application,” says John Ma, manager of information-system applications at RSA. “It was quite confusing to the sales folks.”

So Ma’s team tried data-quality software to scan the Siebel database for duplicates and errors and automatically correct them. RSA selected the Data Quality Connector for Siebel, a program from Group 1 Software specifically designed to work with Siebel customer-relationship-management software. The Group 1 software identified a whopping 40,000 of the 160,000 customer records in the Siebel database as duplicates or errors and eliminated them. Once the data was cleaned up, says Ma, 95% of RSA’s salespeople started using Siebel applications.

That’s helped the company do a better job of turning prospects into customers, Ma says. And it’s showing. In the third quarter, 700 of RSA’s 4,000 customers were new. The project may have benefited the company’s bottom line, too. For the three months ended Sept. 30, the company reported earnings of $3.65 million compared to a loss of $8.22 million for the same period in 2002.

In addition to Group 1, companies such as Harte-Hanks’ Trillium, Ascential and Firstlogic offer software that uses complicated matching algorithms to comb through databases and spot problems in records based on a set of user-defined criteria.

The strength of these products, particularly Group 1’s DataSight suite and Firstlogic’s IQ Suite, is correcting name and address records. But software packages can also fix other problems, such as making sure that customer e-mail addresses are right or that customer-contact histories are complete. Most data-quality software can also fill in missing fields or append relevant information to records, such as four-digit ZIP code extensions or geographic-location data. And most packages can be set up to operate in real time (catching errors as, say, customers enter contact information into Web-based applications) as well as in batch mode.
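
The difference between real-time and batch checking can be sketched in a few lines of Python. This is a rough illustration under stated assumptions, not any vendor's method: the field names, patterns and messages are hypothetical, and the checks are purely syntactic. The products described here go further and verify addresses against postal reference data, which this sketch does not attempt.

```python
import re

# Simplified patterns (assumptions for illustration, not a full standard).
EMAIL_RE = re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+")
ZIP_RE = re.compile(r"\d{5}(-\d{4})?")


def check_contact(record):
    """Real-time style check: return the problems found in one record."""
    problems = []
    if not EMAIL_RE.fullmatch(record.get("email", "")):
        problems.append("email looks malformed")
    zip_code = record.get("zip", "")
    if not ZIP_RE.fullmatch(zip_code):
        problems.append("ZIP code is not 5 digits or ZIP+4")
    elif "-" not in zip_code:
        # Well-formed but short: a candidate for appending the extension.
        problems.append("ZIP+4 extension missing")
    return problems


def check_batch(records):
    """Batch style check: map record index to problems, skipping clean rows."""
    report = {}
    for index, record in enumerate(records):
        problems = check_contact(record)
        if problems:
            report[index] = problems
    return report


# Made-up sample data.
clean = {"email": "jsmith@example.com", "zip": "01730-1234"}
dirty = {"email": "jsmith@example", "zip": "01730"}
report = check_batch([clean, dirty])
```

The same `check_contact` function serves both modes: a Web form would call it as each customer submits an entry, while `check_batch` sweeps an existing list and reports only the rows that need attention.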

The Data Warehousing Institute, an industry trade group, last year estimated that poor data cost U.S. businesses $600 billion annually in wasted postage and marketing costs as well as lost customer credibility. In contrast, the market for data-quality software is estimated to reach $600 million this year, according to the Giga Information Group, a division of Forrester Research.

Analysts expect spending to increase. Companies, they say, want to maximize their investments in software that handles customer interactions and data warehousing.

On top of that, federal laws are forcing organizations to clean up. The Data Quality Act, which took effect in 2002, requires, among other things, that federal agencies disseminate correct information. Other measures, such as the Sarbanes-Oxley Act established by Congress to address corporate fraud, put the burden of maintaining and protecting accurate data on companies and public agencies. Meta Group estimates that, over the next five years, the number of companies deploying data-quality software will grow by 20% to 30% each year.

Newcomers to data-quality technology admit that software packages are expensive, with entry prices ranging from $75,000 to $200,000, and can be complicated to use and deploy. Nonetheless, they say, the software delivers the expected payoff.

Online travel site Travelocity, for example, has saved “a huge amount of money—many, many thousands of dollars,” says software developer Carl Nicol. Travelocity implemented the real-time version of Firstlogic’s IQ Suite to determine whether customers requesting physical tickets have entered valid addresses to which overnight deliveries can be made by FedEx. “Before, the only way we knew an address was bad was when FedEx bounced the package back. Then we had to resend it, paying FedEx twice.”

Executives at Diversified Business Communications using Group 1’s data-quality software were able to cut outsourcing expenses. Until recently, says data services manager Pauline McNeil, Diversified used an outsourcer to organize and cleanse customer data—derived from subscription lists and trade-show sign-ups—before sending out direct-marketing mailings. “But every time we wanted to tweak the list, we had to pay the outsourcer up to $3,000 to re-run it and clean it,” she says. “Considering that we do 15 or so mailings a year, that adds up.” By cutting outsourcing costs, McNeil says, the Group 1 software will pay for itself in a couple of years.
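
McNeil's figures allow a back-of-the-envelope check on that payback claim, worked here in Python. Two assumptions go beyond what she says: that every mailing triggers exactly one re-run at the full $3,000, and that the software cost $75,000, the low end of the entry-price range cited earlier. Diversified's actual purchase price is not given, so the result is an illustration, not a reported figure.

```python
def payback_years(software_cost, cost_per_rerun, reruns_per_year):
    """Years of avoided outsourcing fees needed to cover the software cost."""
    return software_cost / (cost_per_rerun * reruns_per_year)


COST_PER_RERUN = 3_000      # upper bound quoted by McNeil
MAILINGS_PER_YEAR = 15      # "15 or so mailings a year"
ASSUMED_SOFTWARE_COST = 75_000  # assumption: low end of the cited price range

# Ceiling on yearly savings if each mailing needs one full-price re-run.
annual_savings = COST_PER_RERUN * MAILINGS_PER_YEAR
years = payback_years(ASSUMED_SOFTWARE_COST, COST_PER_RERUN, MAILINGS_PER_YEAR)
```

Under these assumptions the avoided fees top out at $45,000 a year and the software is paid off in a little under two years. A higher purchase price or cheaper re-runs would stretch that period.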

But cost-cutting isn’t the only benefit. Avoiding duplicate direct-marketing mailings not only saves money, it also improves customer confidence and satisfaction. That’s important to Save the Children, a non-profit relief organization using Ascential QualityStage software to clean up its million-record donor database. Sergio Bouscoulet, manager of operations, says the software has already spotted and corrected errors or duplications in 15% to 20% of the organization’s donor records.

“Before, those would have resulted in duplicate mailings that carried non-tangible costs,” says Bouscoulet. “Donors might get the perception that we weren’t making the best use of their money if they received two letters saying exactly the same thing. We certainly couldn’t afford that.”


Group Dynamics: Data Quality

Category: Data-quality software

What It Is: Products for profiling, cleaning, enhancing and consolidating databases.

Key Players: Ascential Software, DataFlux, Firstlogic, Group 1 Software, Innovative Systems, Harte-Hanks’ Trillium

Others: DataMentors, Acxiom, Evoke Software, Similarity Systems

Market Size: $580 million (2002, est.)*

What’s Happening: The market is expected to grow because companies want to get the most from their customer data and other information.

Expertise Online: Database Knowledge Base (http://database.ittoolbox.com/); includes online forums.

*Source: Giga Information Group, a division of Forrester Research.