Expensive Errors

By Deborah Gage

ChoicePoint's security and accuracy snafus prompt lawsuits.

The total cost of all this bad data? The Data Warehouse Institute, a business intelligence and data analysis industry consortium, estimates that data errors are costing U.S. businesses about $600 billion a year. Companies without proper information management and controls are already spending 10% or more of their operating revenue on fixing problems that stem from bad data, according to Larry English, president of Information Impact, a consulting company that specializes in data quality.

ChoicePoint has data on more than 220 million U.S. citizens—about four of five Americans. That information has been used to help families find missing children, law enforcement officials track down criminals, and insurance companies offer quick policy approvals.

But all companies find it problematic to keep data secure and accurate, says Randy Bean, managing partner of NewVantage Partners, an information-technology consultancy that works with large companies such as Fidelity and Liberty Mutual Insurance. And ChoicePoint assumes that the facts it acquires on people are accurate when they arrive. "It is impractical to verify the accuracy of a record created by someone other than ourselves prior to the time the information is used in a report," wrote James Lee, the company's chief marketing officer, in an e-mail to Baseline.

Lee also argues that ChoicePoint is subject to the Fair Credit Reporting Act, which allows people to review and dispute information in their credit reports, and other consumer protection laws that govern the company's pre-employment, insurance-history and tenant-screening reports. For public records—such as those kept by government agencies—not covered by these laws, ChoicePoint allows people a free search of public records once a year.

ChoicePoint strenuously guards details of its information gathering and processing procedures. Baseline's understanding of how the company works is based on court filings, company statements, and interviews with industry experts and company executives.

ChoicePoint feeds its huge databases with streams of data tapes and compact discs from insurance firms, marketing companies and other commercial sources, as well as public records such as court documents and licenses. But ChoicePoint verifies little of the data in its vast repositories.

Most of the company's systems are "fire and forget," says an industry expert with 20 years of experience in the kind of data brokering and data processing done by ChoicePoint. This means ChoicePoint simply loads data from outside sources into its system and moves on to the next task.

ChoicePoint does check these files for anomalies. For example, if an insurance company that regularly sends updated files with a small percentage of changes—say, less than half a percent—suddenly sends a tape with more than 1% of the data changed, ChoicePoint will spot the jump and ask why.
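A check of this kind can be sketched in a few lines of Python. This is only an illustration of the idea, not ChoicePoint's actual process: the function names, the record layout and the 1% threshold are assumptions made for the example.

```python
def change_rate(previous, current):
    """Fraction of records in `current` that are new or differ from `previous`.

    Both arguments are dicts keyed by a record identifier. Records that were
    deleted from `current` are not counted in this simple sketch.
    """
    if not current:
        return 0.0
    changed = sum(1 for key, rec in current.items() if previous.get(key) != rec)
    return changed / len(current)


def is_suspicious(previous, current, threshold=0.01):
    """Flag a delivery whose change rate exceeds the threshold (assumed 1%)."""
    return change_rate(previous, current) > threshold


# Hypothetical data: 1,000 policy records keyed by policy number.
previous = {i: {"status": "active"} for i in range(1000)}

# A typical update changes 4 records (0.4%).
typical = dict(previous)
for i in range(4):
    typical[i] = {"status": "lapsed"}

# An unusual update changes 15 records (1.5%).
unusual = dict(previous)
for i in range(15):
    unusual[i] = {"status": "lapsed"}

print(is_suspicious(previous, typical))  # False
print(is_suspicious(previous, unusual))  # True
```

A check like this says only that a file looks different from the last one. It says nothing about whether any individual record is true.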

But the company does not verify the accuracy of any of the information that's supplied on tape by outside sources. "We do have a robust QA process to look for duplicate records and other anomalies, but at the end of the day, we have to rely on the organization that provided the information," Lee wrote in his e-mail message. "They created it and only they would know if it is accurate. Once they say it is, we have no basis or manner to challenge the assertion."

In addition to receiving data, ChoicePoint gathers data that it needs to build reports on individuals. The company says it has electronic gateways into some databases, such as some state motor vehicle department files. ChoicePoint also says it employs an army of researchers to verify information on, say, an employment form, and checks their work after it is completed.

And ChoicePoint says it puts the data it collects through a process known as data cleansing. The company uses Firstlogic's Information Quality Suite software, which can drill through files and look for inconsistencies and, in some cases, can fix problems automatically. The software, for instance, will check names and addresses against a U.S. Postal Service file to see if multiple files can be consolidated.
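The consolidation step can be illustrated with a rough Python sketch. It does not use or imitate Firstlogic's software. The small `STANDARD_FORMS` table is a hypothetical stand-in for a postal reference file, and the record fields are invented for the example.

```python
import re
from collections import defaultdict

# Hypothetical stand-in for a postal reference file: maps spelling variants
# to one standard form. A real reference file is far larger.
STANDARD_FORMS = {"street": "st", "avenue": "ave", "road": "rd", "apartment": "apt"}


def normalize(text):
    """Lowercase, drop punctuation, and standardize known word variants."""
    words = re.sub(r"[^a-z0-9 ]", " ", text.lower()).split()
    return " ".join(STANDARD_FORMS.get(w, w) for w in words)


def consolidation_candidates(records):
    """Group record ids whose normalized name and address match exactly."""
    groups = defaultdict(list)
    for rec in records:
        key = (normalize(rec["name"]), normalize(rec["address"]))
        groups[key].append(rec["id"])
    return [ids for ids in groups.values() if len(ids) > 1]


records = [
    {"id": 1, "name": "Jane Q. Doe", "address": "12 Elm Street, Apt. 4"},
    {"id": 2, "name": "JANE Q DOE", "address": "12 Elm St Apartment 4"},
    {"id": 3, "name": "John Roe", "address": "98 Oak Avenue"},
]
print(consolidation_candidates(records))  # [[1, 2]]
```

Exact matching on a normalized key is the simplest possible rule. It misses misspelled names, and it can wrongly merge two different people who share a name and an address, which is one reason no cleansing tool catches everything.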

But no data-cleansing product will catch every inconsistency, according to English. In addition, much of what the researchers gather comes from public records, many of which contain errors.

And the way ChoicePoint's computer systems handle data may cause further inaccuracies.

For example, data suppliers in many cases send in new tapes and CDs with updates for ChoicePoint's CLUE database, which stores 200 million insurance records. The data is loaded when it comes in, and the old data is purged. If an error in a file isn't corrected at the source, the erroneous data will be reloaded into ChoicePoint's systems. As the data expert points out: "They simply replace data each month."

Tips on cleaning up your data

Here are some precautions that companies can take to make sure their data starts—and stays—accurate.

PROBLEM: You have sloppy data entry.
RESOLUTION: Double-check everything.
You can purchase or create code that will prompt data entry clerks to double-check an address or the spelling of a name, based on a comparison with a reference data source. But such tools can also introduce errors, because workers get in the habit of blindly accepting the software's suggestions.
Tom Redman, president of Navesink Consulting Group and co-founder of the International Association for Information and Data Quality, recommends that companies create a position called chief data officer. This person would be responsible for measuring data accuracy, bringing in tools to keep it clean and, most important, putting in place processes to ensure that data is correct when entered at its source—by redesigning data input tasks and changing worker habits.

PROBLEM: You don't know whether to trust your data.
RESOLUTION: Test its accuracy.
Develop a benchmark for measuring the accuracy of your data, such as a percentage of error-free records, and monitor it rigorously.
For an organization like ChoicePoint, this might mean thoroughly spot-checking records from a randomly chosen subset of consumer profiles and recording what percentage of records contain erroneous address, employment, insurance-claim or criminal-history data. Addresses can be checked fairly reliably against postal databases.
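One way to compute such a benchmark is sketched below in Python. The `is_correct` callback is an assumption: it stands in for whatever verification a company actually performs on a sampled record, such as a manual check or a postal lookup. The sample data is invented.

```python
import random


def sample_accuracy(records, is_correct, sample_size, seed=None):
    """Estimate the share of error-free records from a random sample.

    `is_correct` is a caller-supplied function that returns True when a
    record has been verified as accurate.
    """
    if not records:
        raise ValueError("no records to sample")
    rng = random.Random(seed)
    sample = rng.sample(records, min(sample_size, len(records)))
    correct = sum(1 for rec in sample if is_correct(rec))
    return correct / len(sample)


# Hypothetical data: 200 profiles, 10 of which have a bad address.
records = [{"id": i, "address_ok": i % 20 != 0} for i in range(200)]

# Checking every record gives the true rate for this data set.
print(sample_accuracy(records, lambda r: r["address_ok"], sample_size=200))  # 0.95

# A smaller sample gives an estimate that varies from draw to draw.
print(sample_accuracy(records, lambda r: r["address_ok"], sample_size=50, seed=1))
```

Tracking the same figure over time, with the same sampling method, is what turns a one-off spot check into a benchmark.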

PROBLEM: You find it difficult to combine data on one person.
RESOLUTION: Map out where the right data is.
Find all the places in your enterprise that contain different records referring to the same individual. Establish guidelines on where to get which type of information. Then establish procedures to ensure that a piece of data, such as an address, is consistently recorded every time it appears in a database. One way to enforce consistency is by creating a master data file, where a database pulls information from various data stores and creates a master record for each individual.
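The master-record idea can be sketched in Python as follows. The store names ("crm", "billing"), the field names and the priority table are all hypothetical. The table plays the role of the guidelines on where to get which type of information.

```python
# Hypothetical guideline: for each field, which system to trust first.
SOURCE_PRIORITY = {
    "address": ["billing", "crm"],
    "phone": ["crm", "billing"],
}


def build_master_record(person_id, stores):
    """Assemble one master record, taking each field from its preferred source.

    `stores` maps a store name to a dict of person_id -> record. The source
    of each chosen value is kept so errors can be traced back and fixed there.
    """
    master = {"person_id": person_id}
    for field, sources in SOURCE_PRIORITY.items():
        for source in sources:
            value = stores.get(source, {}).get(person_id, {}).get(field)
            if value:
                master[field] = value
                master[field + "_source"] = source
                break
    return master


stores = {
    "crm": {"p1": {"address": "12 Elm St", "phone": "555-0100"}},
    "billing": {"p1": {"address": "12 Elm St Apt 4"}},
}
print(build_master_record("p1", stores))
# {'person_id': 'p1', 'address': '12 Elm St Apt 4', 'address_source': 'billing',
#  'phone': '555-0100', 'phone_source': 'crm'}
```

The sketch assumes the records in each store have already been matched to the same person. In practice that matching is the hard part.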
—David F. Carr

This article was originally published on 2005-12-13
Senior Writer
Based in Silicon Valley, Debbie was a founding member of Ziff Davis Media's Sm@rt Partner, where she developed investigative projects and wrote a column on start-ups. She has covered the high-tech industry since 1994 and has also worked for Minnesota Public Radio, covering state politics. She has written freelance op-ed pieces on public education for the San Jose Mercury News, and has also won several national awards for her work co-producing a documentary. She has a B.A. from Minnesota State University.
