Collecting Data Without Garbage Filters

By John McCormick  |  Posted 2005-06-14

Steven Calderon was into his second week working as a security guard for Fry's Electronics when Anaheim, Calif., police walked in and arrested him. Fry's had requested a background check on Calderon, which was done by The Screening Network, a service of C

ChoicePoint strenuously guards details of its information gathering and processing procedures. Baseline's understanding of how the company works is based on court filings, company statements, and interviews with industry experts and company executives.

ChoicePoint feeds its huge databases with streams of data tapes and compact discs from insurance firms, marketing companies and other commercial sources, as well as public records such as court documents and licenses. Insurance companies, for example, then use ChoicePoint's data stores to check the claims history of policy applicants.

But ChoicePoint verifies little of the data in its vast repositories. It mostly aggregates files and generates reports from them without checking the information for duplication, omissions or inaccuracies. "Very little, if anything" is checked, according to an industry expert who asked not to be named and who has 20 years of experience in the kind of data brokering and data processing business ChoicePoint does.

Files, this industry expert says, can contain misspelled names, out-of-date addresses, faulty insurance claims or any number of other inaccuracies.

Most of their systems are "fire and forget," he says, meaning that ChoicePoint simply loads data from outside sources into its system and moves on to the next task.

ChoicePoint will check these files for anomalies. For example, if an insurance company that regularly sends ChoicePoint updated files with a small percentage of changes—say, less than half a percent—suddenly sends ChoicePoint a tape with more than 1% of the data changed, ChoicePoint will spot the jump. The company will then ask why the new file doesn't match historical patterns.
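A check of this kind can be sketched in a few lines of Python. This is only an illustration of the idea described above, not ChoicePoint's actual procedure: the function names, the record layout (a dictionary keyed by record ID) and the "twice the historical average" threshold are all assumptions.

```python
from statistics import mean


def change_rate(previous: dict, current: dict) -> float:
    """Fraction of records in `current` that are new or differ from `previous`."""
    if not current:
        return 0.0
    changed = sum(
        1
        for key, value in current.items()
        if key not in previous or previous[key] != value
    )
    return changed / len(current)


def is_anomalous(rate: float, history: list[float], factor: float = 2.0) -> bool:
    """Flag a delivery whose change rate exceeds `factor` times the historical mean.

    `factor` is an assumed tuning knob, not a figure from the article.
    """
    if not history:
        return False  # no baseline yet, so nothing to compare against
    return rate > factor * mean(history)


# A supplier that usually changes under half a percent of its records
# suddenly delivers a file with 1.2% of the records changed.
usual_rates = [0.004, 0.005, 0.003]
print(is_anomalous(0.012, usual_rates))
```

Note what such a check does and does not cover: it measures how much a file changed, not whether any individual record in it is correct.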

But the company does not verify the accuracy of any of the information on these tapes and CDs.

"Almost all of the information we hold is information we acquired from a source," Curling says. For instance, he explains, "We buy [data] directly from the state. Is that accurate or not? Well, in our case, accuracy there would be dominated by the fact that we got it directly from the source. And we applied that in an electronic form to a file."

In effect, the company says it's responsible for making sure the data gets loaded into its systems correctly. And it counts on the data suppliers, such as the insurers or the states, to provide accurate data.

"ChoicePoint has exhibited the attitude, 'Oh, we just need to pass on this information. We're not responsible for whether it's accurate,'" says Evan Hendricks, editor and publisher of the Washington-based newsletter Privacy Times.

In addition to receiving data, ChoicePoint goes out and gets the data it needs to build reports on individuals.

The company says it has electronic gateways into some databases, such as some state motor vehicle department files. With these gateways, the company can electronically collect driver names, license numbers and car registrations. ChoicePoint often pays agencies for their data—about $500 million a year to state motor vehicle departments alone.

Don McGuffey, the company's senior vice president for data acquisition and strategy, told the California State Senate Banking Committee on March 30: "We get updates from the various states' agencies regularly and rely upon the state agency to give us all the information and the complete information."

ChoicePoint also says it employs an army of researchers to verify information on, say, an employment form. They travel around the country, pen or PDA in hand, and stop by courthouses to write down data from bankruptcies, judgments, licensing sanctions and other proceedings. According to published statements by CEO Derek Smith, ChoicePoint collects up to 40,000 records manually a day through this network of employees and contractors.

The workers fax or e-mail the documents to ChoicePoint. Later, ChoicePoint checks their work.

"We will audit them periodically throughout the year, not only by randomly collecting individual records that they would have developed and returned to us; we would also require them to physically send in copies of documents periodically so a separate team of people could review the actual hard copies of documents in the courthouse versus what was actually entered and delivered back to us," McGuffey says.
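The random-sample audit McGuffey describes could look roughly like the following Python sketch. It assumes, purely for illustration, that both the entered records and the values read from the hard copies are available as dictionaries keyed by a record ID; the function name and parameters are hypothetical.

```python
import random


def audit_error_rate(
    entered: dict, source: dict, sample_size: int, rng: random.Random
) -> float:
    """Sample record IDs and return the fraction whose entered value
    differs from the value found in the source document."""
    ids = sorted(entered)
    if not ids or sample_size <= 0:
        return 0.0
    sample = rng.sample(ids, min(sample_size, len(ids)))
    mismatches = sum(1 for rid in sample if entered[rid] != source.get(rid))
    return mismatches / len(sample)


# Illustrative data only: 100 records that were all keyed in correctly.
entered = {i: f"record-{i}" for i in range(100)}
hard_copies = dict(entered)
print(audit_error_rate(entered, hard_copies, 10, random.Random(0)))
```

An audit like this estimates how faithfully researchers transcribed the courthouse documents. It says nothing about whether the documents themselves were right.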

And ChoicePoint does say it puts the data it collects through what's known as a data cleansing process. The company uses Firstlogic's Information Quality Suite software, which can drill through files looking for inconsistencies and, in some cases, can fix problems automatically. The software, for instance, will check a name and address against a U.S. Postal Service file. The software can also determine that data on "Smith, Sam" in one file also applies to "Samuel Smith" in another if there is other matching information, such as an address, so that the two files can be consolidated.
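The "Smith, Sam" and "Samuel Smith" consolidation can be illustrated with a minimal Python sketch. This is not Firstlogic's algorithm, which the article does not describe; the nickname table, the normalization rules and the sample addresses are all assumptions made for the example.

```python
# Hypothetical, tiny nickname table; a real product would use a far larger one.
NICKNAMES = {"sam": "samuel", "bill": "william", "bob": "robert"}


def normalize_name(raw: str) -> tuple[str, str]:
    """Return (first, last) in lowercase, accepting 'Last, First' or 'First Last'."""
    raw = raw.strip().lower()
    if "," in raw:
        last, first = (part.strip() for part in raw.split(",", 1))
        first = first.split()[0] if first else ""
    else:
        parts = raw.split()
        if not parts:
            return "", ""
        first, last = parts[0], parts[-1]
    # Expand a known nickname to its full form so "sam" and "samuel" compare equal.
    return NICKNAMES.get(first, first), last


def normalize_address(raw: str) -> str:
    """Lowercase, drop periods and commas, and collapse runs of whitespace."""
    cleaned = raw.lower().replace(".", "").replace(",", " ")
    return " ".join(cleaned.split())


def same_person(a: dict, b: dict) -> bool:
    """Treat two records as one person only if both name and address match."""
    return normalize_name(a["name"]) == normalize_name(b["name"]) and (
        normalize_address(a["address"]) == normalize_address(b["address"])
    )


# Illustrative records; the addresses are invented for the example.
file_one = {"name": "Smith, Sam", "address": "12 Oak St."}
file_two = {"name": "Samuel Smith", "address": "12 oak st"}
print(same_person(file_one, file_two))
```

Requiring a second matching field, here the address, is what keeps two different Sam Smiths from being merged. It also means a typo in either field can leave one person split across two files.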

But no product will catch every inconsistency, typo or other input error, according to English.

In addition, much of what the researchers gather comes from court documents and other public records and can contain wrong names and addresses, duplications and omissions.
