What Went Wrong

By David F. Carr  |  Posted 2002-05-15

Payday is a great day, usually. But for thousands of Bank of America customers, it was the worst, as their electronic deposits vanished.



Bank of America apparently was not that lucky. At some point in the process of transferring downloaded transactions from the ACH network to databases at its Nevada data center, at least part of the batch processing did not get completed. "More often than not, data center disasters tend to be caused by software or human error," says Steinhardt.

Bank of America handled nearly 350 million ACH transactions in 2001, making it the fifth-largest originator of such transactions, according to NACHA, the national electronic payments association. Overall, the volume of ACH payments approached 8 billion in 2001, up 16% from 2000. That includes 3.7 billion direct deposits of payroll and reimbursement checks, investment proceeds and government benefits to individual accounts, which were collectively worth more than $4 trillion.

The ACH system, which was initially created 30 years ago, has been completely network-based since the mid-1980s. Errors are rare—in 2001, the network had a returned transaction rate of one tenth of 1%, according to NACHA—and the majority of those were caused by clerical errors in entering account numbers for new direct-deposit recipients. "The biggest problem is getting the first one right," says Michael Herd, spokesman for NACHA.

But that only makes this error stand out more.

"We only hear about a few of these a year," says Herd. "This was more extensive than most of the ones we hear about."

"It's very rare," says Phil Holmes, a senior vice president at WesPay, the West Coast regional ACH association, adding that he couldn't remember a similar event occurring at any member organization. He stressed that the error did not occur in the ACH network itself but was internal to the bank.

Craig Woker, a financial analyst with Morningstar Inc., says this incident was unfortunate because it affected bank customers directly. This hurt a bank that had already made improving customer service and satisfaction one of its most important initiatives, he says. "Everybody has a bank horror story, but over the past couple of years a disproportionate number of those seem to have come out of BofA."

Bank of America claims the error was discovered Friday night during a routine quality check. The Los Angeles Times put the number of accounts affected at 1.1 million. Russell, however, said the true number was "significantly lower."

Bank of America says most customers responded positively to its efforts to correct the problem as quickly as possible and there is no indication that customers canceled their accounts en masse as a result.

Bank of America won't say why the "operating error," as it calls it, happened, or what measures it has since taken to ensure the error doesn't recur. The bank will look at fortifying its systems for handling direct deposits, but that may not even be necessary. "Our systems are reliable, and we handle a very, very large volume of direct deposits," Russell says.

That the incident stood out has much to do with the bank's scale—and general reliability.

"This is not something I would normally expect Bank of America to do," says Jerry Thurman, a Los Angeles-based bank systems consultant. "In my opinion, they dropped the ball." The ACH system contains enough checks and balances that most errors are caught and corrected before bank customers ever become aware of them.

Occasionally, an employer or other institution may neglect to send a batch of transactions, and the bank personnel who ought to question the absence of a regularly scheduled transaction fail to notice, Thurman says. "Usually, it's more of human error than a technology error." What makes this glitch unusual is that it wasn't confined to a single employer or other payer but instead affected a wide variety of account holders who were expecting an automatic deposit that Friday.

Among the possible culprits: A breakdown in the company's processes, such as failure to adequately check that deposits were credited to the proper accounts; or an unexpected crash of an internal system that processes deposits. "If you have a shutdown with end-of-night processing so only a portion of the transactions get posted, then you do lose sort of random transactions," Thurman says.

Compounding the problem, though, was that the bank's response took 36 hours rather than 36 minutes. Banking systems typically make heavy use of mirroring and offline replication of data, ensuring that a good copy of data can be quickly brought online after a system failure, sometimes within minutes. But if the data was never entered into the system, there may have been no backup to roll back to. In that case, the bank's operations team would have had to find the last processed ACH transaction and then process additional transactions from there, verifying each to ensure that no duplicate transactions were entered. Additionally, the bank would have had to reverse any bounced-check fees or other debits that the error caused on affected accounts.
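The recovery step described above, resuming a partly posted batch without entering duplicates, can be sketched in a few lines of Python. This is only an illustration, not Bank of America's procedure: the `Entry` record, its `trace_id` field and the `replay_unposted` function are hypothetical names, and the sketch checks every entry against the set of already-posted identifiers instead of trusting a single "last processed" marker.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    """One deposit in a batch (hypothetical record layout)."""
    trace_id: str       # assumed to be unique per transaction
    account: str
    amount_cents: int


def replay_unposted(batch, posted_ids, balances):
    """Post each entry whose trace_id is not yet in posted_ids.

    Mutates posted_ids and balances in place and returns the list of
    entries posted by this call, so a second run posts nothing.
    """
    newly_posted = []
    for entry in batch:
        if entry.trace_id in posted_ids:
            continue  # already credited; skipping avoids a duplicate deposit
        balances[entry.account] = balances.get(entry.account, 0) + entry.amount_cents
        posted_ids.add(entry.trace_id)
        newly_posted.append(entry)
    return newly_posted


# Example: the first entry was posted before the interruption.
batch = [
    Entry("t1", "A", 100),
    Entry("t2", "B", 250),
    Entry("t3", "A", 50),
]
posted_ids = {"t1"}
balances = {"A": 100}
recovered = replay_unposted(batch, posted_ids, balances)
print([e.trace_id for e in recovered], balances)
```

Because the function records each identifier as it posts, running it a second time over the same batch is harmless, which is the property an operations team would want when it is unsure how far the original run got.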

Just finding the culprit can be time-consuming. "As things become increasingly automated, and the applications are more complex, it becomes harder to locate the problem," says Dana Stiffler, an analyst at AMR Research. "It's a very common problem: it's difficult not just to detect an error, but to figure out where it sits within a system."

Safeguards in the ACH system, however, mean Bank of America customers were never in danger of losing their money, says Thurman. Every node in the network that handles an ACH transaction file is responsible for keeping a backup copy, a precaution that does a lot to ensure that it will always be possible to reconstruct lost transactions. "It's very difficult to conceive of a situation whereby that data could not be recovered and the people get their funds correctly," Thurman says.

David F. Carr is the Technology Editor for Baseline Magazine, a Ziff Davis publication focused on information technology and its management, with an emphasis on measurable, bottom-line results. He wrote two of Baseline's cover stories focused on the role of technology in disaster recovery, one focused on the response to the tsunami in Indonesia and another on the City of New Orleans after Hurricane Katrina. David has been the author or co-author of many Baseline Case Dissections on corporate technology successes and failures (such as the role of Kmart's inept supply chain implementation in its decline versus Wal-Mart or the successful use of technology to create new market opportunities for office furniture maker Herman Miller). He has also written about the FAA's halting attempts to modernize air traffic control, and in 2003 he traveled to Sierra Leone and Liberia to report on the role of technology in United Nations peacekeeping. David joined Baseline prior to the launch of the magazine in 2001 and helped define popular elements of the magazine such as Gotcha!, which offers cautionary tales about technology pitfalls and how to avoid them.
