Cause of Death
By John McCormick | Posted 2004-03-04
Additional reporting by Berta Ramona Thayer in Panama
As software spreads from computers to the engines of automobiles to robots in factories to X-ray machines in hospitals, defects are no longer a problem to be managed. They have to be pred
As tragic as it is, the Panama incident does not stand alone. In all, Baseline has found no fewer than a half-dozen cases in which software has contributed to loss of life. (See Eight Fatal Software-Related Accidents, p. 47, Baseline, March 2004.)
At least three deaths were blamed on a software glitch that crippled the East Coast's power grid last summer.
In 1997, the safe-altitude warning system at Guam International Airport inexplicably generated an excessive number of false alarms that planes were flying too low. As a result, air-traffic controllers cut back the distance scanned by the system from 54 miles to 1 mile. The change prevented controllers from warning pilots of Korean Airlines Flight 801 that they were flying toward a mountain. The crash killed 225.
There are also scores of personal injuries in which software was at least partly to blame. A rider on a gyroscopically controlled Segway scooter suffered a head injury because of a software-design gap and, according to the National Highway Traffic Safety Administration, more than 476 people have been hurt because of a problem with a General Motors antilock braking system in use from 1991 to 1996. GM said the braking system wasn't designed to check for certain drive-train variables.
Certainly, deaths and injuries that can be in some fashion tied to software are statistically rare. Overall, software quality is "generally pretty good," says James Gosling, a Sun Microsystems vice president. But Gosling, regarded as the father of the Java programming language, which can be used to build applications that can run across diverse computers, says code-writing in many cases is still flawed.
Many specifications and designs aren't thought out well enough. Programmers, no matter how good, make logical mistakes. In addition, testing procedures often aren't rigorous enough, he says. And today, with so many software programs interacting with other software programs, there's no way to predict what will happen when two pieces of code come in contact with each other for the first time.
"The quality fight is never-ending," he says.
The threat of physical harm and crippled lives is escalating, now that software drives not just healthcare machinery, but our cars and our household appliances as well. It runs elevators and amusement-park rides. It controls just about every manufacturing plant, utility and business office in the country. As software becomes more pervasive, software quality, long a discussion confined to software-development circles, becomes an issue for business executives, product managers, factory floor supervisors and, as the physicists in Panama found out, anyone who uses software in the workplace.
"What can you do today without software?" asks Pradeep Khosla, head of the department of electrical and computer engineering at Carnegie Mellon University. "Nothing."
But all software has bugs.
For every thousand lines of code developed by commercial software makers or corporate programmers there could be as many as 20 to 30 bugs, according to William Guttman, the director of the SCC, a group of businesses and academic institutions looking for ways to make software more dependable. Many common programs have a million or more lines of code. Sun says its Solaris operating system has more than 10 million lines of code. Even a high-end cell phone can have 1 million.
"In a one-million-line piece of code, even if you only have one bug per thousand lines, you're still going to have 1,000 bugs," says Michael Sowers, executive vice president at Software Development Technologies, a software-testing company.
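The arithmetic behind these estimates can be sketched in a few lines. This is an illustration only: the function name `estimated_defects` is made up for this example, and the rates are the figures quoted above, not measurements of any particular product.

```python
def estimated_defects(lines_of_code: int, defects_per_kloc: float) -> float:
    """Estimate latent defects from program size and a defects-per-thousand-lines rate."""
    return lines_of_code / 1000 * defects_per_kloc


# Sowers's example: one bug per thousand lines in a one-million-line program.
print(estimated_defects(1_000_000, 1))

# Guttman's upper-end rates of 20 to 30 bugs per thousand lines,
# applied to the same hypothetical one-million-line program.
print(estimated_defects(1_000_000, 20), estimated_defects(1_000_000, 30))
```

Even at the most optimistic rate quoted, the count scales linearly with program size, which is why a large program cannot be assumed to be bug-free.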
In today's software, says Khosla, "you have to assume there are some bugs in the code."
Just look back at the first major case in healthcare of code that killed.
The Therac-25 was one of the first "dual-mode" radiation-therapy machines, which meant that it could deliver both electron and photon treatments. Electrons are used to radiate surface areas of the body to kill cancer cells. A photon beam, normally called an X-ray, can be a hundred times more powerful and as a result is used to deliver cancer-killing radiation treatments deeper into the body. According to Prof. Leveson's account, the machine was "more compact, more versatile, and arguably easier to use" than its predecessor machine.
But, according to Prof. Leveson's 1995 book "Safeware" and other accounts, there were a number of flaws in the software that led to the Therac-25 radiation overdoses at health facilities in Marietta, Ga.; Tyler, Texas; Yakima, Wash.; and elsewhere. In all, three people died.
One of the problems manifested itself in 1986 when a physicist tried to change machine set-up data, such as radiation dosage and treatment time, that had been keyed into the software.
The machine went through a series of steps to set itself up to deliver either electrons or photons and the dosage of the selected beam. As data was given, the machine recorded the information and then followed the instructions.
In some cases, however, operators realized while setting up the machine that they had entered an incorrect piece of information. This could be as simple as unintentionally typing in an "X" for an X-ray (or photon) treatment instead of an "E" for an electron treatment.
In "fixing" that designation, an operator would move the cursor up to the "treatment mode" line and type in an "E." The monitor displayed the new entry, seemingly telling the operator that the change was made.
But in the case of the Therac-25, the software did not accept any changes while going through its eight-second-long set-up sequence. No matter what the screen might show, the software grabbed only the first entry. The second would be ignored.
Unaware the changes did not register, operators turned on the beams and delivered X-rays, when they thought they were delivering electrons. According to Leveson's account, patients received such incredibly high quantities of radiation that the beams burned their bodies. Patients who should have received anywhere from 100 to 200 rads of radiation were hit instead with 10,000 to 15,000 rads, in just one or two seconds. A thousand rads is a lethal dose.
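The failure described above, in which the screen shows the corrected entry while the machine stays configured for the first one, can be illustrated with a toy model. This is a minimal sketch under stated assumptions, not the Therac-25's actual code: the `Console` class, its method names and the `check_before_beam` safeguard are all hypothetical, and the cross-check shown is just one possible defense, not something the manufacturer is known to have used.

```python
class SetupMismatchError(Exception):
    """Raised when the screen and the configured machine state disagree."""


class Console:
    """Toy operator console (hypothetical; not the real Therac-25 software)."""

    def __init__(self, check_before_beam: bool = False):
        self.displayed_mode = None     # what the operator sees on screen
        self.configured_mode = None    # what the machine is actually set up for
        self.setup_in_progress = False
        self.check_before_beam = check_before_beam

    def enter_mode(self, mode: str) -> None:
        """Operator types 'X' (photon) or 'E' (electron)."""
        self.displayed_mode = mode     # the screen always shows the latest entry
        if not self.setup_in_progress:
            # Only the first entry reaches the set-up sequence.
            self.configured_mode = mode
            self.setup_in_progress = True
        # An edit made during set-up changes the display and nothing else.

    def finish_setup(self) -> None:
        self.setup_in_progress = False

    def beam_on(self) -> str:
        """Return the mode actually delivered to the patient."""
        if self.check_before_beam and self.displayed_mode != self.configured_mode:
            raise SetupMismatchError(
                f"screen shows {self.displayed_mode!r}, "
                f"machine is set for {self.configured_mode!r}"
            )
        return self.configured_mode


# Flawed behavior: the correction is displayed but never applied.
flawed = Console()
flawed.enter_mode("X")    # typo: the operator meant "E"
flawed.enter_mode("E")    # correction typed during the set-up window
flawed.finish_setup()
print(flawed.displayed_mode, flawed.beam_on())

# Guarded variant: refuse to fire when screen and machine disagree.
guarded = Console(check_before_beam=True)
guarded.enter_mode("X")
guarded.enter_mode("E")
guarded.finish_setup()
try:
    guarded.beam_on()
except SetupMismatchError as err:
    print("beam blocked:", err)
```

The point of the sketch is that the display and the machine configuration are two separate pieces of state. Unless the software keeps them in sync or verifies them against each other before acting, the operator has no way to know which one the beam will follow.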
The Therac-25, according to John Murray, head of software regulatory efforts at the FDA, was a "seminal event" for the agency. After the incident, the FDA for the first time turned its attention to the software that had begun to control medical devices.
The FDA has the power to inspect the work of manufacturers; to ask manufacturers to recall products; to have federal marshals seize products if a voluntary recall isn't done; and to ask the courts to issue injunctions against the distribution of products if a manufacturer does not have good manufacturing procedures in place.
To help software manufacturers, the FDA issues "guidance" documents that recommend that manufacturers follow generally accepted software-development standards; keep track of their design specifications; and conduct formal reviews and tests of the code they produce. Arne Roestel, Multidata's president, says the company followed the FDA recommendations.
But there are few specifics. According to the FDA's "General Principles of Software Validation," which went into effect in January 2002, "This guidance recommends an integration of software lifecycle management and risk management activities. Based on the intended use and the safety risk associated with the software to be developed, the software developer should determine the specific approach, the combination of techniques to be used, and the level of effort to be applied."
In the wake of Panama, some industry experts wonder if there's enough oversight of medical-device software or, for that matter, of software development in general. They say the time might be right for tougher regulation.
Software engineer Ganssle, for one, notes that programmers don't need any form of certification or license to work on commercial software, including life-critical medical device software. Yet, he says, "In Maryland, where I live, if you want to cut hair, you need to be licensed."
Besides the FDA, there are few federal agencies policing software-development practices. The Federal Aviation Administration oversees the flight-control software in commercial aircraft. The Nuclear Regulatory Commission (NRC) watches over the software that runs nuclear plants. And that's about it for oversight of commercial software. The Occupational Safety and Health Administration, the Consumer Product Safety Commission and other agencies charged with protecting factory workers, professionals and consumers say they don't worry about the quality of software in tools or toys.