FAA Infrastructure: Anatomy of a Flight-Plan System Crash
By Chris Preimesberger | Posted 2008-10-14
Transitioning off legacy systems is never easy, but it’s especially challenging for an agency of the U.S. government such as the FAA (Federal Aviation Administration). Real progress on a next-generation system is being made, but you wouldn’t necessarily know it from some of this year’s news headlines about FAA system failures. Experts and former FAA employees say the flight-plan system failures are more than a nuisance to airlines and travelers: they are a warning sign of peril.
The 90-minute system crash on Aug. 26, which affected nearly all the major airports in the nation, was later blamed on a single corrupt file, most likely a virus, that had entered the system and somehow torpedoed it into uselessness.
"What happened yesterday at 1:25 p.m. [EDT] was that during a normal daily software load something was corrupted in a file, and that brought [the] system down in Atlanta," said FAA spokesperson Paul Takemoto. "Basically, all the flight plans that were in the system were kicked out. For aircraft already in the air, or [that] had just been pushed back from the gate, they had no problems. But for all other aircraft, it meant delays."
Things got worse when operations were shifted to the backup facility in Salt Lake City, which is designed to handle 125 percent of the overall load, Takemoto said.
"It was far more than that [125 percent], because airlines were refiling their flight plans manually. They just kept hitting the 'Enter' button. So the queues immediately became huge," Takemoto said. "On top of that, it happened right during a peak time as traffic was building. Salt Lake City just couldn't keep up."
The second NADIN (National Airspace Data Interchange Network) system in Salt Lake City, to its credit, continued to handle all the West Coast flight plans normally. But when Atlanta crashed, all the East Coast data switched over immediately to Salt Lake City, which could not handle the extra data traffic, even though it was designed to handle 125 percent of normal load in the event of an emergency.
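The capacity arithmetic behind that overload can be sketched in a few lines of Python. This is an illustration only, not a model of the real NADIN: the figure of 100 flight plans per minute and the doubling of traffic from repeated manual refiling are made-up numbers, and `simulate_backlog` is a hypothetical helper. Only the 125 percent design margin comes from the FAA's account.

```python
def simulate_backlog(arrivals_per_min, capacity_per_min, minutes):
    """Return the queue length at the end of each minute.

    A deliberately simple model: a fixed number of flight plans arrives
    each minute, and the site processes up to its capacity each minute.
    """
    backlog = 0
    history = []
    for _ in range(minutes):
        # Unprocessed plans carry over; the queue cannot go below zero.
        backlog = max(0, backlog + arrivals_per_min - capacity_per_min)
        history.append(backlog)
    return history


# Hypothetical normal load, in flight plans per minute (not an FAA figure).
NORMAL_LOAD = 100
# Backup site sized for 125 percent of normal load, per the article.
CAPACITY = NORMAL_LOAD * 125 // 100
# Hypothetical: manual refiling doubles the incoming traffic.
REFILING_LOAD = 2 * NORMAL_LOAD

print(simulate_backlog(NORMAL_LOAD, CAPACITY, 3))    # queue stays empty
print(simulate_backlog(REFILING_LOAD, CAPACITY, 3))  # queue grows every minute
```

Under these assumed numbers, the queue stays empty at normal load but grows by 75 flight plans every minute once refiling doubles the traffic, and it cannot drain until arrivals fall back below capacity.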
Commercial aircraft of any type cannot take off without having filed a valid flight plan, one that includes destination, estimated flight speed, description of cargo, estimated altitude, weather conditions and a number of other data points.
So, for a part of the afternoon of Aug. 26, pilots at about 40 U.S. airports were forced to manually type their flight plan information into the system, causing long delays in takeoffs. Chicago's O'Hare International, one of the two or three busiest airports in the world, and nearby Midway Airport were among the most directly affected.
"We've just never seen it fail in this manner," Hank Krakowski, the chief operating officer for the FAA's Air Traffic Organization, said in his media remarks.
However, a look at the record shows it had indeed failed several times before, including only five days prior to the Aug. 26 crash. This excerpt comes from the FAA's own Web site (PDF format), dated Aug. 22:
"The aforementioned NADIN outage last evening [Aug. 21] caused more than 100 delays after flight plans were rejected. The outage is currently being blamed for 134 departure delays but this figure could climb. The legacy NADIN in Atlanta crashed. Salt Lake City took over but had problems with the high queue level …"