
What Caused the Outage

By Doug Bartholomew  |  Posted 2007-10-01

When the data center at one of the country's biggest co-location facilities experienced an unprecedented power outage, bringing nearly half its customers down with it, the entire industry learned some painful but powerful lessons.



The root cause of the outage turned out to be highly unusual: The data center's $1.2 million diesel-powered generators had gradually fallen out of sync with the electronic controllers that start them. "This was a truly rare incident," Staten says.

Unlike centers with battery-started backup generators, 365 Main's data center has a continuous power system. Energy from the local utility flows into the system to operate its generators, which supply power to the building. In the event of a power failure, the system normally restarts using energy stored in each generator's flywheel. The flywheels, basically large spinning discs, keep turning long enough after a power failure to restart the diesel engines.

With its 10 backup diesel-powered generator units, 365 Main's primary data center had operated without a glitch through numerous power outages since its construction in the spring of 2001. The building has eight data rooms, each with its own dedicated generating unit. There are also two extra units, ready to kick in if one of the eight dedicated backup generators fails.

As it turned out, on July 24, four of the diesel engines failed to start, causing three computer rooms to lose power. "We could have failed three units, and through an automatic load-shed of chillers and air conditioning units, we could have continued to function," says Jean-Paul Balajadia, senior vice president of operations at 365 Main. In other words, the facility had enough backup generators to run the computer rooms if only three units had gone down. But with four unable to start and keep running, up to 45% of the building's computer systems shut down.
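The redundancy arithmetic Balajadia describes can be sketched in a few lines of Python. This is an illustrative model only, not 365 Main's actual control logic: the unit and room counts come from the figures above, while `LOAD_SHED_MARGIN` and the function names are assumptions introduced here to reconcile two spare units with a stated tolerance of three failures.

```python
# Figures taken from the article.
TOTAL_UNITS = 10   # backup diesel generator units in the building
DATA_ROOMS = 8     # each data room has one dedicated unit

# Units beyond one per room.
SPARE_UNITS = TOTAL_UNITS - DATA_ROOMS

# ASSUMPTION: automatic load-shedding of chillers and air conditioning
# frees enough capacity to cover one further failed unit. The article
# states only the total ("three units"), not this breakdown.
LOAD_SHED_MARGIN = 1


def max_tolerable_failures() -> int:
    """Largest number of failed units the facility could ride through."""
    return SPARE_UNITS + LOAD_SHED_MARGIN


def facility_stays_up(failed_units: int) -> bool:
    """True if the number of failed units is within the tolerated limit."""
    return failed_units <= max_tolerable_failures()


if __name__ == "__main__":
    for failed in (2, 3, 4):
        print(failed, "failed units ->", facility_stays_up(failed))
```

Under these assumptions, three failed units are survivable and a fourth is not, which matches the account of July 24.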

As soon as the failure occurred, Balajadia and his staff called the manufacturer of the power generators, Hitec Power Solutions. They also called Cupertino Electric, the engineers and project manager for the building's construction.

After several days, they determined the cause to be a discrepancy in the engines' start-up routine. Over the years, as each engine was periodically tested and shut down, the engine's digital controller would record the exact position of the pistons in the cylinders when they stopped so that, on the next start-up, fuel would be injected at the precise moment. "The controller writes this into memory at zero RPM, reading the information and then clearing out the prior memory," Balajadia explains.

When the engines were first shipped from the factory, it took seven to 10 seconds for an engine to come to a complete stop, at which point the pistons' positions would be recorded for the next start-up sequence. This is critical, because if the fuel is injected at the wrong time, the engines won't start. But over several years, 365 Main had accumulated more than 1,000 hours of operation on the diesels. By then the engines were fully broken in, and their shut-down time had increased to as much as 13 seconds.

Still, in the controllers' memory from the last shutdown, the pistons were recorded as being several seconds out of position, because each digital controller is calibrated to initiate the next restart based on the position of the pistons only seven to 10 seconds after the last shutdown. That variance of three or four seconds had caused four of the diesel engines to be out of their normal starting sequence, so they misfired, failing to start and keep running.
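The timing mismatch can be illustrated with a short Python sketch. It is a simplified model, not Detroit Diesel's controller firmware: the seven-to-10-second window and the 13-second coast-down come from the article, while the function names and the idea of measuring "drift" past the window are assumptions made here for illustration.

```python
# Coast-down window the controller was calibrated for, per the article.
CALIBRATED_STOP_WINDOW_S = (7.0, 10.0)


def stop_time_drift(actual_stop_s: float,
                    window: tuple = CALIBRATED_STOP_WINDOW_S) -> float:
    """Seconds by which the real coast-down falls outside the window.

    Returns 0.0 when the engine stops inside the calibrated window,
    a positive value when it takes longer, and a negative value when
    it stops sooner. (Hypothetical helper, not a real controller API.)
    """
    low, high = window
    if actual_stop_s > high:
        return actual_stop_s - high
    if actual_stop_s < low:
        return actual_stop_s - low
    return 0.0


def recorded_position_is_stale(actual_stop_s: float) -> bool:
    """True if the piston position was captured before the engine stopped."""
    return stop_time_drift(actual_stop_s) > 0.0


if __name__ == "__main__":
    for stop_s in (8.0, 10.0, 13.0):
        print(stop_s, "s coast-down -> drift",
              stop_time_drift(stop_s), "s, stale:",
              recorded_position_is_stale(stop_s))
```

In this model, a factory-fresh engine that stops in eight seconds shows no drift, while a broken-in engine that takes 13 seconds overshoots the window by three seconds, so the stored piston position no longer matches where the pistons actually came to rest.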

"This was completely unique—it was a true bug," Kelly says. The fix was to adjust the controller to allow more time between the shutdown and the reset command. The company implemented the fix not only at its San Francisco facility but also at its El Segundo, Calif., data center, which had the same Hitec generators containing the identical controllers.

Hitec reports that only about 100 such engines were shipped in 2001 with this particular Detroit Diesel controller, and that the other companies using them in data centers have had their controllers fixed as a result. Newer diesel generators have a more sophisticated ignition sequence. "We had only two other sites that used these engines as extensively, and both customers had reported isolated incidents where single engines failed to start," says John Sears, marketing and sales manager at Hitec Power Solutions in Stafford, Texas.


Doug Bartholomew is a career journalist who has covered information technology for more than 15 years. A former senior editor at IndustryWeek and InformationWeek, he has published freelance features in New York magazine and the Los Angeles Times Magazine. He has a B.S. in Journalism from Northwestern University.