What Caused the Outage
By Doug Bartholomew | Posted 2007-10-01
When the data center at one of the country's biggest co-location facilities experienced an unprecedented power outage, bringing nearly half its customers down with it, the entire industry learned some painful but powerful lessons.
The root cause of the outage turned out to be highly unusual: The data center's $1.2 million diesel-powered generators had gradually fallen out of sync with the electronic controllers that start them. "This was a truly rare incident," Staten says.
Unlike centers with battery-started backup generators, 365 Main's data center has a continuous power system. Energy from the local utility flows into the system to operate its generators, which supply power to the building. In the event of a power failure, the system normally restarts using energy stored in each generator's flywheel. The flywheels, basically large spinning discs, keep turning long enough after a power failure to restart the diesel engines.
With its 10 backup diesel-powered generator units, 365 Main's
primary data center had operated without a glitch through
numerous power outages since its construction in the spring of
2001. The building has eight data rooms, each with its own dedicated
generating unit. There are also two extra units, ready to
kick in if one of the eight dedicated backup generators fails.
As it turned out, on July 24, four of the diesel engines failed
to start, causing three computer rooms to lose power. "We could
have failed three units, and through an automatic load-shed of
chillers and air conditioning units, we could have continued
to function," says Jean-Paul Balajadia, senior vice president of
operations at 365 Main. In other words, the facility had enough
backup generators to run the computer rooms if only three units
had gone down. But with four unable to start and keep running,
up to 45% of the building's computer systems shut down.
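The redundancy arithmetic Balajadia describes can be sketched in a few lines of Python. This is only an illustration built from the figures quoted above (10 units, eight data rooms, survival of up to three failed units through load-shedding); the function name, constant names and status labels are hypothetical and do not come from 365 Main or Hitec.

```python
# Illustrative sketch only: names and status labels are assumptions,
# the numbers are the ones quoted in the article.
TOTAL_UNITS = 10                  # backup generator units in the building
DATA_ROOMS = 8                    # each room has one dedicated unit
MAX_FAILURES_WITH_LOAD_SHED = 3   # per Balajadia: three failed units is survivable


def facility_status(failed_units: int) -> str:
    """Classify the building's state for a given number of failed units."""
    if not 0 <= failed_units <= TOTAL_UNITS:
        raise ValueError("failed_units must be between 0 and TOTAL_UNITS")
    spare_units = TOTAL_UNITS - DATA_ROOMS  # the two extra units
    if failed_units <= spare_units:
        return "full power"        # spares cover the failed units
    if failed_units <= MAX_FAILURES_WITH_LOAD_SHED:
        return "load-shed"         # chillers and air conditioning are shed
    return "partial outage"        # some computer rooms lose power


if __name__ == "__main__":
    for n in range(5):
        print(n, facility_status(n))
```

With four failed units, the July 24 case, the sketch falls through to the last branch, which matches the account above.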
As soon as the failure occurred, Balajadia and his staff
called the manufacturer of the power generators, Hitec Power
Solutions. They also called Cupertino Electric, the engineers
and project manager for the building's construction.
After several days, they determined the cause to be a discrepancy
in the engines' start-up routine. Over the years, as
each engine was periodically tested and shut down, the engine's
digital controller would record the exact position of the pistons
in the cylinders when they stopped so that, on the next start-up,
fuel would be injected at the precise moment. "The controller
writes this into memory at zero RPM, reading the information
and then clearing out the prior memory," Balajadia explains.
When the engines were first shipped from the factory, it
took seven seconds to 10 seconds for the engine to come to a
complete stop, at which point the pistons' positions would be
recorded for the next startup sequence. This is critical, because
if the fuel is injected at the wrong time, the engines won't start.
But over several years, during which time 365 Main had accumulated
more than 1,000 hours of operation on the diesels, the
engines had been fully broken in, so their shut-down time had
increased to as much as 13 seconds.
Yet each digital controller was still calibrated to initiate the
next restart based on the position of the pistons only seven to
10 seconds after the last shutdown, so the positions stored in
its memory from the last shutdown were several seconds out of
date.
That variance of three or four seconds had caused four of the
diesel engines to be out of their normal starting sequence, so
they misfired, failing to start and keep running.
"This was completely unique; it was a true bug," Kelly says. The
fix was to adjust the controller to allow more time between the
shutdown and the reset command. The company implemented the fix
not only at its San Francisco facility but also at its El
Segundo, Calif., data center, which had the same Hitec generators
containing the identical controllers.
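The timing mismatch and the fix can be illustrated with a small Python sketch. It is a deliberate simplification: the article does not describe the controller's internals, so the function names and the idea of a single "record delay" are assumptions, and the 15-second value in the usage note is a made-up example rather than the setting Hitec actually applied. Only the seven-to-10-second and 13-second figures come from the account above.

```python
# Simplified, hypothetical model of the controller timing problem.
FACTORY_SPIN_DOWN_S = (7.0, 10.0)  # time to reach zero RPM when engines were new
BROKEN_IN_SPIN_DOWN_S = 13.0       # time to reach zero RPM after 1,000+ hours


def position_record_is_valid(spin_down_s: float, record_delay_s: float) -> bool:
    """The stored piston position is trustworthy only if it was taken
    after the engine had actually come to a complete stop."""
    return record_delay_s >= spin_down_s


def timing_variance_s(spin_down_s: float, record_delay_s: float) -> float:
    """Seconds by which the engine was still turning when the position
    was recorded (0.0 if the engine had already stopped)."""
    return max(0.0, spin_down_s - record_delay_s)


if __name__ == "__main__":
    old_delay = FACTORY_SPIN_DOWN_S[1]  # controller calibrated for new engines
    print(position_record_is_valid(BROKEN_IN_SPIN_DOWN_S, old_delay))
    print(timing_variance_s(BROKEN_IN_SPIN_DOWN_S, old_delay))
```

Under these assumptions, a controller that waits 10 seconds on an engine that now takes 13 seconds to stop records a position that is three seconds stale, in line with the "three or four seconds" cited above. Lengthening the wait, for example `position_record_is_valid(13.0, 15.0)`, makes the record valid again, which is the essence of allowing more time between the shutdown and the reset command.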
Hitec reports that only about 100 such engines were shipped
in 2001 with this particular Detroit Diesel controller, and that
the other companies using them in data centers have had their
controllers fixed as a result. Newer diesel generators have a more
sophisticated ignition sequence. "We had only two other sites
that used these engines as extensively, and both customers had
reported isolated incidents where single engines failed to start,"
says John Sears, marketing and sales manager at Hitec Power
Solutions in Stafford, Texas.