Crash at 365 Main Street
When the data center at one of the country's biggest co-location facilities experienced an unprecedented power outage, bringing nearly half its customers down with it, the entire industry learned some painful but powerful lessons.
At exactly 1:47 p.m. on July 24, Miles Kelly got
the call every CIO and data center manager dreads: The data
center had experienced a power outage. Indeed, a power surge
had shut off energy to the company's primary data center in San
Francisco, and four of the building's 10 backup generators had
failed to start. Three computer rooms were down.
That would signal the start of a bad day for any enterprise,
but for 365 Main, where Kelly serves as vice president of marketing
and strategy, the problem was magnified many times
over. That's because 365 Main isn't just any business: It's one
of the nation's top data center managers, or co-location service
providers. There are more than 75,000 servers in its 227,000-
square-foot San Francisco facility, supporting hundreds of customers,
including such high-profile companies as Craigslist, Sun
Microsystems, Six Apart and the Oakland Raiders.
"When the failure of a data center becomes a bigger issue
is when you have all these Web services and start-ups that have
their data center services only at this one site," says James Staten,
principal analyst in the infrastructure and operations group at
Forrester Research.
When the four 2.1-megawatt diesel engine-generator units
failed to kick in as they should have, it was a disaster in the
making for 365 Main. The company promotes itself as having
"The World's Finest Data Centers," and clients rely on it for
constant uptime. Prior to the incident, 365 Main could claim
100% uptime.
But on the afternoon of July 24, 40% to 45% of 365 Main's
customers lost power to their equipment for about 45 minutes,
Kelly says.
At Sun Microsystems, sites were down 45 minutes to three
hours, with most being restored in about 90 minutes, according
to Will Snow, senior director of Web platforms at Sun. Although
the power was out for 45 minutes, it can take a few hours to
bring systems and networks back up and ensure they're working
properly.
Snow says Sun had a backup plan for services at 365 Main,
but it wasn't complete. "We're in the process of changing our
disaster recovery plans to deal with shorter outages," Snow says.
"Originally our plans were tailored for more significant outages
of four-plus hours, but now we're looking to respond to very
short outages such as the San Francisco outage."
At Six Apart, four of the company's Web sites (LiveJournal,
Vox, TypePad and Movable Type) were down 90 minutes. On
LiveJournal, the company posted an apology, explaining that
during outages it would normally display a message telling visitors
about the status of the site. "But because this was a full
power outage there was a period of time where we could not
access or update a status page," the posting explained.
Fortunately, 365 Main was able to manually restart the generators
that failed to kick in automatically, which allowed it to
operate on backup power until PG&E began delivering a stable
power supply.