Crash at 365 Main Street

By Doug Bartholomew  |  Posted 2007-10-01

When the data center at one of the country's biggest co-location facilities experienced an unprecedented power outage, bringing nearly half its customers down with it, the entire industry learned some painful but powerful lessons.

At exactly 1:47 p.m. on July 24, Miles Kelly got the call every CIO and data center manager dreads: The data center had experienced a power outage. Indeed, a power surge had shut off energy to the company's primary data center in San Francisco, and four of the building's 10 backup generators had failed to start. Three computer rooms were down.

That would signal the start of a bad day for any enterprise, but for 365 Main, where Kelly serves as vice president of marketing and strategy, the problem was magnified many times over. That's because 365 Main isn't just any business: It's one of the nation's top data center managers, or co-location service providers. There are more than 75,000 servers in its 227,000-square-foot San Francisco facility, supporting hundreds of customers, including such high-profile companies as Craigslist, Sun Microsystems, Six Apart and the Oakland Raiders.

"When the failure of a data center becomes a bigger issue is when you have all these Web services and start-ups that have their data center services only at this one site," says James Staten, principal analyst in the infrastructure and operations group at Forrester Research.

When the four 2.1-megawatt diesel engine-generator units failed to kick in as they should have, it was a disaster in the making for 365 Main. The company promotes itself as having "The World's Finest Data Centers," and clients rely on it for constant uptime. Prior to the incident, 365 Main could claim 100% uptime.

But on the afternoon of July 24, 40% to 45% of 365 Main's customers lost power to their equipment for about 45 minutes, Kelly says.

At Sun Microsystems, sites were down 45 minutes to three hours, with most being restored in about 90 minutes, according to Will Snow, senior director of Web platforms at Sun. Although the power was out for only 45 minutes, bringing systems and networks back up and verifying that they work properly can take a few hours.

Snow says Sun had a backup plan for services at 365 Main, but it wasn't complete. "We're in the process of changing our disaster recovery plans to deal with shorter outages," Snow says. "Originally our plans were tailored for more significant outages of four-plus hours, but now we're looking to respond to very short outages such as the San Francisco outage."

At Six Apart, four of the company's Web sites—LiveJournal, Vox, TypePad and Movable Type—were down 90 minutes. On LiveJournal, the company posted an apology, explaining that during outages it would normally display a message telling visitors about the status of the site. "But because this was a full power outage there was a period of time where we could not access or update a status page," the posting explained.

Fortunately, 365 Main was able to manually restart the generators that failed to kick in automatically, which allowed it to operate on backup power until PG&E began delivering a stable power supply.

Doug Bartholomew is a career journalist who has covered information technology for more than 15 years. A former senior editor at IndustryWeek and InformationWeek, his freelance features have appeared in New York magazine and the Los Angeles Times Magazine. He has a B.S. in Journalism from Northwestern University.