When Failure Isn’t An Option

Every business backs up its critical data. But how quickly can employees access the backup, and how up-to-date will it be? Data replication software can help get systems back online in a few hours—or even in just a few minutes—to minimize the fallout from a failure.

At 2 p.m. one Sunday in the summer of 2003, disaster hit The Hotel Hershey. Nothing as dramatic as a fire or flood: An ordinary disk drive failed. But the drive crashed a key reservation system at the 232-room resort in Hershey, Pa., meaning hundreds of customer bookings were inaccessible—and possibly gone for good.

Fortunately, Andy Bomboy, manager of network technology at parent company Hershey Entertainment & Resorts, had anticipated such an event. The previous fall, Bomboy and his team installed NSI Software’s Double-Take data replication product. The software monitors changes entered into applications that run on Windows servers at the company’s hotels and entertainment venues, then copies those changes to eight IBM servers at its main building.

So when the reservation system crashed, Bomboy’s team had it back up and running on a Double-Take server an hour and a half later. “The key was, there was no lost data,” he says. “It had replicated up to the second.”

He’s lucky the outage hadn’t occurred a year earlier. Hershey Entertainment & Resorts (unaffiliated with the chocolate company) previously had no formal plan for restoring data or applications if one of the 25 main servers at one of its 10 facilities failed. The company generates about $200 million in annual revenue and has 6,500 employees at the peak of its summer season. It would stand to lose tens of thousands of dollars for each hour its reservation, payment and point-of-sale applications were inaccessible, or if information were completely lost.

For Bomboy, the terrorist attacks on Sept. 11, 2001, highlighted the importance of being able to recover applications quickly after a major catastrophe. But the prospect of more mundane failures—like a faulty disk drive—really triggered his decision to deploy data replication software.

“We realized we needed to have a procedure for this,” he says. “We can’t afford to have the front-desk reservation system go out.”

Data replication software is like having a mechanic in your car all the time, ready to step in with replacement parts the moment something breaks down. Traditional backup systems are more like tow trucks. With traditional backup, software writes data to magnetic tapes, usually on a nightly basis. That means any data that’s changed since the last backup isn’t copied. Storing backup data on tapes is cheaper than keeping it on disk, but retrieving data from tape can take considerably longer: at least twice as long as retrieving it from disk drives.

By contrast, data replication software, which stores data on disks, sends each change to a set of information over a network to a backup site as soon as the change is made. It also typically provides the ability to switch over to a standby server as soon as a primary one goes down.
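To make the contrast concrete, here is a minimal sketch in Python of the two approaches. It is an illustration only, not how Double-Take or any other product mentioned in this article works: the `Server` and `ReplicatedStore` classes, the `nightly_backup` function and the booking records are all hypothetical. The sketch copies every write to a standby as it happens, takes one point-in-time "tape" copy, then simulates a primary failure to show which changes each approach would preserve.

```python
from dataclasses import dataclass, field


@dataclass
class Server:
    """A toy server that stores records as key -> value (hypothetical)."""
    name: str
    data: dict = field(default_factory=dict)
    alive: bool = True


class ReplicatedStore:
    """Copies each write to a standby as it happens and can fail over."""

    def __init__(self, primary: Server, standby: Server):
        self.primary = primary
        self.standby = standby
        self.active = primary

    def write(self, key, value):
        self.active.data[key] = value
        # Replicate the change right away while the primary is still active.
        if self.active is self.primary and self.standby.alive:
            self.standby.data[key] = value

    def fail_over(self) -> Server:
        """Switch to the standby if the primary is down."""
        if not self.primary.alive:
            self.active = self.standby
        return self.active


def nightly_backup(server: Server) -> dict:
    """Tape-style backup: one point-in-time copy of the server's data."""
    return dict(server.data)


primary = Server("primary")
standby = Server("standby")
store = ReplicatedStore(primary, standby)

store.write("booking-1", "room 101")
tape = nightly_backup(primary)          # last night's backup
store.write("booking-2", "room 202")    # changes made after the backup
store.write("booking-3", "room 303")

primary.alive = False                   # the disk drive fails
active = store.fail_over()

# Records the standby holds that the tape never captured.
lost_with_tape = set(active.data) - set(tape)
print(active.name)
print(sorted(lost_with_tape))
```

In this toy run the standby holds all three bookings, while the tape copy is missing the two made after the backup. A real product also has to deal with network delays, write ordering and detecting that the primary has failed, none of which is modeled here.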

Information-technology managers say the most important rule to follow with data replication software is: test, test and retest. “It’s not an ‘implement and forget’ solution,” says Aaron Huslage, senior system administrator at CNF, a logistics and freight-shipping company. “You have to do testing fairly often, if for no other reason to make sure you’re getting good copies.”

The odds a company will experience what the data-recovery industry calls a “smoking-hole disaster”—one that completely destroys a building—are slim. But the potential cost of totally losing data could bankrupt some organizations, spurring investments in preparing for worst-case scenarios.

Three years ago, Starwood Hotels and Resorts spent $1.5 million to build a completely redundant infrastructure for its SAP finance and accounting applications, which run on an IBM iSeries AS/400 system and an EMC Symmetrix storage system. Starwood uses EMC’s real-time replication software (which works only with EMC hardware) to send changes in the SAP system to a backup facility eight miles from its primary data center in Phoenix. Kevin Malik, a senior director of information systems at Starwood, says the backup SAP system could be brought up in a few minutes.

How did the company justify the expense? The hotel chain uses the SAP system to cut paychecks to 52,000 employees in North America. Malik says the cost of missing even one of Starwood’s semimonthly pay periods would be astronomical: For one thing, hotel labor unions could sue the company for breach of contract.

“The legal issues alone would outweigh the cost of a disaster-recovery solution,” Malik says.

The tab doesn’t always run so high. Last fall, Farmers & Merchants Bank paid $20,000 for Veritas Software’s Replication Exec package, which copies data from 20 branches to a Dell network-attached storage system at its headquarters in Long Beach, Calif. Previously, the bank relied on tape backup at each branch—and sometimes daily backup jobs would take eight hours. “We needed a better way to back up our data,” says Jerry Craft, the bank’s manager of network services.

The main reason for the difference in price between Starwood’s project and Farmers & Merchants Bank’s is scope. Starwood required a high-speed fiber-optic link (which can transmit data at 2 billion bits per second) and very fast backup disk and server systems that could instantly start processing about 1 million SAP transactions per day; Farmers & Merchants Bank was simply looking for a faster way to protect transaction data and employee files, not a way to bring a failed application back online.

Still, the lower-priced option has paid off for Farmers & Merchants Bank, according to Craft. By eliminating the need for workers at each branch to perform nightly tape backup, he estimates that the bank will save the equivalent of the salary of one full-time information-technology employee. Says Craft: “Moving to replication instead of backup has been one of our most successful projects.”