Trial by Fire

 
 
By Samuel Greengard  |  Posted 2012-06-04
 
 
 


Over the last few years, more than a few organizations have discovered how swiftly and profoundly a disaster can impact business. One of the worst tornado outbreaks in history rampaged across the Midwest and South regions of the United States in the spring of 2011, causing an estimated $27 billion in damage.

In addition, massive and damaging earthquakes have struck Haiti, Chile, New Zealand and Japan. The earthquake in Japan, along with a deadly and damaging tsunami, knocked out some companies for weeks and brought business to a halt.

These days, disaster recovery (DR) and business continuity (BC) are not abstract concepts, as the cost of a business disruption can run into millions of dollars. What's more, it can lead to residual problems, including a tarnished reputation and brand.

Unfortunately, "many organizations are not entirely prepared for disasters," points out David Sarabacha, global leader for resiliency at Deloitte Consulting. "They have inadequate systems and processes in place to deal with an interruption when it occurs."

As data volumes explode and instant access to systems becomes the standard for conducting business, there's a growing need to position disaster recovery and business continuity at the center of IT. Organizations must build an effective strategy that's flexible enough to deal with today's fast-changing business environment. They must understand risks and potential fallout from downtime for various systems and data classifications, and they must look at new and emerging technologies, including the cloud.

There is no simple path to effective DR and BC. "Getting business continuity and disaster recovery right is a big challenge," states Bob Laliberte, senior analyst for Enterprise Strategy Group. "An organization must understand the risks it faces and how different systems and approaches create an optimal level of protection."

Designing a best-practice DR and BC strategy can prove challenging even for the most tech-savvy businesses and IT leaders. Today, mission-critical data streams in from a growing array of sources, including connected business partners and mobile devices used by employees and customers, so effective DR and BC touch almost every corner of the enterprise.

A starting point for developing a strategy, Deloitte's Sarabacha says, is to thoroughly understand risk levels associated with business events and data. In many instances, as organizations reduce business partners and consolidate supply chains, risk actually increases because an organization is more dependent on data and systems tied into fewer companies. In addition, many industries (health care, pharmaceutical and financial services among them) must adhere to a growing spate of regulations. For many organizations, staying current with rapidly changing business conditions is an enormous challenge.

 

Trial by Fire

Protecting data and ensuring that systems operate during a disaster is critical for Quarles & Brady, a law firm with more than 1,000 employees and nine offices in four states, as well as the District of Columbia and Shanghai, China. Two of its offices are located in Florida and have been subjected to hurricanes. Overall, the law firm has approximately 350 terabytes of data and the volume is growing by about 35 percent annually, according to Rich Raether, manager of network engineering.
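To put that growth rate in perspective, the compounding arithmetic can be sketched in a few lines of Python. This is an illustration only: it assumes the roughly 35 percent rate holds steady every year, which the firm does not claim, and the function name project_storage_tb is hypothetical.

```python
def project_storage_tb(current_tb: float, annual_growth: float, years: int) -> float:
    """Compound a starting volume forward at a fixed annual growth rate."""
    return current_tb * (1 + annual_growth) ** years


# Figures from the article: roughly 350 TB today, growing about 35 percent a year.
# Holding the rate constant is an assumption made only for illustration.
for year in range(4):
    print(f"Year {year}: about {project_storage_tb(350, 0.35, year):.0f} TB")
```

Under that assumption the volume would pass double its current size sometime in the third year, which is one reason replication links and recovery targets need regular review.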

In the past, if a problem occurred, Quarles & Brady relied on attorneys and staff located in offices to make backups of key files and send them via FedEx to its data centers. "It was not effective and it wasn't robust," Raether says. But with close to 500 applications in use, protecting data and avoiding downtime is paramount.

As a result, the company adopted a platform based on Dell EqualLogic SANs to store data from nearly 400 virtualized Windows-based servers (95 physical servers) scattered across the company. It now replicates data between two data centers in Milwaukee and Phoenix. In addition, it has a redundant Multiprotocol Label Switching (MPLS) network in place.

The environment has provided enormous benefits. When the IT department recently tested the system, it completed the fail-over process in about 45 minutes. And when a tropical storm hit the Naples, Fla., office a few years ago, it took about an hour and 15 minutes to get the office up and running again.

No less important: Quarles & Brady is able to push its DR plan and processes out to individual offices automatically. The result has been a 93 percent improvement in local recovery point objectives and a 12-fold improvement in the speed of recovery for lost files. The firm also has achieved a 10-fold improvement in time to provision virtual desktops.

Testing systems and ensuring that they can meet real-world needs is at the center of an effective DR and BC strategy, Deloitte's Sarabacha notes. "Vendors' claims about resiliency and theoretical numbers about recovery time mean very little," he says. It is critical to validate a system within the context of realistic conditions. He adds that it's wise to stress and test systems using "war gaming" exercises that throw different variables into the picture. "Only then is it possible to understand how to enhance systems and build the best possible solution," he concludes.

Building on Success

The nature of business continuity is changing. For years, organizations mostly performed operational backups on a seven- to 14-day rotation, with monthly, quarterly and annual data slotted into an archive. Then SANs arrived on the scene and made it easier to store, replicate and restore data across a network. Over the last few years, these capabilities have grown and SANs have become far more sophisticated. Now a growing number of companies are turning to the cloud to address DR and BC challenges.

Graniterock, a 112-year-old construction and construction materials company headquartered in Watsonville, Calif., is among the organizations embracing the cloud. It operates 22 locations, including a chain of retail stores, in Northern California.

Altogether, the firm has five physical servers and approximately 150 virtual servers. These systems access data centers in Seattle and Denver through MPLS and AT&T networks. The company runs its enterprise resource planning (ERP) system and other applications in the cloud. "This greatly reduces the risk during an earthquake, fire or other disaster," notes CFO and CIO Steve Snodgrass.

The company turned to Velocity Technology Solutions to create a cloud-based framework for DR and BC. It also relies on SAN storage devices to store data internally and to provide redundant e-mail and file backups. "If systems go offline, the user community will have no clue that a failure has taken place,” Snodgrass says, because “the system takes constant snapshots of the data, stores it offsite and creates redundancies."

Graniterock prioritizes which systems come up first after a disruption and uses tape backups as the system of last resort. "They are there only in the case of an extreme failure," he explains.
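A prioritized restart list of this kind can be sketched as a simple sort. The example below is hypothetical throughout: the system names, tier numbers and recovery time targets are invented for illustration and are not Graniterock's actual configuration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class System:
    name: str
    tier: int         # 1 = restore first (hypothetical tiering scheme)
    rto_hours: float  # hypothetical recovery time objective, in hours


def recovery_order(systems):
    """Order systems by tier, then by the tightest RTO within each tier."""
    return sorted(systems, key=lambda s: (s.tier, s.rto_hours))


# Illustrative inventory only; tape restore sits in the last tier as a last resort.
systems = [
    System("file shares", 2, 8),
    System("ERP", 1, 1),
    System("e-mail", 1, 2),
    System("tape archive restore", 3, 48),
]

for position, system in enumerate(recovery_order(systems), start=1):
    print(f"{position}. {system.name} (tier {system.tier}, RTO {system.rto_hours}h)")
```

Keeping the order in data rather than in someone's memory makes it easier to review and rehearse the sequence before a real disruption.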

 

No Margin for Error

Classifying data and understanding its value throughout the entire business life cycle is critical. As recovery time objectives (RTO) and recovery point objectives (RPO) have shrunk, organizations are increasingly pressed to build out systems that leave no margin for error … or data loss.

Deloitte's Sarabacha says that, within today's environment, another key to success is an ability to avoid silos of data that frequently reside in legacy systems. Too often, "some of the data winds up ignored," he says. "In order to build an effective disaster recovery strategy, it's important to look at data in a holistic way."

That's certainly the goal at Central Carolina Community College in North Carolina. The school has a wide range of mission-critical data to protect, including human resources files, student records, databases, payroll data, financial aid information and other records. Altogether, this data totals upward of 16 terabytes.

The college relies on two FalconStor continuous data protection (CDP) and network storage servers (NSS) to provide replication across the network. It also has outsourced e-mail to the cloud. "If someone deletes a file or wipes out the database, we have historical records and snapshots and can retrieve the data," says Monte Christman, associate director of IT.

Two SANs connected to the FalconStor system provide instant and seamless switchover capabilities. The systems replicate every hour. In a worst-case scenario, Christman says, the college would lose anywhere from an hour to a day's worth of data, depending on its assigned value. The school periodically tests the system by deleting the primary database and then restoring it from a replicated system.
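The link between a replication schedule and worst-case data loss can be made concrete with a small check. This is a sketch under stated assumptions: the data classes and their intervals below are hypothetical, chosen to mirror the hour-to-a-day range described above, and do not reflect the college's actual assignments or any FalconStor interface.

```python
from datetime import timedelta

# Hypothetical data classes and replication schedules, for illustration only.
REPLICATION_INTERVAL = {
    "student records": timedelta(hours=1),
    "payroll": timedelta(hours=1),
    "archived course files": timedelta(days=1),
}


def worst_case_loss(data_class: str) -> timedelta:
    """A failure just before the next replication loses one full interval of changes."""
    return REPLICATION_INTERVAL[data_class]


def meets_rpo(data_class: str, rpo: timedelta) -> bool:
    """True when the worst-case loss stays within the recovery point objective."""
    return worst_case_loss(data_class) <= rpo


# Compare each class against an assumed four-hour RPO.
for name in REPLICATION_INTERVAL:
    print(name, worst_case_loss(name), meets_rpo(name, timedelta(hours=4)))
```

The point of the exercise is that the replication interval, not the backup technology itself, sets the floor on how much data can be lost for each class.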

"We have moved away from a system where each server had to be backed up every night to one where everything is automated," he explains. “We are now in a much better position to survive a disaster or system failure.”

Quick Recovery

An effective disaster recovery and business continuity strategy is more than the sum of its technology. Not only is it essential to understand RPOs and RTOs and have a sound DR plan in place, it is critical to update the plan and systems periodically to take into account changing technology along with real-world risks. As organizations migrate to an always-up scenario that focuses on 24/7/365 information availability, the stakes and the amount of planning required for DR and BC grow significantly.

Deloitte’s Sarabacha advises organizations to pay particular attention to mobile devices. For one thing, they can easily become pools of detached data. "There is a lot of valuable information residing on phones and tablets,” he points out, “and the loss of this data can cause significant problems for an organization."

For another, mobile phones can become a lifeline for an organization in the event of a disaster or major disruption. "Even if cell towers are knocked out and communication is down for an extended period, it's possible for employees to follow a disaster plan, if the organization has pushed it out to employees on their mobile devices," Sarabacha adds.

In the end, an enterprise must embrace a strategy that encompasses systems and devices in a holistic way—and across the entire value chain. Sarabacha warns that a lack of integration can prove debilitating during a disaster.

He recommends mapping and analyzing infrastructure, data stores, applications and usage patterns "very precisely" and developing a detailed document or spider diagram that can serve as the basis for designing DR and BC systems. At the same time, it's critical to have manual systems and processes in place in case systems fail or the electricity goes out.

To be sure, the stakes have never been higher, but the tools and technologies to keep systems running have never been better.

"Business continuity and disaster recovery are nothing less than insurance policies," Laliberte of the Enterprise Strategy Group concludes. "At the end of the day, you want to know that you can keep systems running and have essential data available no matter what's thrown at them."

 

10 Steps to Better Business Continuity

    • Identify and map out the types of risks you're likely to face.
    • Understand how a disruption will affect business operations and data use.
    • Develop a comprehensive plan for managing data—in both digital and paper formats.
    • Adopt storage and backup systems based on RTO and RPO requirements.
    • Ensure that a copy of all data resides offsite, and build redundant systems and communications networks to the extent possible.
    • Build security into all processes.
    • Check logs and backup data to ensure that systems are working correctly, and stress-test the systems periodically.
    • Establish a plan for how staff will communicate—including a phone or text tree—if an incident takes place.
    • Review and update your disaster recovery and business continuity plans annually.
    • Identify an alternative site or a way for staff to work from home and share files and data if an office is damaged or unavailable.
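
The step above about checking logs and backup data lends itself to a small automated check. The sketch below is one possible approach, not a prescribed tool: the system names, timestamps and 24-hour threshold are all hypothetical, and in practice the timestamps would come from whatever the backup software reports.

```python
from datetime import datetime, timedelta


def stale_backups(last_success, max_age, now):
    """Return the names of systems whose last successful backup is older than max_age."""
    return sorted(
        name for name, finished in last_success.items() if now - finished > max_age
    )


# Hypothetical last-success times, as they might be read from backup logs.
now = datetime(2012, 6, 4, 9, 0)
last_success = {
    "file-server": datetime(2012, 6, 4, 2, 0),
    "mail": datetime(2012, 6, 2, 2, 0),
    "erp-db": datetime(2012, 6, 3, 23, 30),
}

# Flag anything that has gone more than a day without a good backup.
print(stale_backups(last_success, timedelta(hours=24), now))
```

A check like this only confirms that backups ran; periodic restore tests are still needed to confirm that the data can actually be recovered.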