Disaster Recovery: Two Steps Forward, One Step Back
To withstand a disaster and the resulting disruption, businesses must be able to depend on the cooperation of their employees. That presumption is the fundamental belief behind a well-thought-out business-continuity plan, which, in theory, should account not only for the recovery of IT systems but also for the resumption of overall operations.
This philosophy has evolved over several decades, but some business-continuity and disaster-recovery experts believe it is in danger of devolving.
“If we go back, way back, in history it was disaster recovery in the IT perspective that started the whole thing, and that was where the focus was; so when you said the term disaster recovery, you thought about protecting your technology,” says Barney Pelant, principal with Barney F. Pelant and Associates, a disaster recovery consultancy. “It was not until the ’80s when people started to recognize that it wasn’t just an IT issue, it was a business issue. So we started driving this from a business perspective for a while, and the term changed from disaster recovery to contingency planning to business-recovery planning to business-continuity planning to business-continuity management.”
Despite all the progress that has taken place within the typical enterprise in unifying business-side and IT-side business-continuity efforts, there has been a regression recently that is so pernicious that even the terminology has reverted to the old days.
“What happened in that whole process is [that while] it came together, it’s now come back apart again,” he explains. “It is unfortunate, but what has happened is, business continuity does not intend to imply technology strategy, and technology is back to the terminology of disaster recovery. There is this gap that is getting wider and wider.”
The gap isn’t just in terminology, either. As David Sarabacha, western region leader for Deloitte & Touche’s Business Continuity Management services group, puts it, many technology executives tasked with disaster recovery efforts aren’t communicating at all with those in charge of the rest of the business continuity planning.
“I’ve got a few clients who are experiencing that exact thing, where I can talk to the CIO and have one conversation and then talk to the head of the business-continuity management program that might sit in the risk group, and they don’t even talk to each other,” he says.
Bad Things Happen in Bunches
“It’s not a trend that anybody in the industry would like to see,” Sarabacha says of the growing gap between traditional IT disaster recovery and overall business-continuity management.
As he and other experts explain, when bad things happen to a company, it is almost always forced to mobilize all its recovery efforts at once, on both the business and the technical sides.
“Business continuity is quite often labeled an IT problem, but it isn’t. It’s a business issue,” says Roy Illsley, analyst with Butler Group.
There are so many interdependencies between business operations and IT that risk managers must communicate with IT disaster-recovery experts in order to prepare appropriately, Illsley explains. For example, he often sees out-of-touch planners from the business side just assume that they will be able to count on their employees to work from home in the event of a disruption—without ever consulting IT.
“But if your capacity on the network is only allowing certain people to get in, you can’t get everybody working from home,” he says. “So, you’ve got to think of technology supporting what the business wants to do in the event of a disaster so that they can work.”
Cooperation beforehand between IT and forward-thinking executives on the business side can be the make-or-break difference between staying afloat after a disaster and drowning in insolvency, Illsley says. One of the best ways to sell a comprehensive program is to look at a case where things were done right, he notes.
“Grolsch is a prime example of an organization that could have completely gone out of existence if they didn’t have a full business-continuity and IT disaster-recovery plan in place,” Illsley says.
In May 2000, after an explosive fire at a neighboring fireworks factory set Grolsch’s primary brewery ablaze, the Dutch company was able to rely on a plan that not only had alternate IT systems back up and running in 48 hours, but also put the logistics in place to replace production with output from a secondary brewery and prearranged factory partnerships with other brewing companies.
“This example demonstrates that having plans to cover all of your major business processes is important, and that sometimes they are all needed together at once,” according to a Butler Group paper on the Grolsch disaster. “The plans must cover technology, people and buildings, and be documented so that, in the event of a full-scale or partial disaster, the people responsible have them in hand when making decisions.”
Not Meeting Recovery-Time Objectives
As the Grolsch example makes clear, integrating IT disaster-recovery and business-continuity planning is critical. So why is the gap between the two widening?
Business-continuity experts say the tightening economy is partly to blame. However, they believe bridging the divide between technical and business expertise is also a matter of motivation.
“I think the consistent problem is that the folks who are tasked with developing business-continuity management programs often have little to no technical skills,” Deloitte & Touche’s Sarabacha says. “The person in charge of disaster recovery just gets tired of explaining and discussing technical aspects with that person and kind of gives up.”
No matter what the reason, the biggest way the gap can negatively affect an organization is by skewing RTOs (recovery-time objectives). According to Sarabacha, many enterprises are falling short when it comes to meeting RTOs. This is partly due to a lack of realistic expectations that stems from a scarcity of communication between IT and business-side managers.
Often, a business will base its RTOs solely on the capabilities of its IT department to get things back up and running after disruption. These organizations fail to consider the logistics of resuming other business processes and operations, setting unrealistic objectives and failing to prioritize based on realistic expectations, notes Pelant, the disaster-recovery consultant.
“We’re finding that recovery-time objectives now have gotten so short that only IT can meet them. The business side can’t meet them,” he says, explaining that some businesses might set a recovery objective of one hour for a certain business function. “But if you think about an hour’s lapsed time, the time interruption evacuating the building, collecting your thoughts, and then relocating to an alternate site and then setting up and starting to renew operations, it’s hard if you’ve only got an hour to do that. There’s a real lag there on the business side in that if you really are going to effectively do that, you really have to have some type of an alternate operation in place: not an alternate site, but an alternate operation that can pick up the workload in that time frame, and that’s not happening.”
Sometimes, it isn’t just the business side that lags, either. According to Sarabacha, business leaders who are out of the loop might also set unrealistic expectations that IT staffers cannot meet.
“We have several clients we help with testing of their disaster-recovery plans, and almost without exception, the results come out to be three, four and five times the duration expected,” he says. “For example, we’ve got one where they [estimated a] 12-hour RTO for their critical ERP [enterprise resource planning] application, and it took 66 hours to bring it back up. That’s something a business can understand. They might not understand why it took 66 hours, but they can understand that now they have got a problem.”
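The arithmetic behind Sarabacha's ERP example can be made explicit. The short Python sketch below is only an illustration: the `RecoveryTest` class and its field names are assumptions introduced here, not part of any tool mentioned in the article. The only figures used are the 12-hour target and the 66-hour test result he cites.

```python
from dataclasses import dataclass


@dataclass
class RecoveryTest:
    """One recovery-test result (hypothetical structure, for illustration only)."""
    application: str
    rto_hours: float      # recovery-time objective the business was given
    actual_hours: float   # duration measured when the plan was tested

    @property
    def overrun_factor(self) -> float:
        # How many times longer than the target the recovery actually took.
        return self.actual_hours / self.rto_hours

    @property
    def met(self) -> bool:
        # True only if the test finished within the stated objective.
        return self.actual_hours <= self.rto_hours


# Figures from the ERP example quoted above: 12-hour RTO, 66 hours in testing.
erp = RecoveryTest("ERP", rto_hours=12, actual_hours=66)
print(f"{erp.application}: {erp.overrun_factor:.1f}x the target, met={erp.met}")
```

Run against the quoted figures, the overrun comes to 5.5 times the objective, which is a little above the "three, four and five times" range Sarabacha describes for most tests.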
Business-Impact Analysis Is the Key
Pelant and Sarabacha agree that the key to integrating IT with overall business-continuity planning and maintaining realistic RTOs is in running an effective business-impact analysis.
Just the act of getting IT disaster-recovery experts together with business-management and business-continuity planners to run an analysis will start cranking the gears of cooperation, Pelant says.
“By going through that process of doing the business-impact analysis and by understanding the interdependencies of the business, you start to realize that you depend upon other people,” he says. “So if you do the impact analysis right and you really focus on the fact that one cannot define one’s own importance, it will start to break down those silos.”
Pelant and Sarabacha believe that many organizations breeze through the business-impact analysis without putting enough thought into process priorities.
“It never ceases to amaze me that’s where real-time recovery objectives really come from,” Sarabacha says. “The industry touts business-impact analysis, … looks at each of these critical business processes and determines how long those can be done without the system and how long they can be completely not done after a disaster event. And that should drive the requirements for IT. Inevitably, I talk to the CEOs and the head of IT disaster recovery, and they say, ‘well, we kind of took a look at it, basically, and said 12 hours sounds pretty good, 24 hours sounds pretty good,’ and there’s no real linkage to what the business really needs.”
Even organizations that do collect data based on priorities are doing it wrong, Pelant says. “Many business-impact analyses are really opinion surveys because nobody really wants to go through the work of really analyzing priorities,” he says. “What you do is, you send out a questionnaire and you ask somebody how important are you and how fast do you need to get back online? And so you get the wrong answer, and it’s like the old IT axiom ‘garbage in, garbage out.’ You can analyze that kind of information as much as you want, but it’s still garbage.”
The best way to start the analysis is to conduct interviews with business managers and be sure to insert an IT disaster-recovery expert at each interview, Sarabacha says.
“One thing we’re doing with a client right now that I think is very positive is, they have their IT disaster-recovery leader sitting in on each and every one of the BIA [business-impact analysis] interviews with the business-process owners. So they’re really able to provide feedback on the systems and applications that support that process as you’re hearing from the business side. As opposed to keeping it all siloed, [they’re] really bringing IT and business to the table at the same time as you’re doing the analysis. So as the results come out, there’s much less time to buy in, and everything is more consistent on the first round.”
Throughout the process, Sarabacha encourages IT staffers to be more disciplined about opening up system data and explaining application interdependencies to business leaders who use these applications and must prioritize their need for an individual application in an emergency situation.
“As you’re doing this process, inconsistencies in the way things are done and what things are called are one of the big problems. Some companies, for instance, have pet names for their applications, and the one person in IT who really knows what [they are] knows that there are actually six applications behind that,” Sarabacha says. “The person in the business just knows it as the pet name and doesn’t really know what they’re talking about. That whole discussion around their requirements is kind of vague and uncertain because nobody knows what they’re talking about. And if IT can provide data from reports and other systems on what each application is, what the feeders are, and offer data-flow diagrams, that would be incredibly helpful to the business side.”
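One way IT might hand over the kind of data Sarabacha describes is a simple lookup from a pet name to every application and feeder system behind it. The Python sketch below is a minimal illustration under stated assumptions: the pet name, the application names and the `systems_behind` helper are all invented here and do not come from any client or product in the article.

```python
# Hypothetical inventory: every name below is invented for illustration.
PET_NAMES = {
    "Big Blue": ["order-entry", "billing"],  # what the business calls it -> real applications
}

# Application -> upstream systems that feed it (also hypothetical).
FEEDS = {
    "order-entry": ["customer-db"],
    "billing": ["order-entry", "tax-tables"],
    "customer-db": [],
    "tax-tables": [],
}


def systems_behind(pet_name):
    """Return every application and upstream feeder behind a pet name, sorted."""
    seen = set()
    stack = list(PET_NAMES[pet_name])
    while stack:
        app = stack.pop()
        if app in seen:
            continue  # already visited; avoids loops in the feed graph
        seen.add(app)
        stack.extend(FEEDS.get(app, []))
    return sorted(seen)


print(systems_behind("Big Blue"))
```

With a list like this in hand during a business-impact interview, a process owner who asks for "Big Blue" back within a given time can see that the request actually covers four systems, each of which needs its own recovery objective.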
The business-impact analysis process is also a time to think creatively about how to utilize existing operations and infrastructure to improve recovery time with an acceptable budget.
“It takes creativity because you’re not going to reduplicate a department to be standing by waiting to pick up the pieces in case your primary department is interrupted,” Pelant says.
Instead, some organizations are looking at triangulating multiple offices or sites so that one can act as a backup for the other without incurring extra cost.
“It gets away from looking at it as insurance and actually building business continuity into the operational activities of the organization,” Sarabacha says.