It’s summer across the U.S., and that means that hurricanes, tornadoes, floods, wildfires, powerful thunderstorms and other natural disasters can take out your company’s IT systems in a flash.
As a result, you no doubt have disaster recovery plans and procedures in place for your company’s IT systems and critical enterprise applications. However, those capabilities are just the start.
Once built, those plans, from off-site data centers hosting your critical applications to data backups that are available at the flick of a switch when needed, have to be maintained, updated and tested on a regular basis so that they absolutely, positively can be relied on in a real disaster.
For many IT organizations, that kind of testing is non-existent. Without scheduled reviews and testing, your disaster recovery efforts could themselves be a second disaster waiting to happen.
“I know that it’s really scary for organizations to reach over and flip a switch to power down a server that’s running in production, but that has to be done once in a while to check out a disaster plan,” says Daniel M. Kusnetzky, principal analyst at Kusnetzky Group LLC. “The last thing that you want to have is a plan that hasn’t been tested before a disaster.”
A key reason for testing your procedures, Kusnetzky says, is that if they don’t go as planned, it’s far better to find that out while the power is on, your staff is on site and your business isn’t being threatened by a true natural disaster.
Your IT staff needs to be able to ensure that in an emergency they can get critical business applications up and running, as well as all the connected systems on which those apps rely, Kusnetzky says. “It isn’t just the applications, but also the complete configuration that the application is used to running on. It all needs to be there or it will require a reconfiguration of the processes or the application itself.”
The only way to know if it will all work as designed is to test it, push it and test it some more, he says.
“Just copying the application somewhere and just turning it on won’t necessarily make it available to your company’s workers,” Kusnetzky says.
This is an instance where virtualization could be helpful: if the workload is running inside one or more virtual machines, it does not depend on the specific hardware underneath, so differences between your production and recovery hardware matter less, according to Kusnetzky. For those same reasons, using virtual storage could be beneficial.
At the same time, virtualization won’t solve all related disaster recovery complications. “Virtualization can help, but like anything else, it’s not a panacea,” he says. “It’s just a tool. You need to do other things as well.”
Dan Olds, an analyst with Gabriel Consulting Group, says that a good way to approach disaster recovery testing inside an enterprise is to do it one system at a time to minimize the impact on your IT staff and procedures.
“It’s not only testing your [disaster recovery] vendors to be sure that it can all be done in an emergency, but it’s teaching your people how to do it,” Olds says. “It’s getting that knowledge so they’re not scared to death about it if something happens. You’ve got to get comfortable with it and it’s better to get comfortable when you have help and when it’s not time-sensitive and do-or-die, like when the floodwaters are rising outside.”
Typically, this kind of detailed testing is the thing that customers don’t do, he says, and it’s a bad decision to leave it out. “You’ve got to test the applications. You have to have the guts to do it.”
Olds stresses that it is important to keep in mind the difference between redundancy and availability when it comes to your enterprise’s data and applications in an emergency. “You want to have all of your data protected all of the time so that no matter what happens, you never lose it, other than the last half hour or so of data.”
But at the same time, if disaster strikes, you don’t need to access all of that data immediately. You need quick and sure access only to the data that is mission critical for the business as you recover from the emergency, he says.
“You have to prioritize there,” Olds says. “It would be silly and needlessly expensive to have your entire infrastructure mirrored so that you could instantly recover every application. The vast majority of businesses don’t need that kind of availability.”
One way to do this is to make a list of the applications and data that your business needs to have first in an emergency, followed by the apps and data that would be nice to have at that point. The remaining apps and data can return later, when the disaster is over.
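A list like this can be kept in a form that both IT and the business side can review. The following is a minimal Python sketch of the three-group idea described above; the tier names, the `RecoveryItem` class and the application names are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass

# Hypothetical tier names for the three groups: needed first,
# nice to have, and everything that can wait until the disaster is over.
TIER_ORDER = {"needed_first": 0, "nice_to_have": 1, "after_disaster": 2}


@dataclass(frozen=True)
class RecoveryItem:
    name: str
    tier: str


def recovery_order(items):
    """Return items sorted by tier, keeping the input order within a tier."""
    unknown = [item.name for item in items if item.tier not in TIER_ORDER]
    if unknown:
        raise ValueError(f"unknown tier for: {', '.join(unknown)}")
    # sorted() is stable, so items in the same tier keep their listed order.
    return sorted(items, key=lambda item: TIER_ORDER[item.tier])


# Illustrative inventory; these application names are made up.
inventory = [
    RecoveryItem("intranet wiki", "after_disaster"),
    RecoveryItem("order processing", "needed_first"),
    RecoveryItem("reporting dashboard", "nice_to_have"),
    RecoveryItem("email", "needed_first"),
]

ordered_names = [item.name for item in recovery_order(inventory)]
```

Rejecting unknown tiers outright, rather than guessing a default, means a new application cannot slip into the inventory without someone deciding where it belongs.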
Meanwhile, don’t forget to get other input inside your company when making these kinds of decisions. “IT leaders need to be sure that they are bringing in the business side on this stuff,” he says. “You want to make sure that what IT thinks should be brought back in first gets agreement from the people on the business side, too.”
By testing your emergency plans regularly, you can ensure that no critical details are left out, such as your network topology and your company’s IP addresses. “These are all things that you won’t necessarily think about in an emergency,” Olds says. “You need to be able to move this all over and mirror it so it is all available outside your company infrastructure if it is out of service. If your data center is under water for a while, you need to have a long range plan.”
Once you start this kind of testing, you need to remember to keep it up, especially when you make new changes to applications and to your hardware infrastructure, he says.
The reason is simple – you have to be sure that the system changes you make on a regular basis don’t interfere with your existing disaster recovery systems and cause them to fail at the worst possible moment.
“You always have to make sure that you can recover an application with the right data and the right everything else,” he says. “That’s the key. I recommend that you make it a check box on your application update procedures, so you can determine if the application or system changes require any related changes to a backup recovery plan. Right after your plan is golden, you need to be able to document those changes and constantly check them to be sure they will still work.”
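The check box Olds describes could be enforced in whatever change-tracking process you already use. As a rough Python sketch, assuming a hypothetical `ChangeRequest` record with invented field names, an update is held back until someone has answered the recovery-plan question and, where needed, updated the plan:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ChangeRequest:
    application: str
    description: str
    # The "check box": None means nobody has answered the question yet.
    affects_recovery_plan: Optional[bool] = None
    recovery_plan_updated: bool = False


def blocking_issues(change):
    """List the reasons this change should not be approved yet."""
    issues = []
    if change.affects_recovery_plan is None:
        issues.append("recovery-plan impact not assessed")
    elif change.affects_recovery_plan and not change.recovery_plan_updated:
        issues.append("recovery plan not updated for this change")
    return issues


# Illustrative change; the application and description are made up.
change = ChangeRequest("order processing", "move database to new storage")
unassessed = blocking_issues(change)

change.affects_recovery_plan = True
needs_update = blocking_issues(change)

change.recovery_plan_updated = True
cleared = blocking_issues(change)
```

Treating an unanswered question as a blocker, instead of defaulting it to "no impact", is the design choice that keeps the check from being skipped by accident.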
If you leave out this step, then your disaster recovery plans are incomplete and likely won’t work when they are needed, which defeats the whole purpose of having them.