Components of a Disaster Recovery Plan

Posts by Grace Norman

Maintaining an optimistic outlook is good, but Murphy’s law still applies: “Anything that can go wrong, will go wrong”. You can prepare your infrastructure for some events such as natural disasters or planned power outages, but you cannot predict each and every threat.

Disasters come in different shapes, vary in magnitude, and are inevitable in a modern IT landscape. According to a survey by Evolve IP, approximately one in three companies has suffered at least one incident or outage that required them to implement a disaster recovery strategy. The same survey found that hardware failures and power outages remain the top causes of IT downtime. Almost 60% of companies that have experienced a disaster have suffered financial losses, with 12% of them losing $100,000 or more.

Disasters do not translate only into financial losses. They also bring data loss, lost productivity, a damaged reputation, customer dissatisfaction and retention issues, and more. Moreover, around 40% of businesses never reopen after a disaster.

In today’s competitive landscape, businesses that depend heavily on IT cannot afford downtime. Customers expect your business to be up and running 24/7 and may switch to a competitor if you fail to deliver services continuously. For this reason, protecting your business from disasters and mitigating their consequences promptly is essential.

By implementing a proper disaster recovery strategy, putting the right resources and tools in place, and paying close attention to DR planning, you can reduce the damage an unplanned event does to your business.

Key Elements of a Disaster Recovery Plan

A disaster recovery plan is at the core of any disaster recovery strategy. A well-devised DR plan is a set of step-by-step instructions that speeds up recovery and helps mitigate (or avoid) the devastating consequences of a disaster. Organizations without a DR plan cannot respond promptly to an incident and consequently put the whole business at risk.

Disaster recovery is a complex procedure with multiple steps and elements, all of which should be described in the DR plan. The following are the most crucial components to take into account in your company’s disaster recovery planning:

Documentation

Disaster recovery documentation should list all vital components of your IT infrastructure (both hardware and software), the responsible team, and the sequence of measures that must be taken to resume business operations. Keep the documentation up to date so that it reflects every change in your IT infrastructure.

Because this document covers multiple areas and components, anything left out can be missed or overlooked in a state of panic, leading to greater losses. Well-structured, thoroughly devised disaster recovery documentation shortens recovery time, since in the event of a disaster you only have to follow a predetermined set of actions.

Scope and Dependencies

Your recovery scope does not necessarily need to include the entire IT infrastructure, because not all components are equally critical to continuous business operation. Identify the most important VMs (those housing business-critical information, applications, and IT systems) and include them in your recovery scope to achieve shorter recovery time objectives.

Also, consider dependency links between these VMs, applications, and IT systems. For example, a particular application may depend on data housed on a different VM, or vice versa. Dependencies also exist between your employees and the components of your infrastructure (e.g., the logistics department depending on information processed by the financial department). Identify and document such dependencies so that your staff can continue their work with minimal interruption.
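To make the idea of dependency links concrete, the following Python sketch derives a recovery order from a dependency map: a VM is started only after everything it depends on is running. The VM names and dependencies are hypothetical examples, and the sketch relies on the standard library's graphlib module (Python 3.9+) for the topological sort.

```python
from graphlib import TopologicalSorter

# Hypothetical dependency map: each VM lists the VMs that must be running first.
DEPENDENCIES = {
    "web-frontend": {"app-server"},
    "app-server": {"database", "auth"},
    "auth": {"database"},
    "database": set(),
    "reporting": {"database"},
}


def recovery_waves(deps: dict[str, set[str]]) -> list[list[str]]:
    """Group VMs into waves; each VM depends only on VMs in earlier waves."""
    sorter = TopologicalSorter(deps)
    sorter.prepare()  # raises graphlib.CycleError on circular dependencies
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        sorter.done(*ready)  # mark the whole wave as recovered
    return waves


if __name__ == "__main__":
    for i, wave in enumerate(recovery_waves(DEPENDENCIES), start=1):
        print(f"Wave {i}: {', '.join(wave)}")
```

VMs within one wave can be recovered in parallel. If prepare() reports a cycle, that is itself a sign the dependencies need to be examined and documented more carefully.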

Responsible Team & Staff Training

Your disaster recovery plan should clearly define key roles and the people responsible for coordinating disaster recovery activities. Communicate the plan to all employees and train them so that everyone understands who is responsible for what; this avoids confusion, duplicated effort, and delays in the recovery workflow. In case of a disaster, your employees should know whom to contact and where to start so that the recovery process can be launched promptly.

Secondary Location Configuration

A secondary location guarantees that you have the hardware, software, and tools needed for recovery. Make sure your DR site is located away from your primary production site so that the same disaster cannot knock both offline.

Your secondary location should have enough space, hardware, and software resources to accommodate your staff and sustain the transferred workloads. Pay close attention to CPU, memory, disk capacity, and network bandwidth, since a shortage of any of these resources can result in poor VM performance.
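A rough capacity check can catch shortfalls before a failover does. The Python sketch below assumes a simple additive model (the secondary site must hold the sum of all transferred workloads) and uses made-up figures; real sizing would also need to account for overcommit ratios, growth, and headroom.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class Resources:
    """Resource figures for one workload or a whole site (hypothetical values)."""
    vcpus: int
    memory_gb: float
    disk_gb: float
    bandwidth_mbps: float

    def __add__(self, other: "Resources") -> "Resources":
        return Resources(
            self.vcpus + other.vcpus,
            self.memory_gb + other.memory_gb,
            self.disk_gb + other.disk_gb,
            self.bandwidth_mbps + other.bandwidth_mbps,
        )


def shortfalls(site: Resources, workloads: list[Resources]) -> dict[str, float]:
    """Return how much of each resource the site lacks for the combined workloads."""
    total = sum(workloads, Resources(0, 0, 0, 0))
    gaps = {}
    for f in fields(Resources):
        need, have = getattr(total, f.name), getattr(site, f.name)
        if need > have:
            gaps[f.name] = need - have
    return gaps


if __name__ == "__main__":
    dr_site = Resources(vcpus=64, memory_gb=256, disk_gb=4000, bandwidth_mbps=1000)
    critical_vms = [
        Resources(16, 64, 1000, 200),
        Resources(32, 128, 2000, 500),
        Resources(24, 96, 500, 100),
    ]
    print(shortfalls(dr_site, critical_vms))  # -> {'vcpus': 8, 'memory_gb': 32}
```

An empty result means the site can host all listed workloads under this simplified model; any non-empty entry shows which resource to expand first.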

Setting the RTO and RPO

The recovery time objective (RTO) and recovery point objective (RPO) are two key recovery metrics. RTO determines how long your business can go without a specific VM, system, or application running. RPO dictates how much data, measured as a span of time before the disaster, your business can afford to lose without hurting its operations. In a perfect world, both RTO and RPO would be as close to zero as possible. For many businesses, however, this is a costly luxury that may not justify its price.

The good news is that you can set different RTOs and RPOs for different VMs depending on how critical they are to your business, reserving the tightest objectives for the most important ones. For example, VMs hosting customer-facing applications tolerate little downtime or data loss and warrant objectives close to zero, while VMs running administrative applications can withstand some downtime or data loss.
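Tiered objectives can be checked mechanically. The Python sketch below uses hypothetical tier names and targets. It assumes that worst-case data loss equals the interval between recovery points (a disaster striking just before the next point loses one full interval, ignoring transfer time) and that recovery time is an estimate measured during DR tests.

```python
from dataclasses import dataclass
from datetime import timedelta

# Hypothetical tiers: the tightest objectives go to customer-facing VMs.
TIERS = {
    "customer-facing": {"rto": timedelta(minutes=15), "rpo": timedelta(minutes=5)},
    "internal": {"rto": timedelta(hours=4), "rpo": timedelta(hours=1)},
    "administrative": {"rto": timedelta(hours=24), "rpo": timedelta(hours=24)},
}


@dataclass
class VM:
    name: str
    tier: str
    protection_interval: timedelta  # time between recovery points (backups or replicas)
    est_recovery_time: timedelta    # measured during DR tests


def violations(vms: list[VM]) -> list[tuple[str, str]]:
    """Return (vm name, objective) pairs for every missed RPO or RTO target."""
    problems = []
    for vm in vms:
        target = TIERS[vm.tier]
        # Worst case: the disaster hits just before the next recovery point,
        # so up to one full interval of changes is lost.
        if vm.protection_interval > target["rpo"]:
            problems.append((vm.name, "RPO"))
        if vm.est_recovery_time > target["rto"]:
            problems.append((vm.name, "RTO"))
    return problems


if __name__ == "__main__":
    fleet = [
        VM("shop-web", "customer-facing", timedelta(minutes=1), timedelta(minutes=10)),
        VM("erp-db", "internal", timedelta(hours=2), timedelta(hours=1)),
        VM("hr-portal", "administrative", timedelta(hours=24), timedelta(hours=30)),
    ]
    for name, objective in violations(fleet):
        print(f"{name}: {objective} target missed")
```

Here "erp-db" misses its RPO because it is protected only every two hours, and "hr-portal" misses its RTO because a tested restore takes longer than a day.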

Testing and Optimization

A DR plan that has not been tested cannot be considered effective. Merely having a disaster recovery plan is not enough: testing is what reveals its weaknesses and inconsistencies, and you want to find those weak spots before a disaster hits. That is why rigorous DR testing is a critical step; it gives you confidence that your recovery attempts will succeed when an actual disaster strikes.

Optimization is also integral to the success of your disaster recovery planning. Replacements, enhancements, and upgrades take place in your IT infrastructure on a regular basis, so it is important to keep your DR plan in sync with every change, however small. Retest the plan after each change so that an outdated procedure does not fail you during a real disaster.
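Keeping the plan in sync can be partly automated. This Python sketch, using hypothetical component names and a hypothetical change log, flags infrastructure components missing from the plan, plan entries that no longer exist, and changes made since the last DR test (each of which calls for a retest).

```python
from datetime import datetime


def plan_drift(inventory: set[str], plan_components: set[str]) -> dict[str, set[str]]:
    """Compare what actually runs with what the DR plan documents."""
    return {
        "undocumented": inventory - plan_components,  # running but missing from the plan
        "stale": plan_components - inventory,         # in the plan but no longer running
    }


def changes_since_test(last_test: datetime, change_log: list[tuple[datetime, str]]) -> list[str]:
    """List changes made after the last DR test, oldest first; any entry means retest."""
    return [desc for when, desc in sorted(change_log) if when > last_test]


if __name__ == "__main__":
    print(plan_drift({"db01", "web01", "web02"}, {"db01", "web01", "legacy01"}))
    log = [
        (datetime(2024, 3, 1), "added web02"),
        (datetime(2024, 1, 10), "patched db01"),
    ]
    print(changes_since_test(datetime(2024, 2, 1), log))
```

In practice the inventory would come from your hypervisor or cloud console and the change log from your change-management process; both are left abstract here.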

Automation

Many DR solutions can automate the entire disaster recovery process, from failover to failback. Automation relieves IT staff of manual work and reduces complexity: the recovery process takes less time, is less prone to human error, and allows non-disruptive DR testing. Because downtime is kept to a minimum, automation saves money as well as time.

NAKIVO’s Site Recovery for Effective Disaster Recovery Planning

NAKIVO Backup & Replication v8.0 adds Site Recovery functionality to its comprehensive feature set, allowing you to recover VMware, Hyper-V, and AWS EC2 environments from disasters in just a few clicks. Site Recovery is a DR automation tool designed to simplify disaster recovery with custom-built DR workflows. It also lets you test these workflows to make sure they are effective and can meet the desired objectives.

DR workflows (i.e., Site Recovery jobs) are sets of actions whose complexity varies with your purposes and needs. A single job can include up to 200 actions, including failover, failback, start or stop VMs and instances, run or stop jobs, run script, attach or detach repository, send email, wait, and check condition.

Site Recovery jobs can be run in two modes: test and production. Test any of your jobs beforehand to verify that they are valid and effective; this way, you can run them smoothly in production mode when disaster strikes. In test mode, actions such as failover, failback, start/stop VMs (instances), and attach/detach repositories are reversed upon job completion, bringing your environment back to its initial state. Test runs can be scheduled to start automatically, whereas production runs can only be launched manually.
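The difference between the two modes can be modeled in a few lines. The following Python sketch is a hypothetical model, not NAKIVO's API (Site Recovery jobs are built in the product's interface): each action has a run step and an optional undo step, a job is capped at 200 actions, and in test mode reversible actions are undone in reverse order once the job finishes. Reverting also when an action fails midway is a design choice of this sketch, not documented product behavior.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional

MAX_ACTIONS = 200  # per-job limit stated for Site Recovery jobs


@dataclass
class Action:
    name: str
    run: Callable[[], None]
    # None for actions that are not reverted in test mode (e.g., send email, wait).
    undo: Optional[Callable[[], None]] = None


@dataclass
class Workflow:
    actions: list[Action] = field(default_factory=list)

    def add(self, action: Action) -> None:
        if len(self.actions) >= MAX_ACTIONS:
            raise ValueError(f"a job may contain at most {MAX_ACTIONS} actions")
        self.actions.append(action)

    def execute(self, test_mode: bool) -> None:
        completed: list[Action] = []
        try:
            for action in self.actions:
                action.run()
                completed.append(action)
        finally:
            if test_mode:
                # Undo reversible actions in reverse order to restore the initial state.
                for action in reversed(completed):
                    if action.undo is not None:
                        action.undo()


if __name__ == "__main__":
    state = {"replica_running": False}
    job = Workflow()
    job.add(Action("failover",
                   run=lambda: state.update(replica_running=True),
                   undo=lambda: state.update(replica_running=False)))
    job.add(Action("send email", run=lambda: print("notification sent")))
    job.execute(test_mode=True)
    print(state)  # the replica is stopped again after the test run
```

Running the same job with test_mode=False leaves the failover in place, which mirrors the production behavior described above.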

Key advantages of NAKIVO’s Site Recovery include flexibility, ease of use, and, most importantly, cost. The ability to create multiple Site Recovery jobs of different complexity levels lets you prepare for a wide range of disaster scenarios.

Site Recovery is available from the same interface our users are already familiar with, so there is little new to learn. You simply build a DR workflow by combining actions from the built-in list and run it with a click when necessary.

With NAKIVO Backup & Replication, you get an all-in-one solution that combines traditional data protection with disaster recovery features. Furthermore, few or no competitors on the market can match it on price.

Conclusion

No company is immune to disaster and, unfortunately, only a very small percentage of disruptive events can be foreseen. The good news is that you can strengthen your DR preparedness so that you can respond promptly to an incident and mitigate its consequences. While a disaster recovery plan does not guarantee recovery, having one in place significantly increases the odds of a fast recovery with minimal losses.

NAKIVO Backup & Replication allows you to create automated disaster recovery workflows that dramatically reduce the complexity of DR planning, improve disaster recovery preparedness, and help you achieve tighter RTOs.

Download the Full-Featured Free Trial to test NAKIVO Backup & Replication in your own environment.