
Designing Your Disaster Recovery Plan

In this guide, we are going to break the process of creating a disaster recovery plan down into the basics, so you can begin to protect your mission-critical workloads as quickly as possible.


Disaster Recovery (DR) is one of those areas in IT that is often ignored until it is needed. Unfortunately, by the time an organization needs its DR plan, it is already too late to create one. If we have learned anything in the IT space over the last several years, it is that a disaster recovery plan is absolutely critical given the rise in cyber incidents and ransomware attacks.


Understanding Your Organization's Needs

Before a disaster recovery plan can be created, you need a fundamental understanding of your organization's needs. What are the most critical applications within your organization? What kind of downtime can they withstand? What is the cost of downtime?

This is all part of what is called a Business Impact Analysis (BIA). By identifying the critical systems and the impact of their downtime, we can begin to plan for recovery. Prioritization is also essential in this phase of DR planning because, let's face it, creating a disaster recovery plan can be a daunting task for any organization.

At the end of this phase, you will come away with a prioritized order in which applications need to be recovered, along with the recovery objectives for each application. The recovery objectives we examine for disaster recovery are the Recovery Time Objective (RTO) and the Recovery Point Objective (RPO). RTO is the maximum acceptable length of time that your application can be offline; essentially, how quickly you need to recover. RPO is the point in time you need to be able to recover to, which defines how much data can acceptably be lost.
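To make the output of this phase concrete, here is a minimal sketch of BIA results captured as data, with a helper that produces a recovery order. The application names, priorities, and objectives are hypothetical, and breaking priority ties by the shorter RTO is our own assumption rather than a fixed rule.

```python
from dataclasses import dataclass
from datetime import timedelta
from typing import List


@dataclass
class AppObjectives:
    """One row of a Business Impact Analysis (BIA) for a single application."""
    name: str
    priority: int        # 1 = recover first
    rto: timedelta       # maximum acceptable time offline
    rpo: timedelta       # maximum acceptable window of data loss


def recovery_order(apps: List[AppObjectives]) -> List[str]:
    """Return application names in recovery order.

    Lower priority numbers come first; ties are broken by the tighter (shorter) RTO.
    """
    ranked = sorted(apps, key=lambda app: (app.priority, app.rto))
    return [app.name for app in ranked]


# Hypothetical applications and objectives; substitute the results of your own BIA.
bia = [
    AppObjectives("reporting", priority=3, rto=timedelta(days=2), rpo=timedelta(hours=24)),
    AppObjectives("orders-db", priority=1, rto=timedelta(hours=1), rpo=timedelta(minutes=5)),
    AppObjectives("web-frontend", priority=1, rto=timedelta(hours=4), rpo=timedelta(hours=1)),
    AppObjectives("email", priority=2, rto=timedelta(hours=8), rpo=timedelta(hours=4)),
]

print(recovery_order(bia))
# ['orders-db', 'web-frontend', 'email', 'reporting']
```

Even a simple table like this, kept in a spreadsheet or a repository, gives the rest of the DR work a single agreed-upon source for priorities and objectives.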

To learn more about how to perform a BIA, the National Institute of Standards and Technology (NIST) has many helpful resources:

Designing the Right DR Solution

Once the recovery objectives are known for each application, we can begin to design a DR solution. Not all solutions are created equal, and we will end up protecting and recovering applications differently in order to meet their recovery objectives.

Achieving the RPO is a function of how often data is protected, and achieving the RTO is a function of how quickly it can be recovered. For critical applications, we can expect lower RPO and RTO numbers. The good news is that with today's technology, there are many creative ways to meet these objectives.
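That relationship can be written as a quick check. The sketch below assumes the worst case, where a disaster strikes just before the next copy is taken, so up to one full protection interval of data is lost; it ignores the time needed to complete each copy. The function name and the example numbers are hypothetical.

```python
from datetime import timedelta
from typing import Dict


def meets_objectives(
    protection_interval: timedelta,
    recovery_time: timedelta,
    rpo: timedelta,
    rto: timedelta,
) -> Dict[str, bool]:
    """Check a protection schedule against an application's RPO and RTO.

    Worst-case data loss is one full protection interval; recovery_time should be
    a measured value, ideally from a DR test.
    """
    return {
        "rpo_met": protection_interval <= rpo,
        "rto_met": recovery_time <= rto,
    }


# Hypothetical: nightly backups with a measured 3-hour restore,
# checked against a 15-minute RPO and a 4-hour RTO.
print(meets_objectives(timedelta(hours=24), timedelta(hours=3),
                       rpo=timedelta(minutes=15), rto=timedelta(hours=4)))
# {'rpo_met': False, 'rto_met': True}
```

In this example the restore is fast enough, but nightly backups cannot satisfy a 15-minute RPO, which is the kind of gap that pushes an application toward more frequent protection or replication.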

How to Achieve Low RPOs and RTOs

There are many technologies and solutions that can be used to achieve RPOs and RTOs, and you may already have solutions you are used to using that you will continue to use. You may also decide you need to examine new solutions to meet your recovery objectives.

Here are some things to keep in mind when you are choosing a solution, or deciding if an existing solution will work.

To meet low RPOs and RTOs, replication is often the most popular solution. Replication copies data to the recovery environment continuously or at frequent intervals, so your data is ready for you in the event of a disaster. This means recovery can be as fast as starting up your application.

This speed of protection and recovery does come at a cost: roughly twice the resources, since you are essentially maintaining both a production copy and a recovery copy.

However, that does not mean it is the only solution. You may be able to achieve low RPOs and RTOs with very aggressive backups and fast storage for your backups to land on. You will still need capacity to run your applications in the event of a disaster.
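Whether backups can meet an RTO largely comes down to restore throughput. As a rough back-of-the-envelope sketch, assuming restore time is dominated by data size divided by sustained throughput plus a fixed overhead (for example, provisioning hardware and starting the application), the estimate looks like this. Decimal units (1 GB = 1000 MB) and all example values are assumptions.

```python
from datetime import timedelta


def estimated_restore_time(
    data_gb: float,
    throughput_mb_s: float,
    overhead: timedelta = timedelta(0),
) -> timedelta:
    """Rough restore estimate: data size / sustained throughput, plus fixed overhead."""
    transfer_seconds = (data_gb * 1000) / throughput_mb_s  # GB -> MB, decimal units
    return timedelta(seconds=transfer_seconds) + overhead


# Hypothetical: 2 TB restored at 500 MB/s, plus 30 minutes to provision and start up.
estimate = estimated_restore_time(2000, 500, overhead=timedelta(minutes=30))
print(estimate)                       # 1:36:40
print(estimate <= timedelta(hours=2)) # True for a 2-hour RTO
```

An estimate like this only tells you whether an approach is plausible; the real throughput of your backup storage and network has to be confirmed by testing.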

How do you figure out whether a solution will meet your recovery objectives? Through testing, which can be difficult when your production environment does not yet have a recovery environment.

Equinix Metal is a great way to test a disaster recovery solution. Within minutes, you can have a brand new environment stood up and ready for testing. It is also a great way to test a new backup solution without impacting a production environment. You can even build parallel backup environments to do a side-by-side comparison when testing new solutions.

You can see how easy it is to get started with Equinix Metal here.

Because Metal is completely separate from your production environment, it also makes a great proof-of-concept environment for DR. Deploying a complete new environment is a great way to catch any application dependencies that may have been overlooked for critical applications.
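Once dependencies are known, they also determine recovery order. The sketch below uses Python's standard-library `graphlib.TopologicalSorter` (Python 3.9+) to group applications into "waves" that could be restored in parallel, each wave depending only on earlier ones. The dependency map is hypothetical; any dependency uncovered during a test would simply be added to it.

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Set

# Hypothetical dependency map: each application lists what must be running before it.
dependencies: Dict[str, Set[str]] = {
    "web-frontend": {"orders-api"},
    "orders-api": {"orders-db", "auth"},
    "auth": {"directory"},
    "orders-db": set(),
    "directory": set(),
}


def recovery_waves(deps: Dict[str, Set[str]]) -> List[List[str]]:
    """Group applications into waves; everything in a wave can be recovered in parallel."""
    sorter = TopologicalSorter(deps)
    sorter.prepare()  # raises graphlib.CycleError if dependencies are circular
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # all apps whose dependencies are already up
        waves.append(ready)
        sorter.done(*ready)
    return waves


print(recovery_waves(dependencies))
# [['directory', 'orders-db'], ['auth'], ['orders-api'], ['web-frontend']]
```

A circular dependency surfaces as a `CycleError` at `prepare()`, which is itself a useful finding before disaster day.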

When you deploy a test environment with Equinix Metal, the power of bare metal hardware is at your fingertips on demand. As you can see from the deployment screen, there are many different types of hardware and operating systems available, with transparent pricing:

[Screenshot: server type selection on the deployment screen]

In many cases, virtualization can be a powerful tool to streamline testing. You can see how to deploy virtualized environments in Metal with these helpful guides:

Choosing Where to Recover

Now that we know how to meet our organization's requirements, the big question becomes: where are you recovering these applications to? Many organizations have begun to move away from the traditional two-site model in favor of alternative recovery locations.

While there are many reasons for this, there are two to really consider in today's climate: cost and security. Maintaining a second data center is very expensive, and once a malicious actor has compromised a production environment, chances are they will also compromise the DR environment, rendering it useless for recovery.

This is another area where Equinix Metal really shines. Equinix Metal is a completely separate environment that is off site from your production data center. Beyond using Metal to validate and test disaster recovery solutions, you can set up a hot, warm, or cold DR solution to meet your organization's needs.

Here is what these solutions might look like:

  • HOT: Use reserved hardware to make sure you always have the capacity you need in Metal when you need it.
  • WARM: Have enough hardware deployed in Metal to begin recovering workloads, and scale up with on-demand hardware.
  • COLD: Head to Metal when you need to recover.

In a disaster recovery design where you are not reserving hardware, you can use the Metal API to check capacity in any metro. While reserved hardware provides additional cost savings, one of the best parts of Metal is its transparent, predictable pricing, so you will always know how much your disaster recovery environment will cost.
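As a sketch of that capacity check, the Python below builds a request to the Metal API using only the standard library. The endpoint (`POST /capacity/metros` under `https://api.equinix.com/metal/v1`), the `X-Auth-Token` header, and the `servers`/`metro`/`plan`/`quantity`/`available` field names reflect our reading of the Metal API reference and should be verified against the current documentation, including field types, before use. The helper function names are hypothetical.

```python
import json
import urllib.request

# Assumption: endpoint, header, and field names per our reading of the Metal API reference.
API_URL = "https://api.equinix.com/metal/v1/capacity/metros"


def build_capacity_request(token: str, metro: str, plan: str, quantity: int) -> urllib.request.Request:
    """Build a request asking whether `quantity` servers of `plan` are available in `metro`."""
    body = {"servers": [{"metro": metro, "plan": plan, "quantity": quantity}]}
    return urllib.request.Request(
        API_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={"X-Auth-Token": token, "Content-Type": "application/json"},
        method="POST",
    )


def all_available(response_body: dict) -> bool:
    """True only if every requested (metro, plan, quantity) entry reports availability."""
    servers = response_body.get("servers", [])
    return bool(servers) and all(entry.get("available") is True for entry in servers)


def check_capacity(token: str, metro: str, plan: str, quantity: int) -> bool:
    """Send the capacity check and interpret the response (performs a network call)."""
    request = build_capacity_request(token, metro, plan, quantity)
    with urllib.request.urlopen(request, timeout=30) as response:
        return all_available(json.load(response))
```

For example, `check_capacity(token, "da", "c3.small.x86", 5)` would ask whether five `c3.small.x86` servers are available in the Dallas metro; the metro code and plan name are illustrative values. Keeping request building and response parsing separate from the network call makes the logic easy to test in a DR rehearsal pipeline.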

Equinix Metal is everywhere you need to be, and available in the following locations:

[Map: Equinix Metal locations]

Documenting, Testing, and Implementing a DR Plan

Once we have an idea of how we are going to recover our environment, it is time to determine how we will accomplish this on disaster day. This means documenting, step by step, what needs to be recovered after a disaster and how.

Once we document our disaster recovery plan, we need to test it to ensure it works as expected. Testing also uncovers any changes to the environment that were not reflected in the documentation.
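One way to make that drift visible is to compare what the DR documentation says against what a test actually found. The sketch below assumes both sides can be expressed as a simple mapping of component name to version or configuration; how the "discovered" inventory is gathered is left open, and all names and values are hypothetical.

```python
from typing import Dict, List


def documentation_drift(documented: Dict[str, str], discovered: Dict[str, str]) -> Dict[str, List[str]]:
    """Compare documented components (name -> version/config) with what a DR test found."""
    undocumented = sorted(discovered.keys() - documented.keys())  # found, but not in the docs
    stale = sorted(documented.keys() - discovered.keys())         # in the docs, but not found
    changed = sorted(
        name for name in documented.keys() & discovered.keys()
        if documented[name] != discovered[name]                   # present in both, but different
    )
    return {"undocumented": undocumented, "stale": stale, "changed": changed}


# Hypothetical inventories.
documented = {"orders-db": "PostgreSQL 14", "auth": "v2.3", "cache": "Redis 6"}
discovered = {"orders-db": "PostgreSQL 16", "auth": "v2.3", "queue": "RabbitMQ 3.12"}

print(documentation_drift(documented, discovered))
# {'undocumented': ['queue'], 'stale': ['cache'], 'changed': ['orders-db']}
```

Each non-empty list is a concrete documentation update to make before the next test.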

Remember, the goal of a DR test is not to pass the test, but to be able to meet recovery objectives. There is nothing wrong with encountering errors or issues during a DR test, since this provides an opportunity to fix things.

Automation is also a key factor when it comes to disaster recovery. Let’s face it, humans make mistakes, especially when they are humans that are under tremendous pressure to recover applications after a disaster.

Once there is a fundamental understanding of how applications will be protected and recovered, and testing has validated that RPOs and RTOs can be met, automation can be a powerful tool to make testing and recovery even simpler.
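As a minimal sketch of what automating a documented plan can look like, the runner below executes recovery steps in order, stops at the first failure, and reports whether the run finished within the RTO. The step names and the failing DNS step are hypothetical placeholders for real automation, not a prescribed tool.

```python
import time
from dataclasses import dataclass, field
from datetime import timedelta
from typing import Callable, List, Optional, Tuple


@dataclass
class RunbookResult:
    completed: List[str] = field(default_factory=list)
    failed_step: Optional[str] = None
    elapsed: timedelta = timedelta(0)
    met_rto: bool = False


def run_runbook(steps: List[Tuple[str, Callable[[], None]]], rto: timedelta) -> RunbookResult:
    """Run recovery steps in order, stop at the first failure, and time the whole run."""
    result = RunbookResult()
    start = time.monotonic()
    for name, action in steps:
        try:
            action()
        except Exception as exc:  # record the failure instead of crashing the test run
            result.failed_step = f"{name}: {exc}"
            break
        result.completed.append(name)
    result.elapsed = timedelta(seconds=time.monotonic() - start)
    # The RTO is only met if every step succeeded within the allowed time.
    result.met_rto = result.failed_step is None and result.elapsed <= rto
    return result


def switch_dns() -> None:
    # Hypothetical failing step, standing in for a real DNS cutover.
    raise RuntimeError("DNS record not updated")


steps = [
    ("provision servers", lambda: None),  # placeholders for real automation
    ("restore orders-db", lambda: None),
    ("switch DNS", switch_dns),
    ("smoke test", lambda: None),
]
result = run_runbook(steps, rto=timedelta(hours=1))
print(result.completed, result.failed_step, result.met_rto)
# ['provision servers', 'restore orders-db'] switch DNS: DNS record not updated False
```

A failed run like this one is a successful test in the sense described above: it pinpoints exactly which step needs fixing.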

DR is an Iterative Process

DR is an iterative process. It is about picking those first few applications to start with in order to get processes like data protection and recovery locked in. Beyond the technical how-to of DR, it is also important to solidify processes around creating and updating DR documentation, as well as DR testing.

The trick is to start small, and build up. Choose an application and get an understanding of all of the processes from beginning to end. This makes it much simpler to create DR plans for subsequent applications.

Developing a DR plan can seem complicated and overwhelming, but the cost of not having one in place is far higher. With the increase in cyber threats and ransomware attacks (not to mention more traditional disasters such as natural disasters or other unforeseen events), having a fully tested and working DR plan is essential for every organization.

Once you have a fundamental understanding of your organization's needs and your recovery objectives for each application, you can begin to decide on the technical solutions that will make these objectives possible to meet.

Equinix Metal can be an asset to any disaster recovery plan. From using Equinix Metal to test production and recovery solutions, to using Equinix Metal as a recovery location, you can leverage the flexibility and capabilities of bare metal hardware without the hassle of maintaining a data center.

Because Equinix Metal is so flexible and adaptable, you can easily build out an environment to fit your DR strategy no matter what it is.

Remember that your DR plan does not end with its creation. Once your plan is documented, it is crucial to test it regularly to ensure it performs as expected and adapts to any changes in your environment. Automation can play a significant role in this, reducing human error and streamlining DR processes.

As your business evolves, so will your DR requirements. Continual assessment, testing, and updating of your DR plan is not just a good practice; it's a necessity in today's dynamic digital landscape. As your requirements change and evolve, you can easily adapt your Equinix Metal environment along with them.

In the end, the goal is not just to recover from a disaster but to build resilience into your organization. We need to ensure that when a disaster occurs, organizations are ready to respond quickly and effectively. With careful planning, the right tools, and a commitment to ongoing improvement, this goal can be achieved.

Last updated

07 August, 2024
