Today’s systems are not as simple as they used to be. As service-oriented architectures permeate our environments, dependencies reach far beyond any single system. The days of a simple client/server architecture are gone, and understanding today’s more complex infrastructures takes more effort. As a result, disaster recovery planning isn’t just about assigning personnel and deciding on a failover facility; it is also an exercise in understanding your entire IT infrastructure.

Cloud Disaster Recovery

Most companies have a mix of commercial off-the-shelf applications and custom or semi-custom applications that have been modified over time for their environment. The question is, do you know what your applications need to survive? Too often, customized code suffers from poor documentation and shifting scope. What if, when the time comes to recover an application, you discover that it depends on a small Web service that was not deemed critical during planning? That single oversight can cause the entire recovery to fail, regardless of how carefully the effort was planned beforehand.

What you must do is understand your systems at a much lower level than servers and networks. Following the ITIL model and building a configuration management database (CMDB) can help you document the details of your most critical line-of-business applications. Documenting service dependencies is especially important. If you don’t understand the service contracts and who is responsible for each service, you’ll likely reach a point in a transaction where an assumed service is unavailable, breaking the application’s entire functionality.
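To make the dependency point concrete, here is a minimal Python sketch that walks a dependency map and lists every service a critical application needs, directly or indirectly, that has not been marked critical. It assumes the map has already been exported from your CMDB as a simple dictionary; the service names (`order-entry`, `currency-rates-ws`, and so on) and the helper functions are hypothetical.

```python
from collections import deque

# Hypothetical dependency map exported from a CMDB: each service lists
# the services it calls directly. Names are illustrative only.
DEPENDENCIES = {
    "order-entry": ["pricing-api", "customer-db"],
    "pricing-api": ["currency-rates-ws", "pricing-db"],
    "currency-rates-ws": [],
    "customer-db": [],
    "pricing-db": [],
}

# Services the (hypothetical) DR plan already treats as critical.
CRITICAL = {"order-entry", "customer-db", "pricing-api", "pricing-db"}


def transitive_dependencies(service, deps):
    """Return every service reachable from `service` (breadth-first), excluding itself."""
    seen = set()
    queue = deque(deps.get(service, []))
    while queue:
        current = queue.popleft()
        if current in seen:
            continue  # already visited; also guards against dependency cycles
        seen.add(current)
        queue.extend(deps.get(current, []))
    seen.discard(service)  # a cycle may lead back to the starting service
    return seen


def hidden_dependencies(service, deps, critical):
    """Services needed at recovery time that the plan has not marked critical."""
    return sorted(transitive_dependencies(service, deps) - critical)


print(hidden_dependencies("order-entry", DEPENDENCIES, CRITICAL))
# → ['currency-rates-ws']
```

Here the small rate-lookup Web service surfaces only because the walk follows indirect dependencies; a list of direct dependencies alone would have missed it.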

Next, you must prioritize those services. Identify the ones critical to data integrity and your line-of-business applications, along with anything whose loss would prevent you from making money or serving customers, or would put you out of regulatory compliance. As you prioritize, start defining your recovery point objective (RPO), the point in time to which you must be able to restore your data (in other words, how much data loss you can tolerate), and your recovery time objective (RTO), the maximum time it may take to get the service running again. These are not just questions for IT; they also belong to the key business consumers who depend on these applications.
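One way to test whether an existing backup arrangement can honor these objectives is a quick comparison like the one below. It is a minimal sketch under a simplifying assumption: with periodic point-in-time copies, worst-case data loss is roughly the interval between copies, ignoring copy duration and replication lag. The `RecoveryTarget` and `RecoveryCapability` classes and all of the figures are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class RecoveryTarget:
    name: str
    rpo_minutes: float  # maximum tolerable data loss, measured in time
    rto_minutes: float  # maximum tolerable time to restore the service


@dataclass
class RecoveryCapability:
    backup_interval_minutes: float  # time between point-in-time copies
    restore_minutes: float  # measured (ideally tested) time to bring the service back


def meets_objectives(target, capability):
    """Return (rpo_ok, rto_ok), assuming worst-case data loss equals the copy interval."""
    worst_case_loss = capability.backup_interval_minutes
    rpo_ok = worst_case_loss <= target.rpo_minutes
    rto_ok = capability.restore_minutes <= target.rto_minutes
    return rpo_ok, rto_ok


# Hypothetical example: a 15-minute RPO cannot be met by nightly backups.
billing = RecoveryTarget("billing", rpo_minutes=15, rto_minutes=240)
nightly = RecoveryCapability(backup_interval_minutes=24 * 60, restore_minutes=180)
print(meets_objectives(billing, nightly))  # → (False, True)
```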

When you have your RPO/RTO discussion, be sure to resist the business’s natural instinct to expect instantaneous recovery with no data loss across your entire infrastructure. Disaster recovery isn’t like flipping a switch. Once you have realistic RPO and RTO definitions, you can begin to place your applications and services into specific recovery categories.
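Categorization can be as simple as defining a few recovery tiers and placing each service in the least expensive tier that still meets its objectives. The sketch below assumes three hypothetical tiers with made-up RTO/RPO figures; substitute whatever your organization actually agrees on. A service whose objectives are tighter than any tier returns `None`, which is a prompt to revisit the RPO/RTO conversation rather than a value to force into a category.

```python
from typing import Optional

# Hypothetical recovery tiers, ordered from least to most expensive.
# Each entry states the recovery the tier can deliver:
# (name, rto_minutes, rpo_minutes).
TIERS = [
    ("Tier 3", 72 * 60, 24 * 60),
    ("Tier 2", 8 * 60, 4 * 60),
    ("Tier 1", 60, 15),
]


def assign_tier(required_rto_minutes: float, required_rpo_minutes: float) -> Optional[str]:
    """Return the cheapest tier whose recovery meets both objectives, or None."""
    for name, tier_rto, tier_rpo in TIERS:
        if tier_rto <= required_rto_minutes and tier_rpo <= required_rpo_minutes:
            return name
    return None  # tighter than any tier: renegotiate or fund a dedicated design


# Hypothetical services with (required RTO, required RPO) in minutes.
services = {
    "intranet-wiki": (5 * 24 * 60, 48 * 60),
    "order-entry": (12 * 60, 4 * 60),
    "payments": (30, 0),
}
for service, (rto, rpo) in services.items():
    print(service, assign_tier(rto, rpo))
```

With these made-up figures, `intranet-wiki` lands in Tier 3, `order-entry` in Tier 2, and `payments` gets `None`, because its zero-data-loss, 30-minute expectation is exactly the kind of demand the RPO/RTO discussion should challenge.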

The key to a successful disaster recovery is documenting the details of your services and infrastructure. The more effort you put into that documentation, and into prioritizing recovery time and recovery point objectives across your organization, the better you can plan and the better you can troubleshoot during an actual disaster recovery.