When you plan for what your organization will do when its systems and applications go down, you are likely to hear three terms a lot: disaster recovery, high availability, and fault tolerance.
These terms are related, but it is worth knowing how they differ and what role each one plays in keeping your systems online. Today we'll define each term, show what it looks like in practice, and explain why it matters to the success of your organization.
Disaster Recovery (DR) refers to the policies and procedures an organization puts in place to keep mission-critical systems running, or to restore them, after a disruptive event such as a power outage, flood, or cyberattack. In other words, how quickly can you get your computers and systems back up and running after a disaster?
In healthcare, the HIPAA Security Rule requires covered entities and their business associates to maintain a disaster recovery plan, as part of a broader contingency plan, for backing up and restoring electronic health data. Even if your office is not in an area where natural disasters such as earthquakes and hurricanes are common, a burst water pipe in your server room or a fire in your building can happen to any organization at any moment. Planning ahead also pays off financially. According to a study by the Multihazard Mitigation Council, every $1 spent on hazard mitigation returns roughly $4 in future benefits, and a disaster recovery plan is one form of that up-front investment.
High Availability (HA) is the goal of keeping your critical systems running with as little downtime as possible. In practice, this means eliminating single points of failure from your infrastructure and building the ability to automatically “fail over” to a secondary system if the primary system goes down for any reason. Like disaster recovery, high availability requires careful planning and the right tools. Your organization should aim for 99.999% network uptime (commonly called “five nines”), which allows about 5.26 minutes of downtime per year. Unlike fault-tolerant systems, highly available systems always incur some downtime while failover takes place, even if it lasts only a few seconds.
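To see what an uptime percentage means in minutes, you can convert it into a yearly downtime budget. The helper below is a minimal sketch under one assumption: a 365-day year of 525,600 minutes, with leap years ignored. The function name `annual_downtime_minutes` is illustrative and does not come from any particular tool.

```python
# Convert an availability target into the downtime it allows per year.
# Assumption: a 365-day year (525,600 minutes); leap years are ignored.
MINUTES_PER_YEAR = 365 * 24 * 60


def annual_downtime_minutes(availability_percent: float) -> float:
    """Return the maximum minutes of downtime per year at the given availability."""
    if not 0.0 <= availability_percent <= 100.0:
        raise ValueError("availability must be between 0 and 100 percent")
    # The unavailable fraction of the year, expressed in minutes.
    return MINUTES_PER_YEAR * (1 - availability_percent / 100)


if __name__ == "__main__":
    for target in (99.0, 99.9, 99.99, 99.999):
        print(f"{target}% -> {annual_downtime_minutes(target):.2f} minutes/year")
```

Each extra “nine” reduces the budget tenfold. At 99.999% the result is 525,600 × 0.00001 ≈ 5.26 minutes per year.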
Why is it important to have a well-thought-out high availability architecture? Because downtime is costly. It is costly in dollars: based on extrapolated data from the Aberdeen Group, just 5 hours of downtime per year carries an estimated cost of $1.3 million. It is costly in the lost productivity of staff forced back onto manual processes. And it puts at risk the lives of patients, every second your systems are offline. An integration engine flexible enough to fit into any high availability approach lets your IT staff implement an HA architecture as quickly and cost-effectively as possible.
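One building block of an HA architecture is the automatic failover described earlier. The code below is a minimal, hypothetical sketch and does not reflect how any particular product or integration engine works. A controller routes traffic to the primary node until the primary's heartbeat has been silent for longer than a timeout. At that point, it promotes the standby, provided the standby is itself still reporting in. `Node`, `FailoverController`, and `timeout_s` are invented names. The clock is simulated with plain numbers so the example is deterministic.

```python
from dataclasses import dataclass


@dataclass
class Node:
    """A hypothetical server that periodically reports a heartbeat."""
    name: str
    last_heartbeat: float = 0.0

    def heartbeat(self, now: float) -> None:
        self.last_heartbeat = now


class FailoverController:
    """Sends traffic to the active node and fails over when its heartbeat goes stale."""

    def __init__(self, primary: Node, secondary: Node, timeout_s: float = 3.0):
        self.primary = primary
        self.secondary = secondary
        self.timeout_s = timeout_s  # hypothetical threshold, tune per system
        self.active = primary

    def _healthy(self, node: Node, now: float) -> bool:
        return now - node.last_heartbeat <= self.timeout_s

    def check(self, now: float) -> Node:
        # Requests arriving between the failure and this check go unserved:
        # that gap is the brief downtime HA still allows.
        if not self._healthy(self.active, now):
            standby = self.secondary if self.active is self.primary else self.primary
            # Only promote a standby that is itself healthy; otherwise stay put.
            if self._healthy(standby, now):
                self.active = standby
        return self.active


if __name__ == "__main__":
    primary, secondary = Node("primary", 0.0), Node("secondary", 0.0)
    ctl = FailoverController(primary, secondary, timeout_s=3.0)
    secondary.heartbeat(5.0)    # the standby keeps reporting in
    print(ctl.check(2.0).name)  # primary is still within its timeout
    print(ctl.check(5.0).name)  # primary silent for 5 s > 3 s, so fail over
```

The timeout is the main design trade-off. A shorter timeout shrinks the downtime window but risks failing over on a momentary network hiccup.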
Fault Tolerance describes a computer system or technology infrastructure designed so that when one component (hardware or software) fails, a backup component takes over immediately and there is no loss of service. Keeping backup components in place is called redundancy, and the more backup components you have, the more tolerant your network is to hardware and software failures.
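The claim that more redundancy means more tolerance can be made concrete with a standard reliability formula. This is a simplified model, not a measurement of any real system. It assumes that each redundant copy has availability *a*, that any single copy is enough to keep service running, and that copies fail independently. Under those assumptions, the system is down only when all *n* copies are down at the same moment, which gives an availability of 1 − (1 − a)ⁿ. The function name is illustrative.

```python
def parallel_availability(component_availability: float, copies: int) -> float:
    """Availability of `copies` redundant components where any one suffices.

    Assumes independent failures: the system is down only when every copy
    is down at once, which happens with probability (1 - a) ** copies.
    """
    if not 0.0 <= component_availability <= 1.0:
        raise ValueError("availability must be a fraction between 0 and 1")
    if copies < 1:
        raise ValueError("need at least one component")
    return 1 - (1 - component_availability) ** copies


if __name__ == "__main__":
    # One, two, and three copies of a component that is up 99% of the time.
    for n in (1, 2, 3):
        print(n, parallel_availability(0.99, n))
```

For example, two independent 99%-available servers yield roughly 99.99%. The independence assumption is the weak point. A fault that both copies share, such as the operating-system problem mentioned in the next example, can take them down together and erase most of that gain.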
For example, consider a single application running simultaneously on two servers. The servers mirror each other, so every instruction executed on the primary server is also executed on the secondary server. If the primary server crashes or loses power, the secondary server takes over with zero downtime. Fault tolerance has two drawbacks, however. It is more costly, because both servers run all the time. And both servers can still go down together if there is a problem with the operating system they share.
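The mirrored-server example can be modeled in a few lines. This is a toy sketch, not how real lockstep or replication products work. Each hypothetical `Replica` holds a small key-value store, and a hypothetical `MirroredPair` applies every write to both live replicas. Because both copies have seen every instruction, reads keep working with no lost state after the primary is marked as crashed.

```python
class Replica:
    """A hypothetical server holding application state as a key-value store."""

    def __init__(self, name: str):
        self.name = name
        self.state: dict[str, int] = {}
        self.alive = True

    def apply(self, key: str, value: int) -> None:
        self.state[key] = value


class MirroredPair:
    """Executes every instruction on both replicas so either one can serve reads."""

    def __init__(self, primary: Replica, secondary: Replica):
        self.replicas = [primary, secondary]

    def execute(self, key: str, value: int) -> None:
        # Apply the same instruction to every live replica, keeping them in step.
        for replica in self.replicas:
            if replica.alive:
                replica.apply(key, value)

    def read(self, key: str) -> int:
        # Serve from the first live replica; since both saw every write,
        # losing the primary does not lose any state.
        for replica in self.replicas:
            if replica.alive:
                return replica.state[key]
        raise RuntimeError("both replicas are down")


if __name__ == "__main__":
    primary, secondary = Replica("primary"), Replica("secondary")
    pair = MirroredPair(primary, secondary)
    pair.execute("bed_count", 42)
    primary.alive = False          # simulate the primary crashing
    print(pair.read("bed_count"))  # still answered, now by the secondary
```

The sketch also shows both drawbacks. Every write costs twice the work. And because both replicas execute identical instructions, a faulty instruction or a shared platform bug would hit both of them at once.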
Disaster recovery, high availability, and fault tolerance: now that you understand what these terms mean and how they differ, the next step is to evaluate your current backup plans. Is your disaster recovery plan thorough enough? Are failover options part of your high availability architecture? How tolerant are your networks to failure? The answers to these questions will determine how prepared your organization is when the unexpected happens.