Disaster recovery, HA, RTO, RPO
Two important aspects of resiliency are high availability and disaster recovery.
- High availability (HA)
- is the ability of the application to continue running in a healthy state, without significant downtime. By "healthy state," we mean the application is responsive, and users can connect to the application and interact with it.
- Disaster recovery (DR)
- is the ability to recover from rare but major incidents: non-transient, wide-scale failures, such as service disruption that affects an entire region. Disaster recovery includes data backup and archiving, and may include manual intervention, such as restoring a database from backup.
One way to think about HA versus DR is that DR starts when the impact of a fault exceeds the ability of the HA design to handle it.
- Business continuity (BC)
- is the ability to perform essential business functions during and after adverse conditions, such as a natural disaster or a service outage.
Two important metrics to consider for system resiliency are the recovery time objective and the recovery point objective.
- Recovery time objective (RTO)
- is the maximum acceptable time that an application can be unavailable after an incident. If your RTO is 90 minutes, you must be able to restore the application to a running state within 90 minutes from the start of a disaster. If you have a very low RTO, you might keep a second deployment continually running on standby, to protect against a regional outage.
- Recovery point objective (RPO)
- is the maximum duration of data loss that is acceptable during a disaster. For example, if you store data in a single database, with no replication to other databases, and perform hourly backups, you could lose up to an hour of data. That setup can therefore only satisfy an RPO of one hour or longer.
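The two metrics can be checked with simple arithmetic. The following is a minimal Python sketch, not a reference implementation. The function names (`worst_case_data_loss`, `meets_rpo`, `meets_rto`) are hypothetical. It also assumes that a replica, if there is one, survives the disaster, so that data loss is bounded by the replication lag rather than by the backup interval.

```python
from datetime import timedelta
from typing import Optional


def worst_case_data_loss(backup_interval: timedelta,
                         replication_lag: Optional[timedelta] = None) -> timedelta:
    """Upper bound on lost data if a disaster strikes just before the next backup.

    Assumption (not from the article): when a replica exists and survives the
    disaster, the loss is bounded by the replication lag instead.
    """
    if replication_lag is None:
        return backup_interval
    return min(backup_interval, replication_lag)


def meets_rpo(rpo: timedelta,
              backup_interval: timedelta,
              replication_lag: Optional[timedelta] = None) -> bool:
    """True if the worst-case data loss stays within the RPO."""
    return worst_case_data_loss(backup_interval, replication_lag) <= rpo


def meets_rto(rto: timedelta, estimated_recovery_time: timedelta) -> bool:
    """True if the application can be restored within the RTO."""
    return estimated_recovery_time <= rto


if __name__ == "__main__":
    hourly = timedelta(hours=1)
    # Hourly backups, no replication: up to one hour of data can be lost.
    print(meets_rpo(timedelta(minutes=15), hourly))   # 15-minute RPO
    print(meets_rpo(timedelta(hours=1), hourly))      # 1-hour RPO
    # 90-minute RTO against a restore that is estimated to take 2 hours.
    print(meets_rto(timedelta(minutes=90), timedelta(minutes=120)))
```

With hourly backups and no replication, the sketch reports that a 15-minute RPO is not met and a one-hour RPO is. A restore estimated at two hours fails a 90-minute RTO. In practice, the estimated recovery time should come from a tested restore procedure rather than a guess.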
Resources
- Designing resilient applications for Azure (Microsoft Docs)