RES01-01 High-Availability Deployment of Application Components

All components in an application system must be deployed in high availability (HA) mode to prevent single points of failure (SPOFs).
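A simple availability model shows why one non-redundant component is enough to undermine the whole system. The Python sketch below makes two assumptions: instances fail independently, and the application needs every component to be up (a serial chain). The 99.5% per-instance availability is a hypothetical figure, not a published value for any cloud service.

```python
from math import prod


def redundant_availability(instance_availability: float, instances: int) -> float:
    """Availability of one component served by N interchangeable instances.

    Assumes independent failures: the component is down only when
    all of its instances are down at the same time.
    """
    return 1 - (1 - instance_availability) ** instances


def chain_availability(component_availabilities: list[float]) -> float:
    """Availability of a system that needs every component to be up."""
    return prod(component_availabilities)


a = 0.995  # hypothetical availability of a single instance

# Web, app, and database tiers, each running two instances: no SPOF.
all_ha = chain_availability([redundant_availability(a, 2)] * 3)

# Same system, but the database tier runs on a single instance: one SPOF.
one_spof = chain_availability([redundant_availability(a, 2)] * 2 + [a])

print(f"all tiers redundant: {all_ha:.4%}")
print(f"one single instance: {one_spof:.4%}")
```

The chain's availability is a product, so the single-instance tier caps the whole system below its own 99.5%, regardless of how redundant the other tiers are. Real failures are often correlated, for example between instances sharing a physical server, so treat the redundant figures as optimistic.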

Risk level
High

Key strategies
The HA deployment solution varies depending on the capabilities of each component in the application system.
- Using native HA instances: If a cloud service offers both single-node resources and active/standby or cluster resources, key nodes must use the active/standby or cluster resources. Examples include CCE HA clusters, RDS primary/standby instances, DDS clusters, and DCS master/standby or cluster instances. In addition, workloads on CCE clusters need to run multiple pods spread across different nodes, so that a single node failure does not interrupt services.
- Single-node instances achieve HA through multiple instances: If a cloud service supports only single-node deployment, the application layer must provide active/standby switchover or load balancing across multiple nodes. For example, with ECS you can use ELB to distribute a stateless service across multiple ECSs and fail over to the remaining ECSs when one becomes faulty. Alternatively, set up active/standby ECSs at the application layer.
- Hardware-dependent instances achieve HA at the application layer: If an ECS depends on dedicated hardware resources, such as local disks, FPGA passthrough, or InfiniBand (IB) NIC passthrough, a failure of that hardware causes the ECS to fail, and VM HA cannot automatically recover the ECS. To avoid this, do not rely on such hardware when designing the application system. If it must be used, implement HA at the application layer so that services can be quickly recovered when the hardware becomes faulty.
- VM HA: If an ECS does not depend on special hardware resources, VM auto recovery is supported. If the physical server hosting the ECS becomes faulty, the ECS is automatically restarted on another physical server. Workloads deployed on such an ECS must recover automatically after the VM restarts. They must also tolerate temporary performance degradation or interruptions during recovery.
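The multi-instance strategy for single-node services relies on routing traffic only to healthy instances. ELB provides this as a managed capability. The Python sketch below only illustrates the underlying idea at the application layer: round-robin distribution across stateless instances, with an instance skipped after repeated failed health checks. The `Backend` and `HealthCheckedPool` names, the IP addresses, and the failure threshold of 3 are hypothetical and do not correspond to any ELB API or default.

```python
from dataclasses import dataclass, field
from itertools import count


@dataclass
class Backend:
    address: str
    healthy: bool = True
    consecutive_failures: int = 0


@dataclass
class HealthCheckedPool:
    """Round-robin pool of stateless backends that skips unhealthy ones."""

    backends: list[Backend]
    failure_threshold: int = 3  # hypothetical: failed checks before an instance is removed
    _cursor: count = field(default_factory=count, init=False, repr=False)

    def record_check(self, address: str, ok: bool) -> None:
        """Apply one health-check result to the matching backend."""
        for backend in self.backends:
            if backend.address == address:
                if ok:
                    backend.consecutive_failures = 0
                    backend.healthy = True  # simplification: one success restores it
                else:
                    backend.consecutive_failures += 1
                    if backend.consecutive_failures >= self.failure_threshold:
                        backend.healthy = False
                return
        raise KeyError(address)

    def pick(self) -> Backend:
        """Return the next healthy backend, failing over past unhealthy ones."""
        n = len(self.backends)
        for _ in range(n):
            candidate = self.backends[next(self._cursor) % n]
            if candidate.healthy:
                return candidate
        raise RuntimeError("no healthy backend available")


# Usage: the first instance fails three consecutive checks and leaves the rotation.
pool = HealthCheckedPool([Backend("10.0.0.11"), Backend("10.0.0.12")])
for _ in range(3):
    pool.record_check("10.0.0.11", ok=False)
print([pool.pick().address for _ in range(4)])  # every request goes to 10.0.0.12
```

To keep the sketch short, an instance returns to service as soon as one check succeeds. A production load balancer typically requires several consecutive successful checks before restoring an instance.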
To retrofit an already deployed application system for HA, perform the following steps:
1. Identify the key components of the application system, that is, the components whose failure would affect the entire application system or its key functions.
2. Check the HA capability of each key component, specifically whether failover is supported when the component becomes faulty.
3. Optimize the key components that do not support HA as follows:
- If a cloud service supports only single-node instances, as ECS does, deploy multiple ECSs to carry the same service and use ELB for load balancing and failover. Alternatively, implement failover among multiple ECSs at the application layer to achieve HA.
- Failover is available for ECSs that do not depend on special resources. If the physical server hosting an ECS becomes faulty, the ECS is automatically restarted on another physical server. For workloads deployed on such an ECS, check whether services recover automatically after the ECS restarts.
- Failover is unavailable for ECSs that depend on special resources, such as local disks, FPGA passthrough, and IB NIC passthrough. To improve availability, check whether these ECSs can be replaced with standard ECSs that do not depend on such resources.
- Local disks have a limited service life and become prone to faults after long-term use. Avoid attaching local disks to ECSs, BMSs, and MRS clusters, and use Elastic Volume Service (EVS) disks, which provide HA, instead. If local disks must be used, use RAID to improve their availability and implement HA at the application layer. Services can then fail over and recover if an instance becomes faulty.
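Steps 1 to 3 amount to a small inventory review. The Python sketch below encodes the remediation rules above as a function over a component inventory. The `Component` fields, the hardware labels, and the example entries are hypothetical and serve only as an illustration. A real review would draw this information from the actual architecture and service configuration.

```python
from dataclasses import dataclass


@dataclass
class Component:
    name: str
    is_key: bool             # step 1: would its failure affect the system or key functions?
    supports_failover: bool  # step 2: native HA, or already redundant with failover
    instance_count: int = 1
    # Hypothetical labels for special hardware, e.g. "local_disk", "fpga", "ib_nic".
    special_hardware: frozenset[str] = frozenset()


def recommend(c: Component) -> list[str]:
    """Step 3: remediation hints for a key component that lacks failover."""
    if not c.is_key or c.supports_failover:
        return []  # not key, or already HA: nothing to optimize
    hints = []
    if c.instance_count < 2:
        hints.append("run multiple instances behind ELB, or add active/standby "
                     "failover at the application layer")
    if c.special_hardware:
        hints.append("no VM auto recovery: check whether a standard ECS can replace it")
        if "local_disk" in c.special_hardware:
            hints.append("prefer EVS disks; if local disks are required, "
                         "use RAID plus application-layer HA")
    else:
        hints.append("VM auto recovery applies: confirm services recover "
                     "automatically after the ECS restarts")
    return hints


# Hypothetical inventory used only to exercise the rules.
inventory = [
    Component("web", is_key=True, supports_failover=False),
    Component("db", is_key=True, supports_failover=True),  # e.g. RDS primary/standby
    Component("solver", is_key=True, supports_failover=False,
              special_hardware=frozenset({"local_disk", "ib_nic"})),
    Component("nightly-report", is_key=False, supports_failover=False),
]

for component in inventory:
    for hint in recommend(component):
        print(f"{component.name}: {hint}")
```

Non-key components and components that already fail over are skipped. This keeps the review focused on the components that can cause a system-wide outage.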
Related cloud services and tools
- Elastic Cloud Server (ECS)
- Bare Metal Server (BMS)
- Elastic Load Balance (ELB)
- Cloud Container Engine (CCE)
- Document Database Service (DDS)
- Distributed Cache Service (DCS)
- MapReduce Service (MRS)
- Relational Database Service (RDS)
- Elastic Volume Service (EVS)