Updated on 2025-05-22 GMT+08:00

Application Resilience

Application resilience refers to an application system's ability to provide and maintain an acceptable level of service despite various abnormal scenarios. These include infrastructure faults (such as database exceptions), external attacks (like DDoS attacks exceeding preset traffic limits), external dependency failures (such as access timeouts or unavailability), and regional disasters (such as large-scale power outages and floods).

Resilience design aims to ensure that:

  • The system has a high-availability architecture. For example, it has no single points of failure (SPOFs).
  • The system can quickly recover from faults, such as data loss, device faults, or site faults.

Compared with traditional data centers, Huawei Cloud provides infrastructure and cloud services with high availability, auto scaling, automatic backup, cross-AZ disaster recovery (DR), and cross-region DR, enabling customers to build highly reliable systems.

  • EVS and OBS use distributed storage so that a hardware fault in a single disk, server, or rack does not affect services.
  • RDS provides automatic data backup as well as cross-AZ and cross-region data replication and switchover.

However, even when the cloud platform provides these high-availability capabilities, application systems still need to be able to recover from occasional faults on their own.

  • The application systems need to be able to retry and re-establish links if a hardware fault briefly interrupts them during a high-availability switchover or a cross-AZ switchover.
  • The application systems need to be able to control traffic if service overload occurs due to sudden external traffic bursts.
  • Some workloads heavily depend on hardware, such as local hard disks and GPUs. If the hardware is faulty, services are interrupted. In this case, the application systems need to build their own high-availability capabilities.
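
The first two capabilities in the list above (retrying after a brief link interruption, and controlling traffic during a burst) can be sketched as follows. This is a minimal illustration, not a Huawei Cloud API: the names retry_with_backoff and TokenBucket, the default delays, and the choice of exponential backoff with jitter and a token bucket are all assumptions made for the example.

```python
import random
import time


def retry_with_backoff(operation, max_attempts=5, base_delay=0.1, max_delay=2.0,
                       retry_on=(ConnectionError, TimeoutError), sleep=time.sleep):
    """Call operation(); on a transient error, wait and try again.

    Hypothetical helper: parameter names and defaults are illustrative.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except retry_on:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error to the caller
            # Exponential backoff capped at max_delay, with random jitter so
            # that many clients do not all reconnect at the same instant.
            delay = min(max_delay, base_delay * 2 ** (attempt - 1))
            sleep(random.uniform(0, delay))


class TokenBucket:
    """Admit about `rate` requests per second, with bursts up to `capacity`.

    Hypothetical class: one common way to implement traffic control.
    """

    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.clock = clock
        self.tokens = float(capacity)  # start with a full bucket
        self.last = clock()

    def allow(self):
        now = self.clock()
        # Refill in proportion to elapsed time, never above capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False  # over the limit: the caller should reject or queue
```

A caller might wrap a database reconnect in retry_with_backoff and check bucket.allow() before accepting each incoming request. Suitable retry counts, delays, and rate limits depend on the availability requirements of the specific application.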

Availability requirements vary between application systems, so each system needs a resilience solution tailored to its specific needs.