Updated on 2025-08-08 GMT+08:00

Overview

As O&M shifts from traditional IT infrastructure to cloud services, traditional O&M methods face challenges such as complex inter-service calls, fast application iteration, massive numbers of O&M objects, and complex nonlinear systems. Service downtime can cause huge economic losses and reputational damage to a company.

Chaos engineering is introduced into the O&M process to address these challenges. Through periodic fault simulation, system weaknesses (such as software bugs, solution design defects, and weak points in the fault recovery process) can be identified before problems occur on the live network, so that system availability problems are detected and resolved in a timely manner. This continuously improves application resilience and builds O&M confidence. For unavoidable scenarios (such as hardware faults, abnormal server power-off, and network device board faults), you can formulate a contingency plan in advance for quick fault recovery.

COC allows you to perform automated chaos drills that cover the entire process, from risk identification and emergency plan management to fault injection and review and improvement. Built on years of Huawei Cloud SRE best practices in chaos drills, COC helps you proactively identify, mitigate, and verify risks in cloud applications, continuously improving their resilience.

Image and Disruptor Version Support Statement

Currently, the chaos drill feature supports probe attack objects such as Elastic Cloud Servers (ECSs), FlexusL instances, and Bare Metal Servers (BMSs), and provides corresponding resource and network disruptors for drills. Probe disruptors fall into four modules: disruptors for practicing, host resources, host processes, and host network. By combining disruptor modules and functions, you can accurately simulate faults in the actual environment and detect system availability issues as early as possible, continuously improving application resilience.

The following tables list the ECS, BMS, and FlexusL instance image versions and the probe disruptors that each supports.

CentOS 6.10 and earlier images do not support some probe disruptors because the system lacks the shared library versions (GLIBC_2.14 and GLIBCXX_3.4.15) required to run the corresponding probe packages.
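
Before installing a probe on an older image, it can help to check the glibc version on the host. The following is a minimal sketch, not part of COC: the function names `parse_version` and `meets_glibc_requirement` are hypothetical, and it assumes that the GLIBC_2.14 symbol version is provided by glibc 2.14 or later. It checks glibc only and does not verify GLIBCXX_3.4.15 in libstdc++.

```python
import platform


def parse_version(text):
    """Turn a dotted version string such as '2.17' into a tuple of ints."""
    return tuple(int(part) for part in text.split(".") if part.isdigit())


def meets_glibc_requirement(found, required="2.14"):
    """Return True if the found glibc version is at least the required one."""
    return parse_version(found) >= parse_version(required)


if __name__ == "__main__":
    # platform.libc_ver() inspects the running interpreter's C library.
    lib, version = platform.libc_ver()
    if lib == "glibc" and version:
        ok = meets_glibc_requirement(version)
        print(f"glibc {version}: {'meets' if ok else 'is below'} GLIBC_2.14")
    else:
        print("glibc version could not be determined on this host")
```

Versions are compared as integer tuples rather than strings so that, for example, 2.9 is correctly treated as older than 2.14.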

Table 1 lists the probe disruptors supported by each ECS image version.

Table 1 ECS and disruptor compatibility list

| Category | Disruptor | Supported Image Version | Description |
|---|---|---|---|
| Disruptors for practicing | Qualifying practice | CentOS 7.2, CentOS 7.6, CentOS 7.9, CentOS 8.2, Ubuntu 16.04, Ubuntu 18.04, Ubuntu 20.04, Ubuntu 22.04, EulerOS 2.2, EulerOS 2.5, EulerOS 2.9, EulerOS 2.10, Debian 8.2.0, Debian 8.8.0, Debian 9.0.0, Debian 11.1.0, and Huawei Cloud EulerOS 2.0 | - |
| Host resources | CPU usage increase | CentOS 7.2, CentOS 7.6, CentOS 7.9, CentOS 8.2, Ubuntu 16.04, Ubuntu 18.04, Ubuntu 20.04, Ubuntu 22.04, EulerOS 2.2, EulerOS 2.5, EulerOS 2.9, EulerOS 2.10, Debian 8.2.0, Debian 8.8.0, Debian 9.0.0, Debian 11.1.0, and Huawei Cloud EulerOS 2.0 | - |
| Host resources | Memory usage increase | CentOS 7.2, CentOS 7.6, CentOS 7.9, CentOS 8.2, Ubuntu 16.04, Ubuntu 18.04, Ubuntu 20.04, Ubuntu 22.04, EulerOS 2.2, EulerOS 2.5, EulerOS 2.9, EulerOS 2.10, Debian 8.2.0, Debian 8.8.0, Debian 9.0.0, Debian 11.1.0, and Huawei Cloud EulerOS 2.0 | - |
| Host resources | Disk usage increase | CentOS 7.2, CentOS 7.6, CentOS 7.9, CentOS 8.2, Ubuntu 16.04, Ubuntu 18.04, Ubuntu 20.04, Ubuntu 22.04, EulerOS 2.2, EulerOS 2.5, EulerOS 2.9, EulerOS 2.10, Debian 8.2.0, Debian 8.8.0, Debian 9.0.0, Debian 11.1.0, and Huawei Cloud EulerOS 2.0 | - |
| Host resources | Disk I/O pressure increase | CentOS 7.2, CentOS 7.6, CentOS 7.9, CentOS 8.2, Ubuntu 16.04, Ubuntu 18.04, Ubuntu 20.04, Ubuntu 22.04, EulerOS 2.2, EulerOS 2.5, EulerOS 2.9, EulerOS 2.10, Debian 8.2.0, Debian 8.8.0, Debian 9.0.0, Debian 11.1.0, and Huawei Cloud EulerOS 2.0 | - |
| Host process | Process ID exhaustion | CentOS 7.2, CentOS 7.6, CentOS 7.9, CentOS 8.2, Ubuntu 16.04, Ubuntu 18.04, Ubuntu 20.04, Ubuntu 22.04, EulerOS 2.2, EulerOS 2.5, EulerOS 2.9, EulerOS 2.10, Debian 8.2.0, Debian 8.8.0, Debian 9.0.0, Debian 11.1.0, and Huawei Cloud EulerOS 2.0 | On EulerOS images, exhausting the process IDs may trigger a protection mechanism that restarts the kernel. As a result, the drill fails. |
| Host process | Killing a process/Continuously killing a process | CentOS 7.2, CentOS 7.6, CentOS 7.9, CentOS 8.2, Ubuntu 16.04, Ubuntu 18.04, Ubuntu 20.04, Ubuntu 22.04, EulerOS 2.2, EulerOS 2.5, EulerOS 2.9, EulerOS 2.10, Debian 8.2.0, Debian 8.8.0, Debian 9.0.0, Debian 11.1.0, and Huawei Cloud EulerOS 2.0 | - |
| Host network | Network latency | CentOS 7.2, CentOS 7.6, CentOS 7.9, Ubuntu 16.04, Ubuntu 18.04, Ubuntu 20.04, Ubuntu 22.04, EulerOS 2.2, EulerOS 2.5, EulerOS 2.9, EulerOS 2.10, Debian 8.2.0, Debian 8.8.0, Debian 9.0.0, Debian 11.1.0, and Huawei Cloud EulerOS 2.0 | - |
| Host network | Network packet loss | CentOS 7.2, CentOS 7.6, CentOS 7.9, Ubuntu 16.04, Ubuntu 18.04, Ubuntu 20.04, Ubuntu 22.04, EulerOS 2.2, EulerOS 2.5, EulerOS 2.9, EulerOS 2.10, Debian 8.2.0, Debian 8.8.0, Debian 9.0.0, Debian 11.1.0, and Huawei Cloud EulerOS 2.0 | - |
| Host network | Network error packets | CentOS 7.2, CentOS 7.6, CentOS 7.9, Ubuntu 16.04, Ubuntu 18.04, Ubuntu 20.04, Ubuntu 22.04, EulerOS 2.2, EulerOS 2.5, EulerOS 2.9, EulerOS 2.10, Debian 8.2.0, Debian 8.8.0, Debian 9.0.0, Debian 11.1.0, and Huawei Cloud EulerOS 2.0 | - |
| Host network | Duplicate packets | CentOS 7.2, CentOS 7.6, CentOS 7.9, Ubuntu 16.04, Ubuntu 18.04, Ubuntu 20.04, Ubuntu 22.04, EulerOS 2.2, EulerOS 2.5, EulerOS 2.9, EulerOS 2.10, Debian 8.2.0, Debian 8.8.0, Debian 9.0.0, Debian 11.1.0, and Huawei Cloud EulerOS 2.0 | - |
| Host network | Network packet disorder | CentOS 7.2, CentOS 7.6, CentOS 7.9, Ubuntu 16.04, Ubuntu 18.04, Ubuntu 20.04, Ubuntu 22.04, EulerOS 2.2, EulerOS 2.5, EulerOS 2.9, EulerOS 2.10, Debian 8.2.0, Debian 8.8.0, Debian 9.0.0, Debian 11.1.0, and Huawei Cloud EulerOS 2.0 | - |
| Host network | Network disconnection | CentOS 7.2, CentOS 7.6, CentOS 7.9, CentOS 8.2, Ubuntu 16.04, Ubuntu 18.04, Ubuntu 20.04, Ubuntu 22.04, EulerOS 2.2, EulerOS 2.5, EulerOS 2.9, EulerOS 2.10, Debian 8.2.0, Debian 8.8.0, Debian 9.0.0, Debian 11.1.0, and Huawei Cloud EulerOS 2.0 | - |
| Host network | NIC break-down | CentOS 7.2, CentOS 7.6, CentOS 7.9, CentOS 8.2, Ubuntu 16.04, Ubuntu 18.04, Ubuntu 20.04, Ubuntu 22.04, EulerOS 2.2, EulerOS 2.5, EulerOS 2.9, EulerOS 2.10, Debian 8.2.0, Debian 8.8.0, Debian 9.0.0, Debian 11.1.0, and Huawei Cloud EulerOS 2.0 | If the NIC becomes faulty, the UniAgent may go offline. As a result, the UniAgent information cannot be received, and the page fails to be displayed. |
| Host network | DNS tampering | CentOS 7.2, CentOS 7.6, CentOS 7.9, CentOS 8.2, Ubuntu 16.04, Ubuntu 18.04, Ubuntu 20.04, Ubuntu 22.04, EulerOS 2.2, EulerOS 2.5, EulerOS 2.9, EulerOS 2.10, Debian 8.2.0, Debian 8.8.0, Debian 9.0.0, Debian 11.1.0, and Huawei Cloud EulerOS 2.0 | - |
| Host network | Port occupation | CentOS 7.2, CentOS 7.6, CentOS 7.9, CentOS 8.2, Ubuntu 16.04, Ubuntu 18.04, Ubuntu 20.04, Ubuntu 22.04, EulerOS 2.2, EulerOS 2.5, EulerOS 2.9, EulerOS 2.10, Debian 8.2.0, Debian 8.8.0, Debian 9.0.0, Debian 11.1.0, and Huawei Cloud EulerOS 2.0 | - |
| Host network | Server disconnection | CentOS 7.2, CentOS 7.6, CentOS 7.9, CentOS 8.2, Ubuntu 16.04, Ubuntu 18.04, Ubuntu 20.04, Ubuntu 22.04, EulerOS 2.2, EulerOS 2.5, EulerOS 2.9, EulerOS 2.10, Debian 8.2.0, Debian 8.8.0, Debian 9.0.0, Debian 11.1.0, and Huawei Cloud EulerOS 2.0 | If the server becomes faulty, the UniAgent may go offline. As a result, the UniAgent information cannot be received, and the page fails to be displayed. |
| Host network | NIC bandwidth limiting | CentOS 7.2, CentOS 7.6, CentOS 7.9, Ubuntu 16.04, Ubuntu 18.04, Ubuntu 20.04, Ubuntu 22.04, EulerOS 2.2, EulerOS 2.5, EulerOS 2.9, EulerOS 2.10, Debian 8.2.0, Debian 8.8.0, Debian 9.0.0, Debian 11.1.0, and Huawei Cloud EulerOS 2.0 | - |
| Host network | Connection exhaustion | CentOS 7.2, CentOS 7.6, CentOS 7.9, CentOS 8.2, Ubuntu 16.04, Ubuntu 18.04, Ubuntu 20.04, Ubuntu 22.04, EulerOS 2.2, EulerOS 2.5, EulerOS 2.9, EulerOS 2.10, Debian 8.2.0, Debian 8.8.0, Debian 9.0.0, Debian 11.1.0, and Huawei Cloud EulerOS 2.0 | - |
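
In Table 1, every disruptor supports the same set of ECS images, except that six host network disruptors do not list CentOS 8.2. The following sketch shows one way to encode that pattern as a lookup for pre-drill checks. It is an illustration only: the names `ECS_IMAGES`, `NO_CENTOS_8_2`, `OTHER_DISRUPTORS`, and `ecs_supports` are hypothetical and are not part of any COC API, and the data is transcribed by hand from Table 1.

```python
# Image versions listed in Table 1 (transcribed by hand; hypothetical names).
ECS_IMAGES = frozenset({
    "CentOS 7.2", "CentOS 7.6", "CentOS 7.9", "CentOS 8.2",
    "Ubuntu 16.04", "Ubuntu 18.04", "Ubuntu 20.04", "Ubuntu 22.04",
    "EulerOS 2.2", "EulerOS 2.5", "EulerOS 2.9", "EulerOS 2.10",
    "Debian 8.2.0", "Debian 8.8.0", "Debian 9.0.0", "Debian 11.1.0",
    "Huawei Cloud EulerOS 2.0",
})

# Disruptors whose Table 1 row does not list CentOS 8.2.
NO_CENTOS_8_2 = frozenset({
    "Network latency", "Network packet loss", "Network error packets",
    "Duplicate packets", "Network packet disorder", "NIC bandwidth limiting",
})

# Disruptors whose Table 1 row lists every image in ECS_IMAGES.
OTHER_DISRUPTORS = frozenset({
    "Qualifying practice", "CPU usage increase", "Memory usage increase",
    "Disk usage increase", "Disk I/O pressure increase",
    "Process ID exhaustion",
    "Killing a process/Continuously killing a process",
    "Network disconnection", "NIC break-down", "DNS tampering",
    "Port occupation", "Server disconnection", "Connection exhaustion",
})


def ecs_supports(disruptor, image):
    """Return True if Table 1 lists the image for the given ECS disruptor."""
    if disruptor not in NO_CENTOS_8_2 and disruptor not in OTHER_DISRUPTORS:
        raise KeyError(f"unknown disruptor: {disruptor}")
    if image not in ECS_IMAGES:
        return False
    # CentOS 8.2 is the only image that some rows leave out.
    return not (image == "CentOS 8.2" and disruptor in NO_CENTOS_8_2)


if __name__ == "__main__":
    print(ecs_supports("Network latency", "CentOS 8.2"))
    print(ecs_supports("Network disconnection", "CentOS 8.2"))
```

Image names must match the table text exactly. An image that is not in the table is reported as unsupported, and an unrecognized disruptor name raises an error.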

Table 2 lists the probe disruptors supported by each BMS image version.

Table 2 Bare metal server image and tool compatibility list

| Category | Disruptor | Supported Image Version |
|---|---|---|
| Disruptors for practicing | Qualifying practice | CentOS 7.3, CentOS 7.9, Ubuntu 16, Ubuntu 18.04, and EulerOS 2.3 |
| Host resources | CPU usage increase | CentOS 7.3, CentOS 7.9, Ubuntu 16, Ubuntu 18.04, and EulerOS 2.3 |
| Host resources | Memory usage increase | CentOS 7.3, CentOS 7.9, Ubuntu 16, Ubuntu 18.04, and EulerOS 2.3 |
| Host resources | Disk usage increase | CentOS 7.3, CentOS 7.9, Ubuntu 16, Ubuntu 18.04, and EulerOS 2.3 |
| Host resources | Disk I/O pressure increase | CentOS 7.3, CentOS 7.9, Ubuntu 16, Ubuntu 18.04, and EulerOS 2.3 |
| Host process | Process ID exhaustion | CentOS 7.3, CentOS 7.9, Ubuntu 16, Ubuntu 18.04, and EulerOS 2.3 |
| Host process | Killing a process/Continuously killing a process | CentOS 7.4, CentOS 7.9, Ubuntu 16, Ubuntu 18.04, and EulerOS 2.3 |
| Host network | Network latency | CentOS 7.3, CentOS 7.9, Ubuntu 16, Ubuntu 18.04, and EulerOS 2.3 |
| Host network | Network packet loss | CentOS 7.3, CentOS 7.9, Ubuntu 16, Ubuntu 18.04, and EulerOS 2.3 |
| Host network | Network error packets | CentOS 7.3, CentOS 7.9, Ubuntu 16, Ubuntu 18.04, and EulerOS 2.3 |
| Host network | Duplicate packets | CentOS 7.3, CentOS 7.9, Ubuntu 16, Ubuntu 18.04, and EulerOS 2.3 |
| Host network | Network packet disorder | CentOS 7.3, CentOS 7.9, Ubuntu 16, Ubuntu 18.04, and EulerOS 2.3 |
| Host network | Network disconnection | CentOS 7.3, CentOS 7.9, Ubuntu 16, Ubuntu 18.04, and EulerOS 2.3 |
| Host network | NIC break-down | CentOS 7.3, CentOS 7.9, Ubuntu 16, Ubuntu 18.04, and EulerOS 2.3 |
| Host network | DNS tampering | CentOS 6.9, CentOS 7.9, Ubuntu 16, Ubuntu 18.04, EulerOS 2.3, and EulerOS 2.9 |
| Host network | Port occupation | CentOS 6.9, CentOS 7.9, Ubuntu 16, Ubuntu 18.04, EulerOS 2.3, and EulerOS 2.9 |
| Host network | Server disconnection | CentOS 6.9, CentOS 7.9, Ubuntu 16, Ubuntu 18.04, EulerOS 2.3, and EulerOS 2.9 |
| Host network | NIC bandwidth limiting | CentOS 6.9, CentOS 7.4, CentOS 7.9, Ubuntu 16.04, Ubuntu 18.04, EulerOS 2.3, and EulerOS 2.9 |
| Host network | Connection exhaustion | CentOS 7.4, CentOS 7.9, Ubuntu 16.04, Ubuntu 18.04, EulerOS 2.3, and EulerOS 2.9 |

Table 3 lists the probe disruptors supported by each FlexusL image.

Table 3 FlexusL instance images and probe tool compatibility list

| Category | Disruptor | Supported Image Version |
|---|---|---|
| Disruptors for practicing | Qualifying practice | CentOS 7.2, CentOS 8.2, Ubuntu 16.04, Ubuntu 22.04, EulerOS 2.0, Debian 8.2, and Debian 11.1.0 |
| Host resources | CPU usage increase | CentOS 7.2, CentOS 8.2, Ubuntu 16.04, Ubuntu 22.04, EulerOS 2.0, Debian 8.2, and Debian 11.1.0 |
| Host resources | Memory usage increase | CentOS 7.2, CentOS 8.2, Ubuntu 16.04, Ubuntu 22.04, EulerOS 2.0, Debian 8.2, and Debian 11.1.0 |
| Host resources | Disk usage increase | CentOS 7.2, CentOS 8.2, Ubuntu 16.04, Ubuntu 22.04, EulerOS 2.0, Debian 8.2, and Debian 11.1.0 |
| Host resources | Disk I/O pressure increase | CentOS 7.2, CentOS 8.2, Ubuntu 16.04, Ubuntu 22.04, EulerOS 2.0, Debian 8.2, and Debian 11.1.0 |
| Host process | Process ID exhaustion | CentOS 7.2, CentOS 8.2, Ubuntu 16.04, Ubuntu 22.04, EulerOS 2.0, Debian 8.2, and Debian 11.1.0 |
| Host process | Killing a process/Continuously killing a process | CentOS 7.2, CentOS 8.2, Ubuntu 16.04, Ubuntu 22.04, EulerOS 2.0, Debian 8.2, and Debian 11.1.0 |
| Host network | Network latency | CentOS 7.2, Ubuntu 16.04, Ubuntu 22.04, EulerOS 2.0, Debian 8.2, and Debian 11.1.0 |
| Host network | Network packet loss | CentOS 7.2, Ubuntu 16.04, Ubuntu 22.04, EulerOS 2.0, Debian 8.2, and Debian 11.1.0 |
| Host network | Network error packets | CentOS 7.2, Ubuntu 16.04, Ubuntu 22.04, EulerOS 2.0, Debian 8.2, and Debian 11.1.0 |
| Host network | Duplicate packets | CentOS 7.2, Ubuntu 16.04, Ubuntu 22.04, EulerOS 2.0, Debian 8.2, and Debian 11.1.0 |
| Host network | Network packet disorder | CentOS 7.2, Ubuntu 16.04, Ubuntu 22.04, EulerOS 2.0, Debian 8.2, and Debian 11.1.0 |
| Host network | Network disconnection | CentOS 7.2, CentOS 8.2, Ubuntu 16.04, Ubuntu 22.04, EulerOS 2.0, Debian 8.2, and Debian 11.1.0 |
| Host network | NIC break-down | CentOS 7.2, CentOS 8.2, Ubuntu 16.04, Ubuntu 22.04, EulerOS 2.0, Debian 8.2, and Debian 11.1.0 |
| Host network | DNS tampering | CentOS 7.2, CentOS 8.2, Ubuntu 16.04, Ubuntu 22.04, EulerOS 2.0, Debian 8.2, and Debian 11.1.0 |
| Host network | Port occupation | CentOS 7.2, CentOS 8.2, Ubuntu 16.04, Ubuntu 22.04, EulerOS 2.0, Debian 8.2, and Debian 11.1.0 |
| Host network | Server disconnection | CentOS 7.2, CentOS 8.2, Ubuntu 16.04, Ubuntu 22.04, EulerOS 2.0, Debian 8.2, and Debian 11.1.0 |
| Host network | NIC bandwidth limiting | CentOS 7.2, Ubuntu 16.04, Ubuntu 22.04, EulerOS 2.0, Debian 8.2, and Debian 11.1.0 |
| Host network | Connection exhaustion | CentOS 7.2, CentOS 8.2, Ubuntu 16.04, Ubuntu 22.04, EulerOS 2.0, Debian 8.2, and Debian 11.1.0 |