Updated on 2024-11-20 GMT+08:00

Overview

As O&M shifts from traditional IT infrastructure to cloud services, traditional O&M methods face challenges such as complex inter-service calls, fast application iteration, a massive number of O&M objects, and complex nonlinear systems. Service downtime can cause huge economic losses and reputational damage to a company.

Chaos engineering is introduced into the O&M process to address these challenges. Through periodic fault simulation, system weaknesses (such as software bugs, solution design defects, and weak points in the fault recovery process) can be identified before problems occur on the live network, and system availability problems can be detected and resolved in a timely manner. This continuously improves application resilience and builds O&M confidence. For unavoidable scenarios (such as hardware faults, abnormal server power-off, and network device board faults), formulate a quick-recovery emergency plan in advance.

COC allows users to perform automated chaos drills that cover the entire process, from risk identification and emergency plan management to fault injection and review and improvement. Drawing on years of Huawei Cloud SRE best practices in chaos drills, COC helps customers proactively identify, mitigate, and verify risks in cloud applications, continuously improving their resilience.

Image and Weapon Version Support Statement

COC chaos drills now support two additional types of attack targets, bare metal servers (BMSs) and Flexus L instances, and provide the corresponding resource and network weapons for drills. By combining weapon modules and functions, you can accurately simulate real-world faults and detect system availability issues as early as possible, continuously improving application resilience.

The following tables list the BMS and Flexus L image versions and the supported probe tools.

CentOS 6.10 and earlier images do not support probe tools because they lack the shared library versions (GLIBC_2.14 and GLIBCXX_3.4.15) required to run the probe packages.
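The library requirement above amounts to a version comparison, which the following minimal sketch illustrates. It is not part of COC: the function names (`parse_version`, `meets_requirement`, `local_glibc_ok`) are hypothetical, and only the two minimum versions come from the text. The local check relies on Python's standard `platform.libc_ver()`, which reports the glibc version only and says nothing about GLIBCXX.

```python
import platform
import re

# Minimum symbol versions named in the text (GLIBC_2.14 and GLIBCXX_3.4.15).
REQUIRED_GLIBC = (2, 14)
REQUIRED_GLIBCXX = (3, 4, 15)


def parse_version(text):
    """Turn a string such as 'GLIBC_2.14' or '2.17' into a tuple of ints."""
    return tuple(int(d) for d in re.findall(r"\d+", text))


def meets_requirement(available, required):
    """Return True if the available version is at least the required one."""
    return parse_version(available) >= required


def local_glibc_ok():
    """Check the glibc version of the running interpreter's host.

    platform.libc_ver() returns ('', '') when glibc is not detected,
    in which case this function returns False.
    """
    _, version = platform.libc_ver()
    return meets_requirement(version, REQUIRED_GLIBC)


if __name__ == "__main__":
    # CentOS 6 ships glibc 2.12, which is below the 2.14 minimum.
    print(meets_requirement("2.12", REQUIRED_GLIBC))
    print("Local glibc meets the minimum:", local_glibc_ok())
```

Comparing versions as integer tuples avoids the string-comparison pitfall where "2.9" would sort after "2.14".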

Table 1 lists the probes supported by each BMS image version.

Table 1 Bare metal server image and tool compatibility list

| Weapon Type | Weapon | Supported Image Versions |
| --- | --- | --- |
| Resource weapon | Increased CPU usage | CentOS 7.3, CentOS 7.9, Ubuntu 16, Ubuntu 18.04, EulerOS 2.3 |
| Resource weapon | Memory stress | CentOS 7.3, CentOS 7.9, Ubuntu 16, Ubuntu 18.04, EulerOS 2.3 |
| Resource weapon | Disk stress | CentOS 7.3, CentOS 7.9, Ubuntu 16, Ubuntu 18.04, EulerOS 2.3 |
| Resource weapon | Disk I/O stress | CentOS 7.3, CentOS 7.9, Ubuntu 16, Ubuntu 18.04, EulerOS 2.3 |
| Resource weapon | Process ID exhaustion | CentOS 7.3, CentOS 7.9, Ubuntu 16, Ubuntu 18.04, EulerOS 2.3 |
| Resource weapon | Killing a process/Continuously killing a process | CentOS 7.4, CentOS 7.9, Ubuntu 16, Ubuntu 18.04, EulerOS 2.3 |
| Network weapon | Network latency | CentOS 7.3, CentOS 7.9, Ubuntu 16, Ubuntu 18.04, EulerOS 2.3 |
| Network weapon | Network packet loss | CentOS 7.3, CentOS 7.9, Ubuntu 16, Ubuntu 18.04, EulerOS 2.3 |
| Network weapon | Error packets | CentOS 7.3, CentOS 7.9, Ubuntu 16, Ubuntu 18.04, EulerOS 2.3 |
| Network weapon | Duplicate packets | CentOS 7.3, CentOS 7.9, Ubuntu 16, Ubuntu 18.04, EulerOS 2.3 |
| Network weapon | Packet disorder | CentOS 7.3, CentOS 7.9, Ubuntu 16, Ubuntu 18.04, EulerOS 2.3 |
| Network weapon | Network disconnection | CentOS 7.3, CentOS 7.9, Ubuntu 16, Ubuntu 18.04, EulerOS 2.3 |
| Network weapon | NIC down | CentOS 7.3, CentOS 7.9, Ubuntu 16, Ubuntu 18.04, EulerOS 2.3 |

Table 2 lists the probes supported by each Flexus L image version.

Table 2 Flexus L instance images and probe tool compatibility list

| Weapon Type | Weapon | Supported Image Versions |
| --- | --- | --- |
| Resource weapon | Increased CPU usage | CentOS 7.2, CentOS 8.2, Ubuntu 16.04, Ubuntu 22.04, EulerOS 2.0, Debian 8.2, Debian 11.1.0 |
| Resource weapon | Memory stress | CentOS 7.2, CentOS 8.2, Ubuntu 16.04, Ubuntu 22.04, EulerOS 2.0, Debian 8.2, Debian 11.1.0 |
| Resource weapon | Disk stress | CentOS 7.2, CentOS 8.2, Ubuntu 16.04, Ubuntu 22.04, EulerOS 2.0, Debian 8.2, Debian 11.1.0 |
| Resource weapon | Disk I/O stress | CentOS 7.2, CentOS 8.2, Ubuntu 16.04, Ubuntu 22.04, EulerOS 2.0, Debian 8.2, Debian 11.1.0 |
| Resource weapon | Process ID exhaustion | CentOS 7.2, CentOS 8.2, Ubuntu 16.04, Ubuntu 22.04, EulerOS 2.0, Debian 8.2, Debian 11.1.0 |
| Resource weapon | Killing a process/Continuously killing a process | CentOS 7.2, CentOS 8.2, Ubuntu 16.04, Ubuntu 22.04, EulerOS 2.0, Debian 8.2, Debian 11.1.0 |
| Network weapon | Network latency | CentOS 7.2, Ubuntu 16.04, Ubuntu 22.04, EulerOS 2.0, Debian 8.2, Debian 11.1.0 |
| Network weapon | Network packet loss | CentOS 7.2, Ubuntu 16.04, Ubuntu 22.04, EulerOS 2.0, Debian 8.2, Debian 11.1.0 |
| Network weapon | Error packets | CentOS 7.2, Ubuntu 16.04, Ubuntu 22.04, EulerOS 2.0, Debian 8.2, Debian 11.1.0 |
| Network weapon | Duplicate packets | CentOS 7.2, Ubuntu 16.04, Ubuntu 22.04, EulerOS 2.0, Debian 8.2, Debian 11.1.0 |
| Network weapon | Packet disorder | CentOS 7.2, Ubuntu 16.04, Ubuntu 22.04, EulerOS 2.0, Debian 8.2, Debian 11.1.0 |
| Network weapon | Network disconnection | CentOS 7.2, CentOS 8.2, Ubuntu 16.04, Ubuntu 22.04, EulerOS 2.0, Debian 8.2, Debian 11.1.0 |
| Network weapon | NIC down | CentOS 7.2, CentOS 8.2, Ubuntu 16.04, Ubuntu 22.04, EulerOS 2.0, Debian 8.2, Debian 11.1.0 |
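
When planning a drill, it can help to check a weapon against Table 2 before selecting an attack target. The sketch below encodes Table 2 as a small lookup. It is an illustration only and not a COC API: the names `ALL_IMAGES`, `NO_CENTOS_8_2`, `supported_images`, and `is_supported` are hypothetical, and the data is copied from the table as listed.

```python
# Image versions listed in Table 2 for Flexus L instances.
ALL_IMAGES = frozenset({
    "CentOS 7.2", "CentOS 8.2", "Ubuntu 16.04", "Ubuntu 22.04",
    "EulerOS 2.0", "Debian 8.2", "Debian 11.1.0",
})

# Network weapons whose Table 2 row does not list CentOS 8.2.
NO_CENTOS_8_2 = frozenset({
    "Network latency", "Network packet loss", "Error packets",
    "Duplicate packets", "Packet disorder",
})

# Weapons whose Table 2 row lists every image in ALL_IMAGES.
FULL_SUPPORT = frozenset({
    "Increased CPU usage", "Memory stress", "Disk stress",
    "Disk I/O stress", "Process ID exhaustion",
    "Killing a process/Continuously killing a process",
    "Network disconnection", "NIC down",
})


def supported_images(weapon):
    """Return the set of Flexus L images that Table 2 lists for a weapon."""
    if weapon in NO_CENTOS_8_2:
        return ALL_IMAGES - {"CentOS 8.2"}
    if weapon in FULL_SUPPORT:
        return ALL_IMAGES
    raise KeyError(f"Weapon not listed in Table 2: {weapon}")


def is_supported(weapon, image):
    """Return True if Table 2 lists the image for the given weapon."""
    return image in supported_images(weapon)


if __name__ == "__main__":
    print(is_supported("Network latency", "CentOS 8.2"))        # False
    print(is_supported("Network disconnection", "CentOS 8.2"))  # True
```

Storing only the exceptions keeps the lookup short, since every row in Table 2 lists the same images except the five network weapons that omit CentOS 8.2.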