Overview
With the shift from traditional IT infrastructure O&M to cloud service O&M, traditional O&M methods face challenges such as complex inter-service calls, rapid application iteration, a massive number of O&M objects, and complex nonlinear systems. Service downtime can cause significant economic loss and reputational damage to a company.
Chaos engineering addresses these challenges by bringing fault simulation into the O&M process. Periodic drills identify system weaknesses (such as software bugs, solution design defects, and weak points in the fault recovery process) before problems occur on the live network, so that system availability problems can be detected and resolved in a timely manner. This continuously improves application resilience and builds O&M confidence. For unavoidable scenarios (such as hardware faults, abnormal server power-off, and network device board faults), you can formulate a quick-recovery emergency plan in advance.
COC allows users to run automated chaos drills that cover the full process, from risk identification and emergency plan management to fault injection and review and improvement. Built on years of Huawei Cloud SRE best practices in chaos drills, COC helps customers proactively identify, mitigate, and verify risks in cloud applications, continuously improving their resilience.
Image and Weapon Version Support Statement
Two types of attack targets, bare metal servers (BMSs) and Flexus L instances, have been added to COC chaos drills, along with the corresponding resource and network weapons for running drills. By combining weapon modules and functions, you can accurately simulate real-world faults and detect system availability issues as early as possible, continuously improving application resilience.
The following tables list the BMS and Flexus L image versions and the probe tools they support.
Images of CentOS 6.10 and earlier do not support probe tools because these systems lack the shared library versions (GLIBC_2.14 and GLIBCXX_3.4.15) required to run the probe packages.
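The GLIBC_2.14 requirement above can be checked on a target host before a probe is installed. The following is a minimal sketch, not part of COC: the helper names (`parse_version`, `glibc_is_sufficient`, `local_glibc_version`) are hypothetical, and it covers only the glibc half of the requirement. It does not inspect libstdc++ for GLIBCXX_3.4.15.

```python
import os
import re

# Minimum glibc version named in the note above (GLIBC_2.14).
MIN_GLIBC = (2, 14)


def parse_version(text):
    """Extract the first dotted version number from a string such as 'glibc 2.17'."""
    match = re.search(r"(\d+(?:\.\d+)+)", text)
    if match is None:
        raise ValueError(f"no version number found in {text!r}")
    return tuple(int(part) for part in match.group(1).split("."))


def glibc_is_sufficient(version_text, minimum=MIN_GLIBC):
    """Return True if the reported glibc version is at least `minimum`."""
    # Tuples compare numerically, so (2, 9) < (2, 14) as intended.
    return parse_version(version_text) >= minimum


def local_glibc_version():
    """Return the local glibc version string, or None if it cannot be determined."""
    try:
        # On glibc-based Linux this returns a string such as 'glibc 2.17'.
        return os.confstr("CS_GNU_LIBC_VERSION")
    except (AttributeError, ValueError, OSError):
        # Non-glibc or non-POSIX platform.
        return None


if __name__ == "__main__":
    reported = local_glibc_version()
    if reported is None:
        print("Could not determine the glibc version on this host.")
    elif glibc_is_sufficient(reported):
        print(f"{reported}: meets the GLIBC_2.14 requirement.")
    else:
        print(f"{reported}: older than GLIBC_2.14; probe packages are not expected to run.")
```

Running the script on the target host prints whether the locally reported glibc version meets the minimum. A passing result is necessary but not sufficient, since the libstdc++ version still has to be verified separately.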
Table 1 lists the probes supported by each BMS image version.
Weapon Type | Weapon | Supported Image Versions |
---|---|---|
Resource weapon | Increased CPU Usage | CentOS 7.3, CentOS 7.9, Ubuntu16, Ubuntu 1804, EulerOS 2.3 |
Resource weapon | Memory stress | CentOS 7.3, CentOS 7.9, Ubuntu16, Ubuntu 1804, EulerOS 2.3 |
Resource weapon | Disk stress | CentOS 7.3, CentOS 7.9, Ubuntu16, Ubuntu 1804, EulerOS 2.3 |
Resource weapon | Disk I/O stress | CentOS 7.3, CentOS 7.9, Ubuntu16, Ubuntu 1804, EulerOS 2.3 |
Resource weapon | Process ID exhaustion | CentOS 7.3, CentOS 7.9, Ubuntu16, Ubuntu 1804, EulerOS 2.3 |
Resource weapon | Killing a process/Continuously killing a process | CentOS 7.4, CentOS 7.9, Ubuntu16, Ubuntu 1804, EulerOS 2.3 |
Network weapon | Network latency | CentOS 7.3, CentOS 7.9, Ubuntu16, Ubuntu 1804, EulerOS 2.3 |
Network weapon | Network packet loss | CentOS 7.3, CentOS 7.9, Ubuntu16, Ubuntu 1804, EulerOS 2.3 |
Network weapon | Error packets | CentOS 7.3, CentOS 7.9, Ubuntu16, Ubuntu 1804, EulerOS 2.3 |
Network weapon | Duplicate packets | CentOS 7.3, CentOS 7.9, Ubuntu16, Ubuntu 1804, EulerOS 2.3 |
Network weapon | Packet disorder | CentOS 7.3, CentOS 7.9, Ubuntu16, Ubuntu 1804, EulerOS 2.3 |
Network weapon | Network disconnection | CentOS 7.3, CentOS 7.9, Ubuntu16, Ubuntu 1804, EulerOS 2.3 |
Network weapon | NIC down | CentOS 7.3, CentOS 7.9, Ubuntu16, Ubuntu 1804, EulerOS 2.3 |
Table 2 lists the probes supported by each Flexus L image version.
Weapon Type | Weapon | Supported Image Versions |
---|---|---|
Resource weapon | Increased CPU Usage | CentOS 7.2, CentOS 8.2, Ubuntu 16.04, Ubuntu 22.04, EulerOS 2.0, Debian 8.2, Debian 11.1.0 |
Resource weapon | Memory stress | CentOS 7.2, CentOS 8.2, Ubuntu 16.04, Ubuntu 22.04, EulerOS 2.0, Debian 8.2, Debian 11.1.0 |
Resource weapon | Disk stress | CentOS 7.2, CentOS 8.2, Ubuntu 16.04, Ubuntu 22.04, EulerOS 2.0, Debian 8.2, Debian 11.1.0 |
Resource weapon | Disk I/O stress | CentOS 7.2, CentOS 8.2, Ubuntu 16.04, Ubuntu 22.04, EulerOS 2.0, Debian 8.2, Debian 11.1.0 |
Resource weapon | Process ID exhaustion | CentOS 7.2, CentOS 8.2, Ubuntu 16.04, Ubuntu 22.04, EulerOS 2.0, Debian 8.2, Debian 11.1.0 |
Resource weapon | Killing a process/Continuously killing a process | CentOS 7.2, CentOS 8.2, Ubuntu 16.04, Ubuntu 22.04, EulerOS 2.0, Debian 8.2, Debian 11.1.0 |
Network weapon | Network latency | CentOS 7.2, Ubuntu 16.04, Ubuntu 22.04, EulerOS 2.0, Debian 8.2, Debian 11.1.0 |
Network weapon | Network packet loss | CentOS 7.2, Ubuntu 16.04, Ubuntu 22.04, EulerOS 2.0, Debian 8.2, Debian 11.1.0 |
Network weapon | Error packets | CentOS 7.2, Ubuntu 16.04, Ubuntu 22.04, EulerOS 2.0, Debian 8.2, Debian 11.1.0 |
Network weapon | Duplicate packets | CentOS 7.2, Ubuntu 16.04, Ubuntu 22.04, EulerOS 2.0, Debian 8.2, Debian 11.1.0 |
Network weapon | Packet disorder | CentOS 7.2, Ubuntu 16.04, Ubuntu 22.04, EulerOS 2.0, Debian 8.2, Debian 11.1.0 |
Network weapon | Network disconnection | CentOS 7.2, CentOS 8.2, Ubuntu 16.04, Ubuntu 22.04, EulerOS 2.0, Debian 8.2, Debian 11.1.0 |
Network weapon | NIC down | CentOS 7.2, CentOS 8.2, Ubuntu 16.04, Ubuntu 22.04, EulerOS 2.0, Debian 8.2, Debian 11.1.0 |
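The Flexus L table is uniform except for one detail: five network weapons (network latency, packet loss, error packets, duplicate packets, and packet disorder) do not list CentOS 8.2. When drills are scripted, that exception is easy to overlook, so it can help to encode the table as data and check a weapon/image pair before planning a drill. The sketch below is illustrative only: the names `FLEXUS_L_IMAGES`, `supported_images`, and `is_supported` are hypothetical and do not belong to any COC API, and the data is copied from Table 2 as published here.

```python
# All Flexus L image versions that appear in Table 2.
FLEXUS_L_IMAGES = (
    "CentOS 7.2", "CentOS 8.2", "Ubuntu 16.04", "Ubuntu 22.04",
    "EulerOS 2.0", "Debian 8.2", "Debian 11.1.0",
)

# Every weapon listed in Table 2 (resource weapons first, then network weapons).
FLEXUS_L_WEAPONS = (
    "Increased CPU Usage", "Memory stress", "Disk stress", "Disk I/O stress",
    "Process ID exhaustion", "Killing a process/Continuously killing a process",
    "Network latency", "Network packet loss", "Error packets",
    "Duplicate packets", "Packet disorder", "Network disconnection", "NIC down",
)

# Network weapons whose Table 2 row omits CentOS 8.2.
NOT_ON_CENTOS_8_2 = frozenset({
    "Network latency", "Network packet loss", "Error packets",
    "Duplicate packets", "Packet disorder",
})


def supported_images(weapon):
    """Return the Flexus L images that Table 2 lists for the given weapon."""
    if weapon not in FLEXUS_L_WEAPONS:
        raise KeyError(f"unknown weapon: {weapon!r}")
    if weapon in NOT_ON_CENTOS_8_2:
        return tuple(img for img in FLEXUS_L_IMAGES if img != "CentOS 8.2")
    return FLEXUS_L_IMAGES


def is_supported(weapon, image):
    """Return True if Table 2 lists `image` for `weapon`."""
    return image in supported_images(weapon)


if __name__ == "__main__":
    for weapon in FLEXUS_L_WEAPONS:
        status = "yes" if is_supported(weapon, "CentOS 8.2") else "no"
        print(f"{weapon}: CentOS 8.2 supported = {status}")
```

Keeping the exception set separate from the full image list makes the single irregularity in the table explicit, so an update to the published table needs a change in only one place.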