Updated on 2026-03-11 GMT+08:00

Performing Chaos Drills on High CPU Usage

Cloud Operations Center (COC) is a secure, efficient, one-stop intelligent O&M platform for centralized O&M and management of multiple resource types. Chaos drills are COC's core capability for improving system resilience. You can configure drill and attack templates as required and use them to inject faults into different types of infrastructure and application carriers, such as physical machines, VMs, and CCE containers. In addition, built-in failure mode management lets you run standard fault simulations on target instances, so you can check the system's fault tolerance and how efficiently faults are rectified when exceptions occur.

This section uses a typical fault scenario, high CPU usage on an ECS, as an example to describe how to configure and perform a chaos drill on COC.

Operation Process

  1. Step 1: Synchronize Resources: Obtain resource data from all regions associated with the current user and synchronize the data to COC.
  2. Step 2: Install UniAgent: Install UniAgent on each desired node for information exchange between COC, lower-layer services, and hosts.
  3. Step 3: Create an Application: Manage the relationship between applications and cloud resources, and provide unified and timely resource environment management services for follow-up resource monitoring and automated O&M.
  4. Step 4: Create a Drill Task: Preset a drill solution for resources and enable flexible orchestration of fault injections across various attack tasks.
  5. Step 5: Start the Drill Task: Start the drill task to automatically inject faults based on the task settings.

Preparations

  1. You have signed up for a HUAWEI ID and completed real-name authentication.

    Before using COC, sign up for a HUAWEI ID, enable Huawei Cloud services, and complete real-name authentication.

    If you have already enabled Huawei Cloud services and completed real-name authentication, skip this step.

  2. You have enabled COC.

    Upon your first login, enable COC first.

    If you have enabled COC, skip this step.

  3. You have purchased a Fault Drill package. For details about the package, see Billing.

    If you have purchased a package, skip this step.

Step 1: Synchronize Resources

  1. Log in to COC.
  2. In the navigation pane, choose Resources > Application and Resource Management.
  3. On the Resources tab page, go to the tab page of the resource type you want to synchronize, for example, Elastic Cloud Server (ECS).
  4. Above the resource list, click Synchronize Resource.
    The system obtains resource data from all regions associated with the current user and synchronizes the data to COC.
    Figure 1 Synchronizing resources

Step 2: Install UniAgent

If UniAgent has been installed on the target instance, skip this step.

  1. On the Resources > Elastic Cloud Server (ECS) tab page, select the target ECSs and click UniAgent > Install. The UniAgent installation page is displayed.
    The first time UniAgent is installed in a VPC, install it manually on one host and set that host as the installation host. For details, see How Do I Install UniAgent for the First Time?
    Figure 2 Installing UniAgent
  2. Set parameters for installing UniAgent by referring to Table 1.
    Table 1 Parameters for installing UniAgent

    | Parameter | Example Value | Description |
    | --- | --- | --- |
    | UniAgent Version | 1.1.9.8 | Select a UniAgent version from the drop-down list. |
    | Host Access Mode | Direct access (private network) | There are three access modes. Direct access (private network): intended for Huawei Cloud hosts. Direct access (public network): intended for non-Huawei Cloud hosts. Proxy access: select a proxy area where a proxy has been configured and remotely install UniAgent on a host through the proxy. |
    | Installation Host | - | Select an installation host from the drop-down list. Select a host where UniAgent has been installed. The installation host will assist in installing UniAgent on other hosts in the same VPC. |
    | Host About to Accommodate UniAgent | - | Specify details of the host where UniAgent needs to be installed. Login Account: account for logging in to the host (use user root on Linux OSs to obtain sufficient read and write permissions). Login Port: port for accessing the host. Password: password for logging in to the host. |

    Figure 3 Installing UniAgent
  3. Click OK and wait until the installation is complete.
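
Before filling in the Login Port in Table 1, it can help to confirm that the port accepts TCP connections from the machine you expect to install from. The following is a minimal sketch of such a pre-check, not part of COC or UniAgent; the function name `port_is_reachable` is a hypothetical helper, and the host and port you pass to it are your own values.

```python
import socket


def port_is_reachable(host: str, port: int, timeout_s: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout_s.

    Hypothetical pre-check helper; it only tests TCP reachability and does
    not verify the login account or password.
    """
    try:
        # create_connection resolves the host and attempts a TCP connect.
        with socket.create_connection((host, port), timeout=timeout_s):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False
```

A result of `False` usually points to a security group rule, a firewall, or a wrong port number, which is worth resolving before starting the installation.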

Step 3: Create an Application

If you have created an application and associated it with resources, skip this step.

  1. In the navigation pane, choose Resources > Application and Resource Management.
  2. Click the Applications tab.
  3. Click Create Application.
  4. Configure the application structure type.
    Figure 4 Configuring the application structure type
    Table 2 Parameters for configuring the application structure type

    | Parameter | Example Value | Description |
    | --- | --- | --- |
    | Application Structure Type | Lightweight application | Select a value based on the complexity of the application structure. |

  5. Configure the application structure by referring to Table 3.
    Figure 5 Application structure configurations
    Table 3 Parameters for configuring the application structure

    | Parameter | Example Value | Description |
    | --- | --- | --- |
    | Application | test-application | Specify an application name based on the naming rule. Click OK. |
    | Component | test-component | Specify the component name based on the naming rule. Click OK. |
    | Group | test-group | Specify a group name based on the naming rule. |
    | Cloud Service Provider | Huawei Cloud | Select the cloud service provider to which the target instance belongs. |
    | Region | CN North-Beijing4 | Select the region in which the target instance is located. |
    | Resource Association Method | Manual association | Select a resource association method. |
    | Associate with Resource | - | Select the target instance to execute the chaos drill. Click OK. |

  6. Click OK. The application is created.

Step 4: Create a Drill Task

  1. In the navigation pane, choose Resilience Center > Chaos Drills.
  2. Click the Drill Tasks tab.
  3. Click Create Task.
  4. Set the basic information about the drill task.
    Figure 6 Configuring basic information
    Table 4 Parameters for configuring basic information

    | Parameter | Example Value | Description |
    | --- | --- | --- |
    | Drill Task | test-drill | Specify the drill task name based on the naming rules. |
    | Expected Recovery Duration (Minutes) | 3 | Expected time from fault occurrence to fault recovery. |

  5. Click Create Attack Task. The Create Attack Task drawer is displayed.
  6. Select an attack target and click Next. The page for selecting an attack scenario is displayed.
    Figure 7 Selecting an attack target
    Table 5 Parameters for selecting an attack target

    | Parameter | Example Value | Description |
    | --- | --- | --- |
    | Cloud Service Provider | Huawei Cloud | Select a cloud vendor type. |
    | Source of Attack Target | Elastic Cloud Server (ECS) | Select the source of the target instance. |
    | Attack Task | test-attacktask | Specify the name of the attack task based on the naming rule. |
    | Attack Target | Select the resources you have associated with the application created in Step 3: Create an Application. | Select a target instance. |

  7. Select an attack scenario based on Table 6 and click Next. The monitoring task configuration page is displayed.
    Figure 8 Selecting an attack scenario
    Table 6 Parameters for selecting an attack scenario

    | Parameter | Example Value | Description |
    | --- | --- | --- |
    | Attack Type | Host Resource | Select the attack type. Attack scenarios are grouped by type. |
    | Attack Scenario | Increased CPU Usage | Select the attack scenario to simulate. |
    | Attack Parameters | CPU Usage (%): 80; Fault Duration (s): 60 | Configure attack parameters based on the attack scenario. |

  8. Configure a monitoring task on the monitoring task configuration page.
    • Select CPU Usage as the steady-state metric. The threshold ranges from 1 to 96.
    • Select CPU Usage as the monitoring metric. The threshold ranges from 0 to 60.
  9. Click Finish.
  10. On the page for creating a chaos drill task, click OK.
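
To make the attack parameters in Table 6 more concrete, the sketch below shows one common way to hold a single CPU core near a target usage for a fixed time: spin for a fraction of each short period and sleep for the rest. This is an illustration of the general duty-cycle technique only. How COC actually injects the fault is not described here, and the names `duty_cycle` and `burn_one_core` and the 0.1 s period are assumptions made for the example.

```python
import time


def duty_cycle(target_percent: float, period_s: float = 0.1) -> tuple[float, float]:
    """Split one period into (busy, idle) seconds for a target CPU percentage."""
    if not 0 < target_percent <= 100:
        raise ValueError("target_percent must be in (0, 100]")
    busy_s = period_s * target_percent / 100.0
    return busy_s, period_s - busy_s


def burn_one_core(target_percent: float, duration_s: float,
                  period_s: float = 0.1) -> float:
    """Load one core to roughly target_percent for duration_s.

    Returns the elapsed wall-clock seconds. Illustrative only; this is not
    the mechanism COC uses.
    """
    busy_s, idle_s = duty_cycle(target_percent, period_s)
    start = time.monotonic()
    while time.monotonic() - start < duration_s:
        spin_until = time.monotonic() + busy_s
        while time.monotonic() < spin_until:
            pass  # busy-wait: this is the simulated CPU load
        if idle_s > 0:
            time.sleep(idle_s)  # yield the core for the rest of the period
    return time.monotonic() - start
```

With the example values from Table 6, the equivalent call would be `burn_one_core(80, 60)`. It loads one core only, so on a multi-core host the overall CPU usage would be lower unless one such loop runs per core.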

Step 5: Start the Drill Task

  1. In the navigation pane, choose Resilience Center > Chaos Drills.
  2. Click the Drill Tasks tab.
  3. In the task list, locate the drill task created in Step 4: Create a Drill Task and click Start Drill in the Operation column.
    Figure 9 Starting a drill task
  4. Confirm that you understand the risks and click OK.

    After the drill is started, the drill details page is displayed. The chaos drill platform automatically performs fault injection based on the drill task settings.

    You can view the attack progress and details on the drill details page.

    Figure 10 Viewing the drill details
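
After a drill, you may want to compare the observed recovery time with the Expected Recovery Duration set in Table 4. The sketch below shows one way to do that from a list of timestamped CPU readings. It is an assumption-laden illustration: the sample format, the function names, and the 60% threshold used in the usage note are hypothetical and are not taken from a COC export or API.

```python
from typing import Optional, Sequence, Tuple


def recovery_seconds(samples: Sequence[Tuple[float, float]],
                     fault_start_s: float,
                     threshold_percent: float) -> Optional[float]:
    """Seconds from fault start until CPU usage returns to the threshold.

    samples holds (timestamp_s, cpu_percent) pairs in time order.
    Returns None if usage never exceeded the threshold or never came back.
    """
    breached = False
    for t, cpu in samples:
        if t < fault_start_s:
            continue
        if cpu > threshold_percent:
            breached = True  # the fault is visible in the metric
        elif breached:
            return t - fault_start_s  # first reading back at or under threshold
    return None


def met_expectation(recovery_s: Optional[float], expected_minutes: float) -> bool:
    """True if recovery happened within the expected duration (in minutes)."""
    return recovery_s is not None and recovery_s <= expected_minutes * 60
```

For example, with made-up readings `[(0, 20.0), (30, 82.0), (60, 79.0), (90, 35.0)]`, a fault start at 0 s, and a 60% threshold, `recovery_seconds` returns 90, which is within the 3-minute expectation from Table 4.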