Performing Chaos Drills on High CPU Usage
Cloud Operations Center (COC) is a secure, efficient, one-stop intelligent O&M platform for centralized O&M and management of multiple resource types. Chaos drills are COC's core capability for improving system resilience. You can configure drill and attack templates as required and use them to run fault injection drills on different types of infrastructure and application carriers, such as physical machines, VMs, and CCE containers. Built-in failure mode management also lets you run standard fault simulations on target instances, which helps you check the system's fault tolerance and fault rectification efficiency when exceptions arise.
This section uses a typical fault scenario, high CPU usage on an ECS, as an example to describe how to configure and perform a chaos drill on COC.
Operation Process
- Step 1: Synchronize Resources: Obtain resource data in all regions to which the current user belongs and synchronize the data to COC.
- Step 2: Install UniAgent: Install UniAgent on each desired node for information exchange between COC, lower-layer services, and hosts.
- Step 3: Create an Application: Manage the relationship between applications and cloud resources, and provide unified and timely resource environment management services for follow-up resource monitoring and automated O&M.
- Step 4: Create a Drill Task: Preset a drill solution for resources and enable flexible orchestration of fault injections across various attack tasks.
- Step 5: Start the Drill Task: Start the drill task to automatically inject faults based on the task settings.
Preparations
- You have signed up for a HUAWEI ID and completed real-name authentication.
Before using COC, sign up for a HUAWEI ID, enable Huawei Cloud services, and complete real-name authentication.
If you have already enabled Huawei Cloud services and completed real-name authentication, skip this step.
- You have enabled COC.
The first time you log in, enable COC.
If you have enabled COC, skip this step.
- You have purchased a Fault Drill package. For details about the package, see Billing.
If you have purchased a package, skip this step.
Step 1: Synchronize Resources
- Log in to COC.
- In the navigation pane, choose Resources > Application and Resource Management.
- On the Resources tab page, go to the tab page of the resource type you want to synchronize, for example, Elastic Cloud Server (ECS).
- Above the resource list, click Synchronize Resource.
The system obtains resource data of all regions where the current user belongs and synchronizes the data to COC.
Figure 1 Synchronizing resources
Step 2: Install UniAgent
If UniAgent has been installed on the target instance, skip this step.
- On the Resources > Elastic Cloud Server (ECS) tab page, select the target ECSs and click UniAgent > Install. The UniAgent installation page is displayed.
When UniAgent is installed in a VPC for the first time, manually install it and set a host with UniAgent installed as the installation host. For details, see How Do I Install UniAgent for the First Time?
Figure 2 Installing UniAgent
- Set parameters for installing UniAgent by referring to Table 1.
Table 1 Parameters for installing UniAgent
Parameter | Example Value | Description
UniAgent Version | 1.1.9.8 | Select a UniAgent version from the drop-down list.
Host Access Mode | Direct access (private network) | There are three access modes. Direct access (private network): intended for Huawei Cloud hosts. Direct access (public network): intended for non-Huawei Cloud hosts. Proxy access: select a proxy area where a proxy has been configured and remotely install UniAgent on a host through the proxy.
Installation Host | - | Select an installation host from the drop-down list. It must be a host where UniAgent has been installed. The installation host assists in installing UniAgent on other hosts in the same VPC.
Host About to Accommodate UniAgent | - | Specify details of the host where UniAgent needs to be installed. Login Account: account for logging in to the host (use user root on Linux OSs to obtain sufficient read and write permissions). Login Port: port for accessing the host. Password: password for logging in to the host.
Figure 3 Installing UniAgent
- Click OK and wait until the installation is complete.
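Before installing UniAgent remotely, it may help to confirm that the login port of the target host is reachable from the installation host. The following is a minimal sketch only. It assumes the login port is a TCP port (for example, an SSH port on Linux; this page does not state the protocol), and the helper name port_is_open is hypothetical rather than part of COC or UniAgent.

```python
import socket


def port_is_open(host: str, port: int, timeout_s: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout_s."""
    try:
        # create_connection resolves the host and attempts a TCP handshake.
        with socket.create_connection((host, port), timeout=timeout_s):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False
```

For example, port_is_open("192.0.2.10", 22) checks a placeholder address and port. Replace them with the private IP address and the Login Port value you plan to enter in Table 1.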
Step 3: Create an Application
If you have created an application and associated it with resources, skip this step.
- In the navigation pane, choose Resources > Application and Resource Management.
- Click the Applications tab.
- Click Create Application.
- Configure the application structure type.
Figure 4 Configuring the application structure type
Table 2 Parameters for configuring the application structure type
Parameter | Example Value | Description
Application Structure Type | Lightweight application | Select a value based on the complexity of the application structure.
- Configure the application structure by referring to Table 3.
Figure 5 Application structure configurations
Table 3 Parameters for configuring the application structure
Parameter | Example Value | Description
Application | test-application | Specify an application name based on the naming rule. Click OK.
Component | test-component | Specify a component name based on the naming rule. Click OK.
Group | test-group | Specify a group name based on the naming rule.
Cloud Service Provider | Huawei Cloud | Select the cloud service provider to which the target instance belongs.
Region | CN North-Beijing4 | Select the region in which the target instance is located.
Resource Association Method | Manual association | Select a resource association method.
Associate with Resource | - | Select the target instance on which to execute the chaos drill. Click OK.
- Click OK. The application is created.
Step 4: Create a Drill Task
- In the navigation pane, choose Resilience Center > Chaos Drills.
- Click the Drill Tasks tab.
- Click Create Task.
- Set the basic information about the drill task.
Figure 6 Configuring basic information
Table 4 Parameters for configuring basic information
Parameter | Example Value | Description
Drill Task | test-drill | Specify the drill task name based on the naming rules.
Expected Recovery Duration (Minutes) | 3 | Expected time from fault occurrence to fault recovery, in minutes.
- Click Create Attack Task. The Create Attack Task drawer is displayed.
- Select an attack target and click Next. The page for selecting an attack scenario is displayed.
Figure 7 Selecting an attack target
Table 5 Parameters for selecting an attack target
Parameter | Example Value | Description
Cloud Service Provider | Huawei Cloud | Select a cloud vendor type.
Source of Attack Target | Elastic Cloud Server (ECS) | Select the source of the target instance.
Attack Task | test-attacktask | Specify the name of the attack task based on the naming rule.
Attack Target | Select the resources you have associated with the application created in Step 3: Create an Application. | Select a target instance.
- Select an attack scenario based on Table 6 and click Next. The monitoring task configuration page is displayed.
Figure 8 Selecting an attack scenario
Table 6 Parameters for selecting an attack scenario
Parameter | Example Value | Description
Attack Type | Host Resource | Select an attack type. Attack scenarios are grouped by attack type.
Attack Scenario | Increased CPU Usage | Select the attack scenario to simulate.
Attack Parameters | CPU Usage (%): 80; Fault Duration (s): 60 | Configure attack parameters based on the attack scenario.
- Configure a monitoring task on the monitoring task configuration page.
- Select CPU Usage as the steady-state metric. The threshold ranges from 1 to 96.
- Select CPU Usage as the monitoring metric. The threshold ranges from 0 to 60.
- Click Finish.
- On the page for creating a chaos drill task, click OK.
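To make the example attack parameters in Table 6 more concrete (CPU Usage 80%, Fault Duration 60 s), the following minimal Python sketch shows one common way to produce a sustained CPU load on a single core by alternating busy-waiting and sleeping. It only illustrates the kind of fault being simulated. It is not how COC or UniAgent injects the fault, and the names duty_cycle and burn_cpu are hypothetical.

```python
import time


def duty_cycle(target_percent: float, period_s: float = 0.1) -> tuple[float, float]:
    """Split one period into (busy_seconds, idle_seconds) for a target CPU usage."""
    if not 0 <= target_percent <= 100:
        raise ValueError("target_percent must be between 0 and 100")
    busy = period_s * target_percent / 100.0
    # Clamp to zero so float rounding can never yield a negative sleep time.
    return busy, max(0.0, period_s - busy)


def burn_cpu(target_percent: float, duration_s: float, period_s: float = 0.1) -> float:
    """Load one CPU core at roughly target_percent for duration_s seconds.

    Returns the elapsed wall-clock time in seconds.
    """
    busy, idle = duty_cycle(target_percent, period_s)
    start = time.monotonic()
    deadline = start + duration_s
    while time.monotonic() < deadline:
        busy_until = time.monotonic() + busy
        while time.monotonic() < busy_until:
            pass  # busy-wait: keeps the core occupied
        time.sleep(idle)  # release the core for the rest of the period
    return time.monotonic() - start
```

Calling burn_cpu(80, 60) mirrors the example values on one core. Loading every core of a multi-core host would need one such process per core, so run a script like this only on a test machine.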
Step 5: Start the Drill Task
- In the navigation pane, choose Resilience Center > Chaos Drills.
- Click the Drill Tasks tab.
- In the task list, locate the drill task created in Step 4: Create a Drill Task and click Start Drill in the Operation column.
Figure 9 Starting a drill task
- After understanding the risks, click OK.
After the drill is started, the drill details page is displayed. The chaos drill platform automatically performs fault injection based on the drill task settings.
You can view the attack progress and details on the drill details page.
Figure 10 Viewing the drill details
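While the drill runs, you may want to cross-check the CPU usage shown on the drill details page directly on the target host. The sketch below assumes a Linux ECS and uses hypothetical helper names. It derives overall CPU usage from two readings of the aggregate cpu line in /proc/stat. It is independent of COC and only illustrates how such a percentage is commonly computed.

```python
import time


def parse_cpu_line(line: str) -> tuple[int, int]:
    """Return (idle_ticks, total_ticks) from the aggregate 'cpu' line of /proc/stat."""
    fields = line.split()
    if not fields or fields[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line")
    # Fields: user, nice, system, idle, iowait, irq, softirq, steal.
    # Guest time is already counted in user/nice, so later fields are ignored.
    values = [int(v) for v in fields[1:9]]
    idle = values[3] + (values[4] if len(values) > 4 else 0)
    return idle, sum(values)


def cpu_usage_percent(before: str, after: str) -> float:
    """Compute CPU usage (%) between two readings of the 'cpu' line."""
    idle_0, total_0 = parse_cpu_line(before)
    idle_1, total_1 = parse_cpu_line(after)
    total_delta = total_1 - total_0
    if total_delta <= 0:
        raise ValueError("the second reading must be taken after the first")
    busy_delta = total_delta - (idle_1 - idle_0)
    return 100.0 * busy_delta / total_delta


def read_cpu_line(path: str = "/proc/stat") -> str:
    """Read the first line of /proc/stat, which is the aggregate 'cpu' line."""
    with open(path) as stat_file:
        return stat_file.readline()


def sample_cpu_usage(interval_s: float = 1.0) -> float:
    """Measure CPU usage (%) over interval_s seconds on a Linux host."""
    before = read_cpu_line()
    time.sleep(interval_s)
    return cpu_usage_percent(before, read_cpu_line())
```

With the example attack parameters, sample_cpu_usage() would be expected to report a value in the region of 80 during the 60-second fault window and a lower value after the fault ends, assuming the host is otherwise lightly loaded.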