Updated on 2025-08-08 GMT+08:00

Creating and Managing Drill Tasks

Scenarios

Drill tasks allow you to simulate software or hardware faults to test the system's fault recovery capability. Drill task operations include creating and managing chaos drill tasks and viewing drill records. Setting up a drill task includes setting the basic information, adding an attack task group, selecting an attack task, and selecting an attack scenario. A drill task can also involve configuring monitoring tasks and reviewing and improving the system after the drill. This helps you identify optimizations to apply when the system is under various kinds of pressure.

Automatic Task Termination Mechanism

  • Automatic termination upon timeout: If a drill task fails and you do not manually close the task within 48 hours, the system automatically terminates the drill task.
  • Automatic termination upon exceptions: During the drill, if a pod exception (for example, the pod has been deleted) is detected or a resource O&M ticket is manually closed, the system automatically terminates the current task immediately.
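
The two termination rules above can be sketched as a small decision function. This is only an illustrative model, not COC's implementation: the DrillState class, its field names, and the returned reason strings are hypothetical, and treating exactly 48 hours as a timeout is an assumption.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional

# Timeout taken from the text above: a failed task left open for 48 hours.
FAILED_TASK_TIMEOUT = timedelta(hours=48)


@dataclass
class DrillState:
    """Hypothetical snapshot of a drill task; field names are illustrative."""
    failed_at: Optional[datetime] = None  # when the task failed, if it did
    manually_closed: bool = False         # the user closed the failed task
    pod_exception: bool = False           # for example, the pod was deleted
    om_ticket_closed: bool = False        # resource O&M ticket closed manually


def termination_reason(state: DrillState, now: datetime) -> Optional[str]:
    """Return why the task would be auto-terminated, or None if it would not."""
    # Exceptions terminate the task immediately, so they are checked first.
    if state.pod_exception:
        return "exception: pod"
    if state.om_ticket_closed:
        return "exception: O&M ticket closed"
    # The timeout applies only to failed tasks that nobody closed manually.
    # Assumption: reaching exactly 48 hours already counts as a timeout.
    if state.failed_at is not None and not state.manually_closed:
        if now - state.failed_at >= FAILED_TASK_TIMEOUT:
            return "timeout"
    return None
```

In this sketch, a task that failed 47 hours ago returns None, while the same task one hour later returns "timeout".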

Creating a Drill Task

  1. Log in to COC.
  2. In the navigation tree on the left, choose Resilience Center > Chaos Drills.
  3. Click the Drill Tasks tab.
  4. Click Create Task.

    You can also open the page for creating a drill task by accepting a drill plan ticket. For details, see Creating and Managing Drill Plans.

  5. Configure the basic information.

    Table 1 Parameters in the basic information

    | Parameter | Description | Example Value |
    | --- | --- | --- |
    | Drill Task | Name of the drill task. Set it according to the naming rules. | test-drill |
    | Expected Recovery Duration (Minutes) | Expected time from the fault occurrence to the fault recovery, in minutes. Expected time for an application to automatically recover to the normal state during emergency plan execution after a fault is injected. This time does not affect the drill task. | 3 |

  6. Click Add Attack Task.

    By default, there is one attack task group. You can click Add Task Group to add a task group. After adding an attack task, you can click Add Attack Task to add another attack task.
    • Tasks in different task groups are executed in serial mode, and tasks in the same task group are executed in parallel mode.
    • Currently, multiple fault injection operations on the same resource in a task group are not supported.
    1. Set parameters for adding an attack task.
      • To add an existing task, click Select from Existing, select the existing task, and click OK.
      • To add a new attack task, perform the following steps.
        Table 2 Parameters for adding an attack task

        | Parameter | Description | Example Value |
        | --- | --- | --- |
        | Vendor | Select a cloud vendor type. | Huawei Cloud |
        | Source of Attack Target | Select the source of the target instance. You can select attack targets by selecting instances, pods, or a specified number of targets if CCE instances are used. | Elastic Cloud Server (ECS) |
        | Attack Task | Customize the name of the attack task based on the naming rule. | test-attacktask |
        | Attack Target | Select the target instance. | - |

    2. Click Next.
    3. Set parameters for selecting an attack scenario.
      For details, see Attack Scenarios.
      Table 3 Parameters for selecting an attack scenario

      | Parameter | Description | Example Value |
      | --- | --- | --- |
      | Attack Type | Attack scenarios are classified based on attack scenario types. | Host Resource |
      | Attack Scenario | Select an attack scenario. | CPU usage increase |
      | Attack Parameters | Configure attack parameters based on attack scenarios. | CPU Usage (%): 80; Fault Duration (s): 60 |

    4. Click Next.
    5. (Optional) Configure monitoring tasks.
      Table 4 Parameters for configuring a monitoring task

      | Parameter | Description |
      | --- | --- |
      | Steady-State Metrics | Select the target resource, performance metric, lower limit, and upper limit from the drop-down lists one by one. If a service performs well and stably while a performance metric stays within a certain value range, that metric is called a steady-state metric. If the metric value is outside that range before a drill, the drill is canceled. |
      | Metric | Select the target resource, monitoring metric, lower limit, and upper limit from the drop-down lists one by one. These service metrics monitor the corresponding service data during fault drills. If the value of such a metric is within the allowed value range, the service is normal. Otherwise, you can decide whether to stop the drill. |
      | Automatic Rollback | Select whether to enable automatic rollback. With automatic rollback, fault injection is rolled back automatically and the target is restored to its status before fault injection. Automatic rollback cannot be configured for disruptors that do not support fault termination. If automatic rollback is enabled and the value of a steady-state metric falls outside the stable value range during a drill, the corresponding fault injection stops automatically. |

    6. Click Finish. The attack task is added.

  7. Click OK.
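
The task group rules in step 6 (groups run one after another, tasks within a group run at the same time, and a resource cannot be attacked twice within one group) can be sketched as follows. This is a minimal local model under stated assumptions, not the COC scheduler: AttackTask, resource_id, duplicate_resources, run_drill, and the inject callback are hypothetical names. The check covers only duplicates within a single group, because that is the only restriction the text describes.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class AttackTask:
    name: str         # for example, "test-attacktask"
    resource_id: str  # hypothetical identifier of the attack target


def duplicate_resources(group: List[AttackTask]) -> List[str]:
    """Return resources targeted more than once within one task group."""
    counts = Counter(task.resource_id for task in group)
    return sorted(res for res, n in counts.items() if n > 1)


def run_drill(groups: List[List[AttackTask]],
              inject: Callable[[AttackTask], str]) -> List[List[str]]:
    """Run groups serially and the tasks inside each group in parallel."""
    # Validate every group before anything is injected.
    for index, group in enumerate(groups):
        dupes = duplicate_resources(group)
        if dupes:
            raise ValueError(f"group {index} targets a resource twice: {dupes}")
    results = []
    for group in groups:  # serial across groups
        # Parallel within the group; leaving the with-block waits for all tasks,
        # so the next group starts only after this one has finished.
        with ThreadPoolExecutor(max_workers=max(1, len(group))) as pool:
            results.append(list(pool.map(inject, group)))  # keeps input order
    return results
```

For example, calling run_drill with two groups and a stub inject function returns one result list per group, in group order. A group that names the same resource_id twice is rejected before any task runs.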

Modifying a Drill Task

Modify the created drill task. If a drill record has been generated for the drill task, the drill task cannot be modified.

  1. Log in to COC.
  2. In the navigation tree on the left, choose Resilience Center > Chaos Drills.
  3. Click the Drill Tasks tab.
  4. Locate the drill task you want to modify, click More in the Operation column, and choose Modify.
  5. Modify the drill task as required.

    • Click Add Task Group.
    • Click Add Attack Task.
    • Click Delete in the row of a task to delete the attack task.

  6. Click OK.

    The drill task is modified.

Deleting a Drill Task

Delete a created drill task. A drill task cannot be deleted if a drill record has been generated for it or if a drill plan is associated with it.

  1. Log in to COC.
  2. In the navigation tree on the left, choose Resilience Center > Chaos Drills.
  3. Click the Drill Tasks tab.
  4. Locate the drill task you want to delete, click More in the Operation column, and choose Delete.
  5. Click OK.

    The drill task is deleted.
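
The restrictions on modifying and deleting a drill task can be summarized as two checks. The sketch below is illustrative only: the DrillTask class and its counter fields are hypothetical and do not correspond to a documented COC data model.

```python
from dataclasses import dataclass


@dataclass
class DrillTask:
    name: str
    drill_record_count: int = 0  # hypothetical: drill records generated so far
    linked_plan_count: int = 0   # hypothetical: drill plans associated with it


def can_modify(task: DrillTask) -> bool:
    """A task can be modified only while it has no drill record."""
    return task.drill_record_count == 0


def can_delete(task: DrillTask) -> bool:
    """A task can be deleted only if it has no drill record and no drill plan."""
    return task.drill_record_count == 0 and task.linked_plan_count == 0
```

Under this model, a task that is associated with a drill plan but has never been run can still be modified, but it cannot be deleted.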

Starting a Drill Task

Start a drill task.

  1. Log in to COC.
  2. In the navigation tree on the left, choose Resilience Center > Chaos Drills.
  3. Click the Drill Tasks tab.
  4. Locate the drill task you want to start and click Start in the Operation column.
  5. Click OK.

    The drill starts. On the drill details page, you can view the attack progress, including installing probes, performing drills, and clearing the environment. The system automatically performs the drill task. The execution time depends on the attack time of the disruptor.

    In the probe installation step, a probe is installed on the target machine. The probe runs in the system to receive disruptor commands for attack, query, and clearance. After the drill is complete or terminated, the environment clearing step stops all probe operations in the system and removes the probe.

  6. For drill execution, the following operations are supported:

    • Terminate: During a drill, click Terminate in the upper right corner to stop a task that is waiting to be executed or a task that is abnormal.
    • Retry: If some or all attack tasks fail to check instances, install probes, clear environments, or perform steady-state detection, or the drill times out, expand the failed attack task and click Retry to retry the task.
    • Skip: If some or all attack tasks fail to be executed during the drill, expand a failed attack task and click Skip to skip the task and execute the next task.
    • Details: Expand an attack task and click Details to view the attack details.

    Description of the drill details page:

    • The drill record module displays attack task details, including the attack task progress, task information, and execution time.
    • The attack details module displays the attack status of instances in the application of the current task. BMSs, FlexusL (HCSS) instances, and CSS instances are not supported.
    • The monitoring details module displays real-time monitoring data of attack targets. You need to configure a drill monitoring task when creating an attack task.
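
The monitoring data shown on the details page comes from the monitoring task configured in Table 4. The steady-state rules described there can be sketched as two checks, one before the drill and one during it. This is a hedged illustration, not the service's logic: SteadyStateMetric, precheck, during_drill, and the returned strings are hypothetical, and inclusive bounds are an assumption. The text does not say what happens to an out-of-range steady-state metric when automatic rollback is disabled, so the sketch assumes the decision is left to the operator, as it is for ordinary metrics.

```python
from dataclasses import dataclass


@dataclass
class SteadyStateMetric:
    name: str     # hypothetical metric name, for example "cpu_util"
    lower: float  # lower limit selected in the monitoring task
    upper: float  # upper limit selected in the monitoring task

    def in_range(self, value: float) -> bool:
        # Assumption: both limits are inclusive.
        return self.lower <= value <= self.upper


def precheck(metric: SteadyStateMetric, value: float) -> str:
    """Before the drill: an out-of-range steady-state metric cancels the drill."""
    return "proceed" if metric.in_range(value) else "cancel"


def during_drill(metric: SteadyStateMetric, value: float, auto_rollback: bool) -> str:
    """During the drill: stop fault injection if rollback is enabled."""
    if metric.in_range(value):
        return "continue"
    # Assumption: without automatic rollback, the operator decides what to do.
    return "rollback" if auto_rollback else "operator-decides"
```

With limits 0 and 70, a reading of 85 before the drill gives "cancel". The same reading during the drill gives "rollback" when automatic rollback is enabled.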

Viewing Drill Records

View the drill records of a drill task. A drill task that has never been run has no drill records.

  1. Log in to COC.
  2. In the navigation tree on the left, choose Resilience Center > Chaos Drills.
  3. Click the Drill Tasks tab.
  4. Locate the target drill task and click Drill Record in the Operation column.

    The basic information about the drill task includes the drill task name, drill task ID, attack details, and failure mode. All drill records include the drill record ID, execution status, executor, drill start time, and drill end time.

  5. Locate the drill record to be viewed and click View Progress in the Operation column.

    View the attack progress and attack details of the current drill task.

  6. Click Drill Report on the right.

    Create or view a drill report. For details, see Creating a Drill Report.