Updated on 2025-08-08 GMT+08:00

Drill Template Description

This section describes the standard drill template library, which provides 12 core templates covering scenarios such as emergency handling, process walkthroughs, and contingency plan practice.

All templates are designed based on industry best practices, with a complete structure and reusable content. Each template includes standard sections such as the drill background, process nodes, and roles and responsibilities. You can modify scenario parameters, risk elements, and handling procedures as required. The accompanying instructions and warnings about common mistakes help you customize drill tasks efficiently.

Table 1 Drill template description

Template Name

Description

Label

Level

Task Group Name

Attack Scenario

Cross-AZ DR

This drill simulates a DR failover of the target service and the middleware it depends on when an AZ fails or the network becomes abnormal in a DR deployment architecture.

DR

Advanced

Cross-AZ DR

Server disconnection

Powering off a DCS AZ

Initial Chaos Drill

This entry-level drill walks beginners through the chaos drill process.

Nodes

Basic

Initial Chaos Drill

Qualifying practice

High System Resource Usage

This drill raises system resource usage to a specified level to test service performance under high load, so that you can prepare in advance for situations where host resources are insufficient.

Nodes

Medium

Disk Stress

Disk usage increase

Memory Stress

Memory usage increase

CPU Stress

CPU usage increase

HPA Configuration in Kubernetes

Auto scaling is an important feature of cloud native architectures. This drill simulates scaling out after pod resource usage (such as memory) rises sharply in a short period, and scaling in after the usage drops.

Containers and clusters

Advanced

HPA Configuration in Kubernetes

Pod memory usage increase

Data Storage Exception

Service records are generally stored on the host where the service runs or in middleware: logs are written to the host disk, and data is stored in middleware such as DDS. This drill simulates high disk I/O on an ECS followed by a primary/standby switchover.

Services and data

Medium

Data Storage Exception

Disk I/O pressure increase

Forcibly promoting a standby node to primary

Automatic Pod Recovery and Scheduling

Kubernetes schedules workloads at the pod level. When a workload is created, the scheduler automatically assigns its pods to nodes, for example, to nodes that have enough resources.

Clusters

Medium

Automatic Pod Recovery and Scheduling

Memory usage increase

Forcible pod stopping

Network Instability Affecting Service Performance

This drill injects network latency into the NIC of the service host to simulate the impact of an unstable network on services.

Networks

Medium

Network Instability Affecting Service Performance

Network latency

Environment Overload in the Microservice Architecture

Microservices are a mainstream architecture whose core value lies in shortening service release cycles and keeping systems running reliably. However, they also bring challenges, such as locating and rectifying faults in a microservice architecture. This drill simulates overloaded nodes across multiple microservices.

DR

Medium

Environment Overload in the Microservice Architecture

CPU usage increase

Connection exhaustion

Process killing

Abnormal Server Power-off

This drill verifies whether services can be recovered without data loss after a server is powered off. You can use the corresponding preset contingency plan to recover services after a node is powered off.

Services and data

Medium

Abnormal Server Power-off

Device shutdown

Data Loss in Service Middleware Cache

In large-scale concurrent query scenarios that demand high query efficiency, Redis has become an essential service for internet applications because it is significantly faster than traditional databases. However, it can face data consistency and reliability issues. This chaos drill verifies whether services continue to operate normally after Redis data is cleared.

DR

Medium

Data Loss in Service Middleware Cache

DCS instance restart

Misoperations in the Host Configuration File

It is highly risky for O&M personnel to run commands directly on a service host (command-line, or "black screen", operations). If the permissions of a service configuration file are changed directly, the service process may be unable to read or write the file. This chaos drill uses a custom script to modify or remove permissions on the host configuration file. You can use the prepared contingency plan to recover the service.

Services and data

Medium

Misoperations in the Host Configuration File

Custom scripts

Automatic Workload Switchover

FlexusL instances are new-generation, out-of-the-box lightweight application cloud servers designed for developers and small and medium-sized enterprises. You can deploy databases or service applications on FlexusL instances. This drill simulates service workload switchover when processes terminate unexpectedly and database nodes are disconnected.

Networks

Advanced

Automatic Workload Switchover

Process killing

Network disconnection
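
As a rough illustration, the templates in Table 1 can be modeled as plain records and filtered by level or label, for example when deciding which drill to try first. The following Python sketch is not part of the product. The DrillTemplate class and the by_level helper are hypothetical names introduced here, and only the template names, labels, and levels are taken from the table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DrillTemplate:
    """One row of Table 1 (hypothetical record type, for illustration only)."""
    name: str
    label: str
    level: str


# Names, labels, and levels copied from Table 1.
TEMPLATES = [
    DrillTemplate("Cross-AZ DR", "DR", "Advanced"),
    DrillTemplate("Initial Chaos Drill", "Nodes", "Basic"),
    DrillTemplate("High System Resource Usage", "Nodes", "Medium"),
    DrillTemplate("HPA Configuration in Kubernetes", "Containers and clusters", "Advanced"),
    DrillTemplate("Data Storage Exception", "Services and data", "Medium"),
    DrillTemplate("Automatic Pod Recovery and Scheduling", "Clusters", "Medium"),
    DrillTemplate("Network Instability Affecting Service Performance", "Networks", "Medium"),
    DrillTemplate("Environment Overload in the Microservice Architecture", "DR", "Medium"),
    DrillTemplate("Abnormal Server Power-off", "Services and data", "Medium"),
    DrillTemplate("Data Loss in Service Middleware Cache", "DR", "Medium"),
    DrillTemplate("Misoperations in the Host Configuration File", "Services and data", "Medium"),
    DrillTemplate("Automatic Workload Switchover", "Networks", "Advanced"),
]


def by_level(templates, level):
    """Return the names of templates at the given level (hypothetical helper)."""
    return [t.name for t in templates if t.level == level]


if __name__ == "__main__":
    # List the template names grouped by difficulty level.
    for level in ("Basic", "Medium", "Advanced"):
        print(level, by_level(TEMPLATES, level))
```

Grouping by level in this way reflects one possible order of adoption: start with the single Basic template, then move on to the Medium and Advanced ones.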