Drill Template Description

This section describes the standard drill template library. It provides 12 core templates covering scenarios such as emergency handling, process walkthroughs, and contingency plan practice.

All templates follow industry best practices and come with a complete structure and reusable content, including standard frameworks such as drill background, process nodes, and roles and responsibilities. You can modify scenario parameters, risk elements, and handling procedures to suit your requirements. The accompanying instructions and notes on common mistakes help you customize drill tasks efficiently.

**Table 1** Drill template description
Template Name	Description	Label	Level	Task Group Name	Attack Scenario
Cross-AZ DR	This drill simulates DR failover of the target service and the middleware it depends on when an AZ is faulty or the network is abnormal in a DR deployment architecture.	DR	Advanced	Cross-AZ DR	Server disconnection
Cross-AZ DR		DR	Advanced	Cross-AZ DR	Powering off a DCS AZ
Initial Chaos Drill	An entry-level drill that lets beginners experience the chaos drill process.	Nodes	Basic	Initial Chaos Drill	Qualifying practice
High System Resource Usage	This drill drives system resource usage to a specified level to test service performance under high load, so that you can address insufficient host resources in advance.	Nodes	Medium	Disk Stress	Disk usage increase
				Memory Stress	Memory usage increase
				CPU Stress	CPU usage increase
HPA Configuration in Kubernetes	In a cloud native architecture, auto scaling is an important feature. This drill simulates scale-up after pod resource usage (such as memory) increases in a short period of time and scale-down after resource usage decreases.	Containers and clusters	Advanced	HPA Configuration in Kubernetes	Pod memory usage increase
Data Storage Exception	Service records are generally stored on the host where the service runs or in its middleware: logs are stored on the host disk, and data is stored in middleware such as DDS. This drill simulates high ECS disk I/O and a primary/standby switchover.	Services and data	Medium	Data Storage Exception	Disk I/O pressure increase
Data Storage Exception		Services and data	Medium	Data Storage Exception	Forcibly promoting a standby node to primary
Automatic Pod Recovery and Scheduling	Kubernetes schedules workloads in units of pods. When a workload is created, the scheduler automatically assigns its pods to nodes, for example, to nodes that have enough resources.	Clusters	Medium	Automatic Pod Recovery and Scheduling	Memory usage increase
Automatic Pod Recovery and Scheduling		Clusters	Medium	Automatic Pod Recovery and Scheduling	Forcible pod stopping
Network Instability Affecting Service Performance	This drill injects network latency into the NIC of the service host to simulate the impact of an unstable network on services.	Networks	Medium	Network Instability Affecting Service Performance	Network latency
Environment Overload in the Microservice Architecture	Microservices are a mainstream architecture. Their core value is to shorten service release cycles and keep the system running reliably. However, they also bring challenges, such as how to locate and rectify faults in a microservice architecture. This drill simulates overloaded nodes across multiple microservices for your reference.	DR	Medium	Environment Overload in the Microservice Architecture	CPU usage increase
					Connection exhaustion
					Process killing
Abnormal Server Power-off	This drill verifies whether services can be recovered with no data loss after a server is powered off. You can use the preset contingency plan to recover services after a node is powered off.	Services and data	Medium	Abnormal Server Power-off	Device shutdown
Data Loss in Service Middleware Cache	In large-scale concurrent data query scenarios where high data query efficiency is required, Redis has become an essential service for internet applications due to its significant speed advantages over traditional databases. However, it may face issues related to data consistency and reliability. This chaos drill aims to verify whether service operations remain normal after clearing Redis data.	DR	Medium	Data Loss in Service Middleware Cache	DCS instance restart
Misoperations in the Host Configuration File	It is risky for O&M personnel to run commands directly on the service host from a command-line terminal. If the permissions of a service configuration file are modified directly, the service process may no longer be able to read or write the file. This chaos drill uses a custom script to modify or remove permissions on the host configuration file. You can use the prepared contingency plan to recover the service.	Services and data	Medium	Misoperations in the Host Configuration File	Custom scripts
Automatic Workload Switchover	FlexusL instances are new-generation, out-of-the-box lightweight application cloud servers designed for developers and small and medium-sized enterprises. You can deploy databases or service applications on FlexusL instances. This drill simulates service workload switchover when processes are killed and database nodes are disconnected.	Networks	Advanced	Automatic Workload Switchover	Process killing
Automatic Workload Switchover		Networks	Advanced	Automatic Workload Switchover	Network disconnection
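
To make the scale-up and scale-down behavior exercised by the HPA Configuration in Kubernetes template in Table 1 more concrete, the following is a minimal sketch of the replica calculation that the Kubernetes Horizontal Pod Autoscaler documents: the desired replica count is the current count multiplied by the ratio of the observed metric to the target metric, rounded up. The function name, parameter names, and the sample utilization values are assumptions chosen for illustration and are not part of the drill template. The default 10% tolerance and the minimum and maximum bounds mirror the general HPA behavior, not any setting of the template itself.

```python
import math


def desired_replicas(current_replicas, current_utilization, target_utilization,
                     min_replicas=1, max_replicas=10, tolerance=0.1):
    """Illustrative HPA-style replica calculation (hypothetical helper).

    desired = ceil(current_replicas * current_utilization / target_utilization)
    No scaling happens while the ratio stays within the tolerance band around 1.0.
    """
    ratio = current_utilization / target_utilization
    if abs(ratio - 1.0) <= tolerance:
        # Metric is close enough to the target: keep the current replica count.
        return current_replicas
    desired = math.ceil(current_replicas * ratio)
    # Clamp the result to the configured replica bounds.
    return max(min_replicas, min(max_replicas, desired))


if __name__ == "__main__":
    # Memory usage rises well above the target: scale up.
    print(desired_replicas(4, current_utilization=90, target_utilization=60))
    # Memory usage falls well below the target: scale down.
    print(desired_replicas(6, current_utilization=30, target_utilization=60))
```

In a drill, you would expect the pod count to rise shortly after the memory stress is injected and to fall back after the stress ends, subject to the scaling bounds configured for the workload.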