DWS_2000000009 Node Data Disk I/O Usage Exceeds the Threshold

Description

DWS collects the data disk I/O usage of each cluster node every 30 seconds. This alarm is generated when the average usage of a data disk on a node exceeds 90% (configurable) in the last 10 minutes (configurable), and is automatically cleared when the average usage drops below 85% (alarm threshold minus 5%).

If the data disk I/O usage of a node is always greater than the alarm threshold, the alarm is generated again 24 hours later (configurable).
When using an SSD storage-based cluster, disk I/O can surpass 100% as the service volume grows. But, this does not always mean there is a performance bottleneck. To confirm the alarm's validity, you should evaluate the service's actual running status.

Alarm Attributes

Alarm ID	Alarm Category	Alarm Severity	Alarm Type	Service Type	Auto Cleared
DWS_2000000009	Tenant plane	Urgent: > 90%	Operation alarm	DWS	Yes

Alarm Parameters

Category	Name	Description
Location information	Name	Node Data Disk I/O Usage Exceeds the Threshold
	Type	Operation alarm
	Generation time	Time when the alarm is generated
Other information	Cluster ID	Cluster details such as resourceId and domain_id

Impact on the System

High disk I/O usage affects data read and write performance, thereby affecting cluster performance.
A large number of disk writes occupy the disk capacity. If the disk capacity exceeds 90%, the cluster becomes read-only.

Possible Causes

A large number of read or write operations are performed during peak hours.
A large amount of data spills to disks due to the execution of complex statements.
Data is scanned by the Scan operator.

Handling Procedure

Choose Dedicated Clusters > Clusters, locate the row that contains the target cluster, and click Monitoring in the Operation column.
In the navigation pane on the left, choose Monitoring > Node Monitoring. On the Node Monitoring page, click the Disks tab to view the data disk I/O usage and disk I/O rate.

If the disk I/O rate is high and the data disk usage keeps increasing, it indicates that services are writing data to disks. This may be caused by complex queries.
Click Queries in the navigation pane to view the real-time queries.

If the execution time of a statement exceeds the expected time, stop the query and check the disk I/O usage again. For details, see 2.

Figure 1 Terminating a query