DWS_2000000009 Node Data Disk I/O Usage Exceeds the Threshold
Description
GaussDB(DWS) collects the data disk I/O usage of each cluster node every 30 seconds. This alarm is generated when the average usage of a data disk on a node exceeds 90% (configurable) in the last 10 minutes (configurable), and is automatically cleared when the average usage drops below 85% (alarm threshold minus 5%).
Note: If the data disk I/O usage of a node remains above the alarm threshold, the alarm is generated again after 24 hours (configurable).
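The trigger, clear, and repeat rules described above can be sketched as a small evaluator. This is only an illustration under stated assumptions, not the GaussDB(DWS) implementation: the class name `DiskIoAlarm`, the `observe` method, and the decision to wait for a full 10-minute window before evaluating are all hypothetical, and the constants are the default values quoted in the description.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Optional

SAMPLE_INTERVAL_S = 30        # collection period from the description
WINDOW_S = 10 * 60            # averaging window (configurable)
THRESHOLD = 90.0              # alarm threshold in percent (configurable)
CLEAR_MARGIN = 5.0            # alarm clears below threshold minus 5
REALARM_S = 24 * 60 * 60      # repeat interval while usage stays high (configurable)


def _new_window() -> deque:
    # One slot per sample in the averaging window (20 samples by default).
    return deque(maxlen=WINDOW_S // SAMPLE_INTERVAL_S)


@dataclass
class DiskIoAlarm:
    """Hypothetical evaluator for one data disk on one node."""
    samples: deque = field(default_factory=_new_window)
    active: bool = False
    last_raised: Optional[float] = None

    def observe(self, now: float, usage_pct: float) -> Optional[str]:
        """Record one sample; return "raise", "clear", or None."""
        self.samples.append(usage_pct)
        if len(self.samples) < self.samples.maxlen:
            return None  # assumption: wait for a full window before judging
        avg = sum(self.samples) / len(self.samples)
        if not self.active:
            if avg > THRESHOLD:
                self.active, self.last_raised = True, now
                return "raise"
            return None
        if avg < THRESHOLD - CLEAR_MARGIN:
            self.active = False
            return "clear"
        if avg > THRESHOLD and now - self.last_raised >= REALARM_S:
            self.last_raised = now  # condition persisted for 24 hours: report again
            return "raise"
        return None
```

The gap between the 90% trigger and the 85% clear level means an average that hovers just around the threshold does not repeatedly raise and clear the alarm.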
Alarm Attributes
Alarm ID | Alarm Severity | Auto Clear |
---|---|---|
DWS_2000000009 | Critical | Yes |
Alarm Parameters
Parameter | Description |
---|---|
Alarm Source | Indicates the name of the system for which the alarm is generated, for example, GaussDB(DWS). |
Cluster Name | Indicates the cluster for which the alarm is generated. |
Location Information | Includes ID and name of the cluster for which the alarm is generated, and ID and name of the instance for which the alarm is generated, for example, cluster_id: xxxx-xxxx-xxxx-xxxx, cluster_name: test_dws, instance_id: xxxx-xxxx-xxxx-xxxx, instance_name: test_dws-dws-cn-cn-1-1. |
Detail Information | Detailed information about the alarm, including the cluster, instance, disk, and threshold information. Example: CloudService=DWS, resourceId= xxxx-xxxx-xxxx-xxxx, resourceIdName=test_dws, instance_id: xxxx-xxxx-xxxx-xxxx, instance_name: test_dws-dws-cn-cn-1-1, host_name: host-192-168-1-122, disk_name: /dev/vdb, first_alarm_time: 2022-01-30 10:30:00; The log disk I/O usage of the node within 10 minutes is 90.54%, exceeding the threshold 90%. |
Generated | Time when an alarm is generated. |
Status | Indicates the status of the current alarm. |
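When alarms are forwarded to a script or ticketing tool, the Detail Information string can be split into fields to find the affected host and disk. The following is a minimal sketch that assumes the string always follows the layout of the example above (comma-separated `key=value` or `key: value` pairs, then a semicolon and a free-text message). The function name `parse_detail` and the output keys `usage_pct`, `threshold_pct`, and `message` are hypothetical and not part of any GaussDB(DWS) API.

```python
import re


def parse_detail(detail: str) -> dict:
    """Split an alarm detail string into fields and the two percentages in its message."""
    # Everything before the first ";" is key/value pairs; the rest is the message.
    head, _, message = detail.partition(";")
    fields = {}
    for part in head.split(","):
        # The example mixes "=" and ":" as separators; split on the first one only,
        # so values that contain ":" (such as timestamps) stay intact.
        match = re.match(r"\s*(\w+)\s*[=:]\s*(.*?)\s*$", part)
        if match:
            fields[match.group(1)] = match.group(2)
    # Assumption: the first percentage is the measured usage, the second the threshold.
    percents = [float(p) for p in re.findall(r"(\d+(?:\.\d+)?)%", message)]
    if len(percents) >= 2:
        fields["usage_pct"], fields["threshold_pct"] = percents[0], percents[1]
    fields["message"] = message.strip()
    return fields
```

Applied to the example string, the `host_name` and `disk_name` entries identify the node and device to inspect on the Node Monitoring page.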
Impact on the System
- High disk I/O usage affects data read and write performance, thereby affecting cluster performance.
- A large number of disk writes consume disk capacity. If the disk usage exceeds 90%, the cluster becomes read-only.
Possible Causes
- A large number of read or write operations are performed during peak hours.
- A large amount of data spills to disks due to the execution of complex statements.
- A large amount of data is read from disk by Scan operators.
Handling Procedure
1. On the Clusters > Dedicated Clusters page, locate the row that contains the target cluster and click Monitoring in the Operation column.
2. In the navigation pane on the left, choose Monitoring > Node Monitoring. On the Node Monitoring page, view the data disk I/O usage and disk I/O rate.
   If the disk I/O rate is high and the data disk usage keeps increasing, services are writing data to disk. This may be caused by complex queries.
3. Click Queries in the navigation tree on the left to view the real-time queries.
   If the execution time of a statement exceeds the expected time, stop the query and check the disk I/O usage again. For details, see step 2.
Alarm Clearance
This alarm is automatically cleared when the average data disk I/O usage drops below the alarm threshold minus 5% (85% with the 90% threshold).