Updated on 2024-06-14 GMT+08:00

DWS_2000000009 Node Data Disk I/O Usage Exceeds the Threshold

Description

GaussDB(DWS) collects the data disk I/O usage of each cluster node every 30 seconds. This alarm is generated when the average I/O usage of a data disk on a node exceeds 90% (configurable) over the last 10 minutes (configurable), and is automatically cleared when the average usage drops below 85% (the alarm threshold minus 5 percentage points).

If the data disk I/O usage of a node remains above the alarm threshold, the alarm is generated again 24 hours later (configurable).
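
The raise, clear, and re-alarm rules described above can be sketched as a small state machine. This is only an illustration of the documented behavior, not the service's implementation: the class name IoAlarm, its field names, and the choice to wait for a full window of samples before evaluating are assumptions made for this example.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Deque, Optional, Tuple


@dataclass
class IoAlarm:
    """Hypothetical model of the alarm rules; defaults mirror the documented ones."""
    threshold: float = 90.0       # alarm threshold in percent (configurable)
    window_s: int = 600           # averaging window: 10 minutes (configurable)
    interval_s: int = 30          # collection interval
    clear_margin: float = 5.0     # alarm clears below threshold minus 5 points
    realarm_s: int = 24 * 3600    # re-alarm interval (configurable)
    samples: Deque[Tuple[int, float]] = field(default_factory=deque)
    active: bool = False
    last_raised: Optional[int] = None

    def observe(self, ts: int, usage: float) -> Optional[str]:
        """Record one sample (ts in seconds); return 'raise', 'clear', or None."""
        self.samples.append((ts, usage))
        # Keep only samples inside the averaging window (ts - window_s, ts].
        while self.samples[0][0] <= ts - self.window_s:
            self.samples.popleft()
        # Assumption: do not evaluate until a full window has been collected.
        if len(self.samples) < self.window_s // self.interval_s:
            return None
        avg = sum(u for _, u in self.samples) / len(self.samples)
        if not self.active:
            if avg > self.threshold:
                self.active, self.last_raised = True, ts
                return "raise"
        elif avg < self.threshold - self.clear_margin:
            self.active = False
            return "clear"
        elif avg > self.threshold and ts - self.last_raised >= self.realarm_s:
            self.last_raised = ts
            return "raise"
        return None


if __name__ == "__main__":
    alarm = IoAlarm()
    for i in range(40):
        usage = 95.0 if i < 20 else 80.0   # 10 minutes busy, then 10 minutes quiet
        event = alarm.observe(i * 30, usage)
        if event:
            print(i * 30, event)
```

The gap between the raise threshold (90%) and the clear threshold (85%) keeps the alarm from flapping when the average hovers around 90%.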

Alarm Attributes

Alarm ID | Alarm Severity | Auto Clear
DWS_2000000009 | Critical | Yes

Alarm Parameters

Parameter | Description
Alarm Source | Name of the system for which the alarm is generated, for example, GaussDB(DWS).
Cluster Name | Cluster for which the alarm is generated.
Location Information | ID and name of the cluster and of the instance for which the alarm is generated. Example: cluster_id: xxxx-xxxx-xxxx-xxxx, cluster_name: test_dws, instance_id: xxxx-xxxx-xxxx-xxxx, instance_name: test_dws-dws-cn-cn-1-1.
Detail Information | Detailed information about the alarm, including the cluster, instance, disk, and threshold information. Example: CloudService=DWS, resourceId=xxxx-xxxx-xxxx-xxxx, resourceIdName=test_dws, instance_id: xxxx-xxxx-xxxx-xxxx, instance_name: test_dws-dws-cn-cn-1-1, host_name: host-192-168-1-122, disk_name: /dev/vdb, first_alarm_time: 2022-01-30 10:30:00; The data disk I/O usage of the node within 10 minutes is 90.54%, exceeding the threshold 90%.
Generated | Time when the alarm is generated.
Status | Status of the current alarm.
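
When alarms are forwarded to a script or ticketing tool, it can help to split the Detail Information text into fields. The following is a minimal sketch that assumes the text always has the shape of the example above (key/value pairs separated by commas, then a semicolon, then a sentence with the measured usage and the threshold). The function name parse_detail and the regular expressions are illustrative and are not part of any GaussDB(DWS) API.

```python
import re
from typing import Dict, Tuple

# A key, then "=" or ":", then the value (the value may itself contain colons).
FIELD_RE = re.compile(r"^\s*([^=:]+?)\s*[=:]\s*(.*?)\s*$")
# Measured usage and threshold in the trailing sentence.
USAGE_RE = re.compile(r"\bis\s+([\d.]+)%.*?threshold\s+([\d.]+)%")


def parse_detail(detail: str) -> Tuple[Dict[str, str], float, float]:
    """Return (fields, usage_percent, threshold_percent) from a detail string."""
    fields_part, _, message = detail.partition(";")
    fields: Dict[str, str] = {}
    for item in fields_part.split(","):
        match = FIELD_RE.match(item)
        if match:
            fields[match.group(1)] = match.group(2)
    match = USAGE_RE.search(message)
    if match is None:
        raise ValueError("no usage/threshold found in detail message")
    return fields, float(match.group(1)), float(match.group(2))


# Sample shaped like the documented example (IDs are placeholders).
SAMPLE = (
    "CloudService=DWS, resourceId=xxxx-xxxx-xxxx-xxxx, resourceIdName=test_dws, "
    "instance_id: xxxx-xxxx-xxxx-xxxx, instance_name: test_dws-dws-cn-cn-1-1, "
    "host_name: host-192-168-1-122, disk_name: /dev/vdb, "
    "first_alarm_time: 2022-01-30 10:30:00; The data disk I/O usage of the node "
    "within 10 minutes is 90.54%, exceeding the threshold 90%."
)

if __name__ == "__main__":
    fields, usage, threshold = parse_detail(SAMPLE)
    print(fields["host_name"], fields["disk_name"], usage, threshold)
```

The host_name and disk_name fields identify which node and disk to inspect on the Node Monitoring page.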

Impact on the System

  • High disk I/O usage affects data read and write performance, thereby affecting cluster performance.
  • Heavy disk writes consume disk capacity. If disk usage exceeds 90%, the cluster becomes read-only.

Possible Causes

  • A large number of read or write operations are performed during peak hours.
  • A large amount of data spills to disks due to the execution of complex statements.
  • A large amount of data is scanned by Scan operators.

Handling Procedure

  1. On the Clusters > Dedicated Clusters page, locate the row that contains the target cluster and click Monitoring in the Operation column.
  2. In the navigation pane on the left, choose Monitoring > Node Monitoring. On the Node Monitoring page, click the Disks tab to view the data disk I/O usage and disk I/O rate.

    If the disk I/O rate is high and the data disk usage keeps increasing, services are writing data to the disks. Complex queries are a likely cause.

  3. Click Queries in the navigation tree on the left to view the real-time queries.

    If the execution time of a statement exceeds the expected time, stop the query and then check the disk I/O usage again as described in step 2.
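
The check in step 3 can be sketched as a simple filter over a list of running queries, for example one copied by hand from the Queries page. Everything here is hypothetical: the RunningQuery record, its fields, and the expected durations are assumptions for illustration, and the console does not expose this structure.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class RunningQuery:
    """Hypothetical record of one real-time query."""
    query_id: str       # placeholder identifier
    elapsed_s: float    # how long the statement has been running
    expected_s: float   # how long it is expected to take (your own estimate)


def overdue(queries: Iterable[RunningQuery]) -> List[RunningQuery]:
    """Return queries running longer than expected, largest overrun first."""
    late = [q for q in queries if q.elapsed_s > q.expected_s]
    return sorted(late, key=lambda q: q.elapsed_s - q.expected_s, reverse=True)


if __name__ == "__main__":
    running = [
        RunningQuery("q1", elapsed_s=120, expected_s=60),
        RunningQuery("q2", elapsed_s=30, expected_s=60),
        RunningQuery("q3", elapsed_s=900, expected_s=300),
    ]
    for q in overdue(running):
        print(q.query_id, q.elapsed_s - q.expected_s)
```

Sorting by overrun puts the statements most likely to be driving disk I/O at the top, so they can be reviewed first before any query is stopped.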

Alarm Clearance

This alarm is automatically cleared when the average data disk I/O usage drops below the alarm threshold minus 5 percentage points (85% with the default threshold of 90%).