Updated on 2024-03-14 GMT+08:00

DWS_2000000001 Node CPU Usage Exceeds the Threshold

Description

GaussDB(DWS) collects the CPU usage of each node in a cluster every 30 seconds. If the average CPU usage of a node over the last 10 minutes (configurable) exceeds 90% (configurable), an alarm is reported indicating that the node CPU usage exceeds the threshold. If the average usage then falls below 85% (that is, the reporting threshold minus 5 percentage points), the alarm is cleared.

If the average CPU usage of a node stays above the alarm threshold, the alarm is generated again after 24 hours (configurable).
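The raise, clear, and repeat rules described above can be illustrated with a short sketch. This is not the service's actual implementation: the class name `CpuAlarm`, the method `add_sample`, and the choice to wait for a full 10-minute window before evaluating are assumptions made for illustration. The constants mirror the default values stated in this section.

```python
from collections import deque

SAMPLE_INTERVAL_S = 30        # collection interval stated above
WINDOW_S = 10 * 60            # averaging window (configurable)
RAISE_THRESHOLD = 90.0        # alarm threshold in percent (configurable)
CLEAR_MARGIN = 5.0            # alarm clears below threshold minus 5 points
REPEAT_AFTER_S = 24 * 3600    # repeat interval while still above threshold


class CpuAlarm:
    """Hypothetical evaluator for a single node (illustration only)."""

    def __init__(self):
        # 10 minutes of 30-second samples = 20 samples
        self.samples = deque(maxlen=WINDOW_S // SAMPLE_INTERVAL_S)
        self.active = False
        self.last_report = None

    def add_sample(self, now_s, cpu_percent):
        """Record one sample; return 'raise', 'clear', or None."""
        self.samples.append(cpu_percent)
        if len(self.samples) < self.samples.maxlen:
            return None  # assumption: judge only once the window is full
        avg = sum(self.samples) / len(self.samples)
        if not self.active:
            if avg > RAISE_THRESHOLD:
                self.active, self.last_report = True, now_s
                return "raise"
            return None
        if avg < RAISE_THRESHOLD - CLEAR_MARGIN:
            self.active = False
            return "clear"
        if avg > RAISE_THRESHOLD and now_s - self.last_report >= REPEAT_AFTER_S:
            self.last_report = now_s
            return "raise"
        return None
```

The gap between the raise threshold (90%) and the clear threshold (85%) acts as hysteresis: an average that hovers around 90% does not cause the alarm to flap between reported and cleared.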

Attributes

| Alarm ID | Alarm Severity | Auto Clear |
|---|---|---|
| DWS_2000000001 | Critical | Yes |

Parameters

| Parameter | Description |
|---|---|
| Source | Indicates the name of the system for which the alarm is generated, for example, GaussDB(DWS). |
| Cluster Name | Indicates the cluster for which the alarm is generated. |
| Location Information | Includes the ID and name of the cluster and of the instance for which the alarm is generated, for example, cluster_id: xxxx-xxxx-xxxx-xxxx, cluster_name: test_dws, instance_id: xxxx-xxxx-xxxx-xxxx, instance_name: test_dws-dws-cn-cn-1-1. |
| Detail Information | Detailed information about the alarm, including the cluster, instance, and threshold information. Example: CloudService=DWS, resourceId= xxxx-xxxx-xxxx-xxxx, resourceIdName=test_dws, instance_id: xxxx-xxxx-xxxx-xxxx, instance_name: test_dws-dws-cn-cn-1-1, host_name: host-192-168-1-122, first_alarm_time: 2022-01-30 10:30:00; The average CPU usage of the node within 10 minutes is 90.54%, which exceeds the threshold 90%. |
| Generated | Time when the alarm is generated. |
| Status | Indicates the status of the current alarm. |
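If alarm details are processed by a script, the Detail Information example above can be split into fields as sketched below. The function name `parse_detail` and the output keys `avg_cpu_percent` and `threshold_percent` are hypothetical. The parsing rules are inferred only from the single example string shown here, so the real format may differ.

```python
import re


def parse_detail(detail):
    """Split an alarm detail string into key/value fields plus measured values."""
    # Everything before the first ';' is a comma-separated list of fields;
    # the remainder is the human-readable message.
    head, _, message = detail.partition(";")
    fields = {}
    for part in head.split(","):
        # Keys are separated from values by either '=' or ':' in the example.
        match = re.match(r"\s*(\w+)\s*[=:]\s*(.*?)\s*$", part)
        if match:
            fields[match.group(1)] = match.group(2)
    # Pull the measured average and the threshold out of the message text.
    usage = re.search(r"is ([\d.]+)%.*?threshold ([\d.]+)%", message)
    if usage:
        fields["avg_cpu_percent"] = float(usage.group(1))
        fields["threshold_percent"] = float(usage.group(2))
    return fields
```

The `host_name` field of the result is the value to match against the node list when locating the affected node.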

Impact on the System

If the CPU usage is high for a long time, service processes may respond slowly or become unavailable.

Possible Causes

  • Complex services consume a large amount of CPU resources.
  • The CPU configuration of the cluster is too low to meet service requirements.

Handling Procedure

  1. Check the CPU usage of each node.

    1. Log in to the GaussDB(DWS) console.
    2. On the Alarms page, select the cluster for which the alarm is generated from the cluster drop-down list in the upper right corner. View the alarm information of the cluster in the last seven days, and use the location information to identify the node for which the alarm is generated.

    3. On the Cluster > Dedicated Cluster page, locate the row that contains the cluster for which the alarm is generated and click Monitoring Panel in the Operation column.

    4. Choose Monitoring > Node Monitoring > Overview to view the CPU usage of each node in the current cluster. Click the icon on the right to view the CPU performance metrics in the last 1, 3, 12, or 24 hours and check whether there is a sharp increase in the CPU usage.
      • If the CPU usage frequently increases and then returns to normal within a short period of time, the CPU usage spikes temporarily during service execution. In this case, you can adjust the alarm threshold as described in step 2 to reduce the number of reported alarms.
      • If the CPU usage remains high for a long time, the cluster is overloaded. In this case, check the cluster services as described in step 3 or upgrade the node flavor. For details, see Changing the Node Flavor.

  2. Check whether the CPU usage alarm configuration is proper.

    1. Choose Alarms > Alarm Rules.

    2. Locate the row that contains the Node CPU Usage Exceeds the Threshold alarm rule, and click Modify in the Operation column. The Modifying an Alarm Rule page is displayed.

    3. Adjust the alarm threshold and detection period. A higher alarm threshold and a longer detection period result in lower alarm sensitivity. For details about the GUI configuration, see Alarm Rules.

  3. Check whether the services running on the cluster are consuming too much CPU.

    1. On the monitoring page, choose Monitoring > Queries, click the icon, and select CPU Time (ms) to view the query with the longest CPU time.
    2. After confirming with the service owner, select the ID of the query to be stopped and click Stop Query.
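The distinction in step 1 between temporary spikes and sustained overload, and the sensitivity effect described in step 2, can be sketched on exported CPU samples as follows. The function names and the `sustained_ratio` cutoff of 0.5 are assumptions chosen for illustration and are not part of the product. The sketch assumes one sample every 30 seconds, so a window of 20 samples corresponds to the default 10-minute detection period.

```python
def window_averages(samples, window):
    """Average of each full sliding window of `window` consecutive samples."""
    return [
        sum(samples[i - window:i]) / window
        for i in range(window, len(samples) + 1)
    ]


def breach_ratio(samples, threshold=90.0, window=20):
    """Fraction of sliding windows whose average exceeds the threshold."""
    averages = window_averages(samples, window)
    if not averages:
        return 0.0
    return sum(avg > threshold for avg in averages) / len(averages)


def triage(samples, threshold=90.0, window=20, sustained_ratio=0.5):
    """Classify a CPU series as 'normal', 'spiky', or 'sustained'.

    sustained_ratio is a hypothetical cutoff: if at least that share of
    windows breaches the threshold, the load is treated as sustained.
    """
    ratio = breach_ratio(samples, threshold, window)
    if ratio == 0.0:
        return "normal"
    return "sustained" if ratio >= sustained_ratio else "spiky"
```

Calling `breach_ratio` on the same series with a higher `threshold` or a larger `window` returns a smaller or equal value for a short burst of load, which is the lower sensitivity that step 2 describes. A `spiky` result suggests tuning the alarm rule, while a `sustained` result suggests reviewing queries or changing the node flavor.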

Alarm Clearance

After the CPU usage decreases, the alarm is automatically cleared.