Updated on 2023-03-08 GMT+08:00

Alarm Management

Overview

Alarm management covers viewing alarms, configuring alarm rules, and subscribing to alarm notifications. The Alarms page shows alarm statistics and details for the past week so that you can review the alarms of your tenant. GaussDB(DWS) provides a set of default alarm rules and also lets you modify alarm thresholds to suit your own services. Alarm notifications are sent through the Simple Message Notification (SMN) service.

This feature is available only for clusters whose database kernel version is 8.1.1.200 or later.
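The version requirement above can be checked with a simple comparison of dotted version strings. The following is a minimal sketch, assuming the kernel version is available as a string such as "8.1.1.200"; the function names are hypothetical and are not part of any GaussDB(DWS) API.

```python
# Minimum kernel version stated in this section.
MIN_KERNEL_VERSION = (8, 1, 1, 200)


def parse_version(version: str) -> tuple:
    """Split a dotted version string into a tuple of integers."""
    return tuple(int(part) for part in version.split("."))


def supports_alarm_management(kernel_version: str) -> bool:
    """Hypothetical helper: True if the version is 8.1.1.200 or later."""
    # Tuples compare element by element, so (8, 1, 3) > (8, 1, 1, 200).
    return parse_version(kernel_version) >= MIN_KERNEL_VERSION


if __name__ == "__main__":
    for version in ("8.1.1.100", "8.1.1.200", "8.1.3.310"):
        print(version, supports_alarm_management(version))
```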

Visiting the Alarms Page

  1. Log in to the GaussDB(DWS) management console.
  2. In the navigation pane on the left, click Alarms.
  3. View the following information on the page that is displayed:

    • Existing Alarm Statistics

      Statistics on the existing alarms of the past seven days are displayed by alarm severity in a bar chart, so you can clearly see the number and category of the alarms generated in the past week.

    • Today's Alarms

      Statistics on the existing alarms of the current day are displayed by alarm severity in a list, so you can clearly see the number and category of the unhandled alarms generated on that day.

    • Alarm Details

      Details about all alarms of the past seven days, both handled and unhandled, are displayed in a table to help you quickly locate faults. The details include the alarm name, alarm severity, cluster name, location, description, generation date, and status.

    The alarm data displayed (covering a maximum of 30 days) is provided by the Event Service microservice.
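The per-severity statistics described above amount to grouping alarm records by severity within a time window. The following is a minimal sketch of that idea, not the console's implementation. The `Alarm` class and its field names are hypothetical, modeled on the columns listed under Alarm Details, and the status is simplified to a `handled` flag.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Alarm:
    # Hypothetical record mirroring the columns of the alarm details table.
    name: str
    severity: str
    cluster_name: str
    location: str
    description: str
    generated_at: datetime
    handled: bool


def count_by_severity(alarms, now, days=7, unhandled_only=False):
    """Count alarms per severity generated within the last `days` days."""
    cutoff = now - timedelta(days=days)
    return Counter(
        alarm.severity
        for alarm in alarms
        if cutoff <= alarm.generated_at <= now
        and not (unhandled_only and alarm.handled)
    )
```

Calling `count_by_severity(alarms, now)` corresponds to the seven-day bar chart, while `count_by_severity(alarms, now, days=1, unhandled_only=True)` approximates the list of unhandled alarms for the current day.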

Alarm Types and Alarms

Table 1 Threshold alarms of DMS alarm sources

| Type | Name | Severity | Description |
| --- | --- | --- | --- |
| Default | Node CPU Usage Exceeds the Threshold | Urgent | This alarm is generated if the threshold of CPU usage (system + user) of any node in the cluster is exceeded within the specified period and the constraint is not met. The alarm will be cleared when the CPU usage (system + user) is lower than the threshold and the constraint is not met. |
| Default | Node System CPU Usage Exceeds the Threshold | Urgent | This alarm is generated if the threshold of system CPU usage of any node in the cluster is exceeded within the specified period and the constraint is not met. The alarm will be cleared when the system CPU usage is lower than the threshold and the constraint is not met. |
| Default | Node Swap Usage Exceeds the Threshold | Urgent | This alarm is generated if the threshold of swap usage of any node in the cluster is exceeded within the specified period and the constraint is not met. The alarm will be cleared when the swap usage is lower than the threshold and the constraint is not met. |
| Default | Node System Disk Usage Exceeds the Threshold | Urgent: > 85%; Important: > 75% | This alarm is generated if the threshold of system disk (/) usage of any node in the cluster is exceeded within the specified period and the constraint is not met. The alarm will be cleared when the system disk (/) usage is lower than the threshold and the constraint is not met. |
| Default | Node Log Disk Usage Exceeds the Threshold | Urgent: > 85%; Important: > 75% | This alarm is generated if the threshold of log disk (/var/chroot/DWS/manager) usage of any node in the cluster is exceeded within the specified period and the constraint is not met. The alarm will be cleared when the log disk (/var/chroot/DWS/manager) usage is lower than the threshold and the constraint is not met. |
| Default | Node Data Disk Usage Exceeds the Threshold | Urgent: > 85%; Important: > 75% | This alarm is generated if the threshold of data disk (/var/chroot/DWS/data[n]) usage of any node in the cluster is exceeded within the specified period and the constraint is not met. The alarm will be cleared when the data disk (/var/chroot/DWS/data[n]) usage is lower than the threshold and the constraint is not met. |
| Default | Node System Disk I/O Usage Exceeds the Threshold | Urgent | This alarm is generated if the threshold of system disk (/) I/O usage (util) of any node in the cluster is exceeded within the specified period and the constraint is not met. The alarm will be cleared when the system disk (/) I/O usage (util) is lower than the threshold and the constraint is not met. |
| Default | Node Log Disk I/O Usage Exceeds the Threshold | Urgent | This alarm is generated if the threshold of log disk (/var/chroot/DWS/manager) I/O usage (util) of any node in the cluster is exceeded within the specified period and the constraint is not met. The alarm will be cleared when the log disk (/var/chroot/DWS/manager) I/O usage (util) is lower than the threshold and the constraint is not met. |
| Default | Node Data Disk I/O Usage Exceeds the Threshold | Urgent | This alarm is generated if the threshold of data disk (/var/chroot/DWS/data[n]) I/O usage (util) of any node in the cluster is exceeded within the specified period and the constraint is not met. The alarm will be cleared when the data disk (/var/chroot/DWS/data[n]) I/O usage (util) is lower than the threshold and the constraint is not met. |
| Default | Node System Disk Latency Exceeds the Threshold | Important | This alarm is generated if the threshold of system disk (/) I/O latency (await) of any node in the cluster is exceeded within the specified period and the constraint is not met. The alarm will be cleared when the system disk (/) I/O latency (await) is lower than the threshold and the constraint is not met. |
| Default | Node Log Disk Latency Exceeds the Threshold | Important | This alarm is generated if the threshold of log disk (/var/chroot/DWS/manager) I/O latency (await) of any node in the cluster is exceeded within the specified period and the constraint is not met. The alarm will be cleared when the log disk (/var/chroot/DWS/manager) I/O latency (await) is lower than the threshold and the constraint is not met. |
| Default | Node Data Disk Latency Exceeds the Threshold | Important | This alarm is generated if the threshold of data disk (/var/chroot/DWS/data[n]) I/O latency (await) of any node in the cluster is exceeded within the specified period and the constraint is not met. The alarm will be cleared when the data disk (/var/chroot/DWS/data[n]) I/O latency (await) is lower than the threshold and the constraint is not met. |
| Default | Node System Disk Inode Usage Exceeds the Threshold | Urgent: > 85%; Important: > 75% | This alarm is generated if the threshold of system disk (/) inode usage of any node in the cluster is exceeded within the specified period and the constraint is not met. The alarm will be cleared when the system disk (/) inode usage is lower than the threshold and the constraint is not met. |
| Default | Node Log Disk Inode Usage Exceeds the Threshold | Urgent: > 85%; Important: > 75% | This alarm is generated if the threshold of log disk (/var/chroot/DWS/manager) inode usage of any node in the cluster is exceeded within the specified period and the constraint is not met. The alarm will be cleared when the log disk (/var/chroot/DWS/manager) inode usage is lower than the threshold and the constraint is not met. |
| Default | Node Data Disk Inode Usage Exceeds the Threshold | Urgent: > 85%; Important: > 75% | This alarm is generated if the threshold of data disk (/var/chroot/DWS/data[n]) inode usage of any node in the cluster is exceeded within the specified period and the constraint is not met. The alarm will be cleared when the data disk (/var/chroot/DWS/data[n]) inode usage is lower than the threshold and the constraint is not met. |
| Default | Data Flushed to Disks of the Query Statement Exceeds the Threshold | Urgent | This alarm is generated if the threshold of data flushed to disks of the SQL statement in the cluster is exceeded within the specified period and the constraint is not met. The alarm can be cleared only after you handle the SQL statement. |
| Default | Number of Queuing Query Statements Exceeds the Threshold | Urgent | This alarm is generated if the threshold of the number of queuing SQL statements is exceeded within the specified period. The alarm will be cleared when the number of queuing SQL statements is less than the threshold. |
| Custom | Name of the user-defined threshold alarm | User-defined alarm severity | Alarm description |
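The two-level severities in Table 1 (Urgent above 85%, Important above 75%) and the generate/clear behavior can be illustrated with a short sketch. This is only an approximation under stated assumptions: it ignores the specified period and the constraint mentioned in the descriptions, and the function names are hypothetical rather than part of GaussDB(DWS).

```python
from typing import Optional

# Default levels from Table 1 for disk and inode usage alarms, highest first.
DEFAULT_USAGE_LEVELS = [("Urgent", 85.0), ("Important", 75.0)]


def classify_usage(usage_percent: float,
                   levels=DEFAULT_USAGE_LEVELS) -> Optional[str]:
    """Return the highest severity whose threshold is exceeded, else None."""
    for severity, threshold in levels:
        if usage_percent > threshold:
            return severity
    return None


def update_alarm(active: bool, value: float, threshold: float) -> bool:
    """Return the new alarm state for a single-threshold alarm.

    Assumption: the alarm is raised when the value exceeds the threshold
    and cleared when it drops below it; period and constraint are ignored.
    """
    if value > threshold:
        return True
    if value < threshold:
        return False
    return active  # exactly at the threshold: keep the current state


if __name__ == "__main__":
    for usage in (70.0, 80.0, 90.0):
        print(usage, classify_usage(usage))
```

Listing the levels from highest to lowest means that the first match is always the most severe one, so a usage of 90% is reported as Urgent rather than Important.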