Updated on 2024-09-23 GMT+08:00

ALM-12016 CPU Usage Exceeds the Threshold

Description

The system checks the CPU usage every 30 seconds and compares it with the threshold, which has a default value. This alarm is generated when the CPU usage exceeds the threshold for a configurable number of consecutive checks (Trigger Count, 10 by default).

The alarm is cleared in either of the following scenarios:

  • Trigger Count is 1 and the CPU usage is less than or equal to the threshold.
  • Trigger Count is greater than 1 and the CPU usage is less than or equal to 90% of the threshold.
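The raise and clear rules above can be sketched as a small state machine. This is only an illustrative model, not the Manager implementation: the class name CpuAlarm, its fields, and the 80% threshold in the usage example are hypothetical, and it assumes the alarm clears on the first check that falls to or below the clearing level.

```python
from dataclasses import dataclass


@dataclass
class CpuAlarm:
    """Hypothetical model of the raise/clear rules described above."""
    threshold: float         # CPU usage threshold in percent (site-specific)
    trigger_count: int = 10  # consecutive over-threshold checks needed to raise
    active: bool = False
    _streak: int = 0         # consecutive checks above the threshold so far

    def clear_level(self) -> float:
        # Trigger Count == 1: clear at the threshold itself.
        # Trigger Count > 1: clear at 90% of the threshold.
        if self.trigger_count == 1:
            return self.threshold
        return 0.9 * self.threshold

    def check(self, cpu_usage: float) -> bool:
        """Feed one periodic sample (every 30 seconds); return the alarm state."""
        if cpu_usage > self.threshold:
            self._streak += 1
        else:
            self._streak = 0

        if not self.active and self._streak >= self.trigger_count:
            self.active = True
        elif self.active and cpu_usage <= self.clear_level():
            # Assumption: a single sample at or below the level clears the alarm.
            self.active = False
        return self.active


# Usage with an assumed 80% threshold and Trigger Count 3.
alarm = CpuAlarm(threshold=80.0, trigger_count=3)
states = [alarm.check(u) for u in (85, 90, 70, 85, 85, 85, 75, 71)]
```

With a Trigger Count greater than 1, the gap between the threshold and 90% of the threshold acts as hysteresis: in the usage example, the sample of 75 is below the threshold but above the clearing level, so the alarm stays active until the sample of 71 arrives.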

Attribute

Alarm ID    Alarm Severity    Auto Clear
12016       Major             Yes

Parameters

Name                 Meaning
Source               Specifies the cluster or system for which the alarm is generated.
ServiceName          Specifies the service for which the alarm is generated.
RoleName             Specifies the role for which the alarm is generated.
HostName             Specifies the host for which the alarm is generated.
Trigger Condition    Specifies the threshold triggering the alarm. If the current indicator value exceeds this threshold, the alarm is generated.

Impact on the System

  • Latency: If the CPU usage of a host is too high, service processes may run slowly and services may be delayed.
  • Service failure: If the host CPU usage is too high, service processing may slow down, time out, or fail. As a result, jobs may fail to run.

Possible Causes

  • The alarm threshold or Trigger Count (alarm smoothing times) is set improperly.
  • The CPU configuration cannot meet service requirements, so the CPU usage stays at the upper limit, or the service is in peak hours, so the CPU usage reaches the upper limit for a short period of time.

Procedure

Check whether the alarm threshold and Trigger Count are set properly.

  1. Change the alarm threshold and alarm Trigger Count based on CPU usage.

    On FusionInsight Manager, choose O&M > Alarm > Thresholds > Name of the desired cluster > Host > CPU > Host CPU Usage and change Trigger Count (the alarm smoothing times) based on the actual CPU usage, as shown in Figure 1.

    Trigger Count specifies how many consecutive checks must find the CPU usage above the threshold before the alarm is generated.

    Figure 1 Setting alarm smoothing times

    On the Host CPU Usage page, click Modify in the Operation column to change the alarm threshold, as shown in Figure 2.

    Figure 2 Setting an alarm threshold

  2. After 2 minutes, check whether the alarm is cleared.

    • If yes, no further action is required.
    • If no, go to step 1 of "Check whether the CPU usage reaches the upper limit."

Check whether the CPU usage reaches the upper limit.

  1. In the alarm list on FusionInsight Manager, click in the row where the alarm is located to view the alarm host address in the alarm details.
  2. On the Hosts page, click the node on which the alarm is reported.
  3. View the CPU usage over a 5-minute period. If the CPU usage exceeds the threshold multiple times, contact the system administrator to add more CPUs.
  4. Check whether the alarm was generated during peak hours. If it was, expand the capacity of the node or contact the system administrator to add more CPUs.
  5. Check whether the alarm is cleared.

    • If yes, no further action is required.
    • If no, go to step 1 of "Collect fault information."
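The CPU usage viewed in step 3 can also be cross-checked on the host itself. The following is a minimal sketch that assumes a Linux host exposing /proc/stat. The function names are hypothetical, and treating iowait as idle time is an assumption of this sketch, so the result may differ slightly from the value that FusionInsight Manager reports.

```python
import time
from typing import Tuple


def parse_cpu_line(stat_text: str) -> Tuple[int, int]:
    """Return (idle_ticks, total_ticks) from the aggregate 'cpu' line."""
    for line in stat_text.splitlines():
        fields = line.split()
        if fields and fields[0] == "cpu":
            values = [int(v) for v in fields[1:]]
            # Order: user nice system idle iowait irq softirq steal guest guest_nice
            idle = values[3] + (values[4] if len(values) > 4 else 0)
            # guest and guest_nice are already counted in user and nice.
            total = sum(values[:8])
            return idle, total
    raise ValueError("no aggregate 'cpu' line found")


def cpu_usage_percent(before: str, after: str) -> float:
    """CPU usage in percent between two /proc/stat snapshots."""
    idle0, total0 = parse_cpu_line(before)
    idle1, total1 = parse_cpu_line(after)
    total_delta = total1 - total0
    if total_delta <= 0:
        return 0.0
    return 100.0 * (1.0 - (idle1 - idle0) / total_delta)


def read_proc_stat() -> str:
    with open("/proc/stat") as f:
        return f.read()


def sample_usage(interval_s: float = 30.0) -> float:
    """Take two snapshots interval_s seconds apart and return the usage."""
    before = read_proc_stat()
    time.sleep(interval_s)
    return cpu_usage_percent(before, read_proc_stat())
```

Calling sample_usage() ten times in a row covers roughly the same 5-minute period at the 30-second check interval, and the results can be compared with the configured threshold.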

Collect fault information.

  1. On FusionInsight Manager of the active cluster, choose O&M > Log > Download.
  2. Select OmmServer for Service and click OK.
  3. In Time Range, set Start Date to 10 minutes before the alarm generation time and End Date to 10 minutes after the alarm generation time, and then click Download.
  4. Contact the O&M personnel and send the collected log information.
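The time range in step 3 is the alarm generation time plus and minus 10 minutes. As a small illustration, the sketch below computes that range. The function name log_window and the alarm time used in the example are hypothetical.

```python
from datetime import datetime, timedelta
from typing import Tuple


def log_window(alarm_time: datetime,
               margin_minutes: int = 10) -> Tuple[datetime, datetime]:
    """Return (start, end) of the log collection range around alarm_time."""
    margin = timedelta(minutes=margin_minutes)
    return alarm_time - margin, alarm_time + margin


# Hypothetical alarm generation time, for illustration only.
start, end = log_window(datetime(2024, 9, 23, 14, 5, 0))
print(start.isoformat(sep=" "), "to", end.isoformat(sep=" "))
# prints: 2024-09-23 13:55:00 to 2024-09-23 14:15:00
```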

Alarm Clearing

After the fault is rectified, the system automatically clears this alarm.

Related Information

None