Updated on 2024-01-17 GMT+08:00

ALM-12027 Host PID Usage Exceeds the Threshold (For MRS 2.x or Earlier)

Description

The system checks the PID usage every 30 seconds and compares the actual PID usage with the default threshold. This alarm is generated when the PID usage exceeds the threshold.

This alarm is cleared when the host PID usage is less than or equal to the threshold.
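The check described above can be approximated by hand on a node. The following is a minimal sketch, not the monitor's actual implementation: it assumes that PID usage is the number of existing tasks (processes and threads) divided by pid_max, and the function names pid_usage_percent and tasks_from_loadavg are illustrative only.

```shell
#!/bin/sh
# Assumption: PID usage = existing tasks / pid_max. The exact formula used by
# the MRS monitor is not given in this document.

# Print used/max as an integer percentage. Arguments: <used> <max>
pid_usage_percent() {
    echo $(( $1 * 100 / $2 ))
}

# Read a /proc/loadavg line on stdin and print the total number of existing
# tasks (processes and threads), that is, the part after "/" in field 4.
tasks_from_loadavg() {
    awk '{ split($4, parts, "/"); print parts[2] }'
}

# Report the current usage when the Linux proc files are readable.
if [ -r /proc/loadavg ] && [ -r /proc/sys/kernel/pid_max ]; then
    used=$(tasks_from_loadavg < /proc/loadavg)
    max=$(cat /proc/sys/kernel/pid_max)
    echo "PID usage: $(pid_usage_percent "$used" "$max")% ($used of $max)"
fi
```

Threads are counted because on Linux each thread takes an ID from the same PID space as processes. The result can then be compared with the threshold configured for the alarm.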

Attribute

Alarm ID    Alarm Severity    Auto Clear
12027       Major             Yes

Parameters

Parameter      Description
ServiceName    Specifies the service for which the alarm is generated.
RoleName       Specifies the role for which the alarm is generated.
HostName       Specifies the host for which the alarm is generated.

Trigger Condition

The alarm is generated when the actual PID usage of the host exceeds the specified threshold.

Impact on the System

When the PIDs are used up, no PID can be allocated to new processes, so new processes cannot be started and service processes become unavailable.

Possible Causes

• Too many processes are running on the node, and the value of pid_max needs to be increased.
• The system is abnormal.
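To see where the PIDs are going before changing any setting, the thread counts can be grouped by user. This is a hedged sketch: threads_per_user is a hypothetical helper, and the commented ps invocation assumes the procps version of ps found on most Linux distributions.

```shell
#!/bin/sh
# Sum thread counts per user from lines of "<user> <thread count>" on stdin
# and print "<total> <user>", largest total first.
threads_per_user() {
    awk '{ sum[$1] += $2 } END { for (u in sum) print sum[u], u }' | sort -rn
}

# Example on the node (assumes procps ps; nlwp is the thread count column):
#   ps -e -o user= -o nlwp= | threads_per_user | head -n 5
```

A user or service with an unexpectedly large total is a candidate for the "too many processes" cause. If no single consumer stands out, raising pid_max as described in the procedure is the more likely fix.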

Procedure

  1. Increase the value of pid_max.

    1. On the MRS cluster details page, click the alarm from the real-time alarm list. In the Alarm Details area, obtain the IP address of the host for which the alarm is generated.
    2. Log in to the node for which the alarm is generated.
    3. Run the cat /proc/sys/kernel/pid_max command to check the value of pid_max.
    4. If the PID usage exceeds the threshold, open the /etc/sysctl.conf file and change the value of kernel.pid_max to twice the value of pid_max queried in 1.c. If kernel.pid_max does not exist, add it to the end of the file.

      For example, if the queried value is 32768, set kernel.pid_max=65536 in the file, and then run the following command to make the change take effect immediately:

      sysctl -p

      The maximum value of kernel.pid_max is as follows:

      • 32-bit OS: 32768
      • 64-bit OS: 4194304 (22nd power of 2)
    5. Wait 5 minutes and check whether the alarm is cleared.
      • If yes, no further action is required.
      • If no, go to 2.

  2. Check whether the system environment is abnormal.

    1. Contact the O&M personnel to check whether the operating system is abnormal.
      • If yes, rectify the operating system fault and go to 2.b.
      • If no, go to 3.
    2. Wait 5 minutes and check whether the alarm is cleared.
      • If yes, no further action is required.
      • If no, go to 3.

  3. Collect fault information.

    1. On MRS Manager, choose System > Export Log.
    2. Contact the O&M engineers and send the collected logs.
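Step 1.d of the procedure can be sketched as a pair of shell helpers. This is an illustration under stated assumptions, not an MRS tool: next_pid_max and set_pid_max_in_file are hypothetical names, the default cap is the 64-bit maximum listed in the procedure, and sed -i assumes GNU sed.

```shell
#!/bin/sh
# Print twice the current pid_max, capped at a maximum.
# Arguments: <current pid_max> [<cap>]; cap defaults to the 64-bit limit.
next_pid_max() {
    current=$1
    cap=${2:-4194304}
    doubled=$(( current * 2 ))
    if [ "$doubled" -gt "$cap" ]; then
        echo "$cap"
    else
        echo "$doubled"
    fi
}

# Replace an existing kernel.pid_max line in a file, or append one at the end.
# Arguments: <file> <value>
set_pid_max_in_file() {
    file=$1
    value=$2
    if grep -q '^[[:space:]]*kernel\.pid_max[[:space:]]*=' "$file"; then
        sed -i "s/^[[:space:]]*kernel\.pid_max[[:space:]]*=.*/kernel.pid_max=$value/" "$file"
    else
        echo "kernel.pid_max=$value" >> "$file"
    fi
}

# Example on the node (run as root; back up /etc/sysctl.conf first):
#   new=$(next_pid_max "$(cat /proc/sys/kernel/pid_max)")
#   set_pid_max_in_file /etc/sysctl.conf "$new"
#   sysctl -p
```

On a 32-bit OS, pass 32768 as the second argument to next_pid_max so that the result stays within the lower limit.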

Reference

None