Updated on 2024-04-11 GMT+08:00

ALM-12186 CGroup Task Usage Exceeds the Threshold

Alarm Description

The system checks the CGroup task usage of user omm every 5 minutes. This alarm is generated when the CGroup task usage exceeds 90%. This alarm is cleared when the CGroup task usage is less than or equal to 90%.

CGroup task usage = Number of used CGroup tasks/Maximum number of CGroup tasks

Run the following commands as user omm.

To obtain the number of CGroup tasks that this user is currently using:

    systemctl status user-$(id -u).slice | grep limit | awk -F ' ' '{print $2}'

To obtain the maximum number of CGroup tasks allowed for this user:

    echo $(systemctl status user-$(id -u).slice | grep limit | awk -F ' ' '{print $4}') | sed -e 's/)//g'
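The usage formula can be combined with the two values above in a short script. The following is only a sketch: the function names parse_tasks_line and task_usage_percent are hypothetical, the sample line uses made-up numbers, and the script assumes that the status line has the form "Tasks: <used> (limit: <max>)".

```shell
# Sketch: compute CGroup task usage (percent) from a systemd "Tasks:" line.
# Assumption: the line looks like "Tasks: <used> (limit: <max>)".

# Read status output on stdin and print "<used> <max>".
parse_tasks_line() {
    grep limit | awk '{ gsub(/\)/, "", $4); print $2, $4 }'
}

# Print used/max as a percentage with one decimal place.
# $1 = number of used tasks, $2 = maximum number of tasks.
task_usage_percent() {
    awk -v u="$1" -v m="$2" 'BEGIN { printf "%.1f\n", u * 100 / m }'
}

# Illustrative sample line (made-up values, not real command output).
sample='    Tasks: 54321 (limit: 60000)'
set -- $(printf '%s\n' "$sample" | parse_tasks_line)
echo "used=$1 max=$2 usage=$(task_usage_percent "$1" "$2")%"
```

On a real host, the sample line could be replaced by piping systemctl status user-$(id -u).slice into parse_tasks_line as user omm. A result above 90 corresponds to the condition that triggers this alarm.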

Alarm Attributes

Alarm ID    Alarm Severity    Auto Cleared
12186       Major             Yes

Alarm Parameters

Parameter      Description
Source         Specifies the cluster or system for which the alarm is generated.
ServiceName    Specifies the service for which the alarm is generated.
RoleName       Specifies the role for which the alarm is generated.
HostName       Specifies the host for which the alarm is generated.

Impact on the System

  • Switching to user omm fails.
  • New omm processes cannot be created.
  • A faulty service or process cannot be restarted.

Possible Causes

The CGroup task usage exceeds 90%.

Handling Procedure

Check whether the maximum number of threads that can be concurrently opened by user omm is properly set.

  1. Log in to FusionInsight Manager and choose O&M > Alarm > Alarms. On the page that is displayed, click the icon in the row containing the alarm and find the name of the host for which the alarm is generated in Location. Click the host name to view its IP address.
  2. Log in to the host for which the alarm is generated as user omm.
  3. Run the following command to obtain the maximum number of threads that can be concurrently opened by user omm and check whether this number is greater than or equal to 60000:

    systemctl status user-$(id -u).slice | grep limit

    • If yes, go to 6.
    • If no, go to 4.

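  4. Switch to user root and run the following command to change the maximum number of tasks for user omm to 60000. The slice name user-2000.slice assumes that the UID of user omm is 2000. If id -u omm returns a different value, use that value in the slice name.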

    systemctl set-property user-2000.slice TasksMax=60000

  5. Change the value of UserTasksMax in the /etc/systemd/logind.conf file to 60000. (If the parameter is commented out, uncomment it.) Save the file, wait 5 minutes, and check whether the alarm is cleared.

    • If yes, no further action is required.
    • If no, go to 6.
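The limit check in step 3 and the file change in step 5 can be sketched as two small helper functions. This is an illustration only: the names next_step and set_user_tasks_max are hypothetical, the limit is assumed to be a plain integer, and the second function only prints a modified copy of the file content that it reads on standard input. It does not write to /etc/systemd/logind.conf.

```shell
# Sketch: helpers for steps 3 and 5 (hypothetical names, not part of MRS).
REQUIRED=60000

# Print what to do next, given the current task limit as an integer ($1).
next_step() {
    if [ "$1" -ge "$REQUIRED" ]; then
        echo "limit ok: collect fault information"
    else
        echo "limit too low: raise to $REQUIRED"
    fi
}

# Read logind.conf-style content on stdin and print it with UserTasksMax
# set to $REQUIRED, uncommenting the parameter if it was commented out.
set_user_tasks_max() {
    sed -E "s/^[#[:space:]]*UserTasksMax=.*/UserTasksMax=$REQUIRED/"
}

# Illustrative input (made-up values).
next_step 12288
printf '[Login]\n#UserTasksMax=33%%\n' | set_user_tasks_max
```

To preview the change on a real host, the content of /etc/systemd/logind.conf could be redirected into set_user_tasks_max and the output compared with the original file before editing it as user root.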

Collect fault information.

  6. On FusionInsight Manager of the cluster, choose O&M. In the navigation pane on the left, choose Log > Download.
  7. Expand the Service drop-down list, select OmmServer and NodeAgent for the target cluster, and click OK.
  8. Click the icon in the upper right corner, and set Start Date and End Date for log collection to 10 minutes before and 10 minutes after the alarm generation time, respectively. Then, click Download.
  9. Contact O&M personnel and provide the collected logs.
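The log collection window can be worked out from the alarm generation time with a small calculation. The sketch below assumes GNU date (for the -d option and the @epoch form). The function name log_window and the sample timestamp are hypothetical.

```shell
# Sketch: print Start Date and End Date for log collection,
# 10 minutes (600 seconds) before and after the alarm time.
# Assumption: GNU date is available.
log_window() {
    # Convert the alarm time ($1, "YYYY-MM-DD hh:mm:ss") to epoch seconds.
    alarm_epoch=$(date -d "$1" +%s) || return 1
    date -d "@$((alarm_epoch - 600))" '+%Y-%m-%d %H:%M:%S'
    date -d "@$((alarm_epoch + 600))" '+%Y-%m-%d %H:%M:%S'
}

# Illustrative alarm time (made-up value).
log_window '2024-04-11 10:05:00'
```

The first line of output is the value for Start Date and the second is the value for End Date.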

Alarm Clearance

This alarm is automatically cleared after the fault is rectified.

Related Information

None.