Updated on 2022-09-26 GMT+08:00

ALM-12053 Host File Handle Usage Exceeds the Threshold

Description

The system checks the host file handle usage every 30 seconds and compares it with the threshold (80% by default). This alarm is generated when the usage exceeds the threshold for a number of consecutive checks (5 by default).

To change the threshold, choose O&M > Alarm > Thresholds > Name of the desired cluster > Host > Host Status > Host File Handle Usage.

When the Trigger Count is 1, this alarm is cleared when the host file handle usage is less than or equal to the threshold. When the Trigger Count is greater than 1, this alarm is cleared when the host file handle usage is less than or equal to 90% of the threshold.
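The generation and clearing rules above can be illustrated with a short shell sketch. It is an illustration only, not the FusionInsight Manager implementation: the function names clear_level and would_trigger are hypothetical, and the sample values are made-up integer usage percentages.

```shell
#!/bin/sh
# Hypothetical helpers that mirror the alarm rules described above.

# clear_level THRESHOLD TRIGGER_COUNT
# Prints the usage percentage at or below which the alarm is cleared.
clear_level() {
    threshold=$1
    trigger_count=$2
    if [ "$trigger_count" -le 1 ]; then
        # Trigger Count is 1: cleared at or below the threshold itself.
        echo "$threshold"
    else
        # Trigger Count > 1: cleared at or below 90% of the threshold.
        # Integer division truncates any fractional part.
        echo $(( threshold * 90 / 100 ))
    fi
}

# would_trigger THRESHOLD TRIGGER_COUNT SAMPLE...
# Prints "yes" if TRIGGER_COUNT consecutive samples exceed THRESHOLD,
# otherwise prints "no". Each SAMPLE stands for one 30-second check.
would_trigger() {
    threshold=$1
    trigger_count=$2
    shift 2
    streak=0
    for sample in "$@"; do
        if [ "$sample" -gt "$threshold" ]; then
            streak=$(( streak + 1 ))
            if [ "$streak" -ge "$trigger_count" ]; then
                echo yes
                return 0
            fi
        else
            # A check at or below the threshold resets the count.
            streak=0
        fi
    done
    echo no
}

clear_level 80 5                     # prints 72
would_trigger 80 5 81 85 90 88 83    # prints yes
would_trigger 80 5 81 85 79 88 83    # prints no
```

With the default settings (threshold 80%, Trigger Count 5), the sketch shows that the alarm is cleared only after the usage drops to 72% or lower.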

Attribute

  • Alarm ID: 12053
  • Alarm Severity: Major
  • Auto Clear: Yes

Parameters

  • Source: Specifies the cluster or system for which the alarm is generated.
  • ServiceName: Specifies the service for which the alarm is generated.
  • RoleName: Specifies the role for which the alarm is generated.
  • HostName: Specifies the host for which the alarm is generated.
  • Trigger Condition: Specifies the threshold triggering the alarm. If the current indicator value exceeds this threshold, the alarm is generated.

Impact on the System

I/O operations, such as opening files or establishing network connections, fail, and programs run abnormally.

Possible Causes

  • The application process is abnormal. For example, it does not close the files or sockets that it opens.
  • The maximum number of file handles does not meet the current service requirements.
  • The system is abnormal.

Procedure

Check information about files opened in processes.

  1. On FusionInsight Manager, expand the row where the alarm is located in the real-time alarm list and obtain the IP address of the host for which the alarm is generated.
  2. Log in to the host for which the alarm is generated as user root.
  3. Run the lsof -n|awk '{print $2}'|sort|uniq -c|sort -nr|more command to list the number of open files per process ID (PID) in descending order and identify the processes that occupy excessive file handles.
  4. Check whether the processes that have a large number of open files are abnormal. For example, check whether they have files or sockets that are not closed.

    • If yes, go to 5.
    • If no, go to 7.

  5. Release the file handles held by the abnormal processes, for example, by stopping or restarting those processes.
  6. Wait 5 minutes and check whether the alarm is cleared.

    • If yes, no further action is required.
    • If no, go to 7.
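The lsof pipeline above prints only a count and a PID on each line. As a minimal sketch, the following hypothetical helper, top_handle_holders, also prints the command name and skips the lsof header line. It assumes the same column layout that the original command relies on (command name in the first column, PID in the second). The sample input in the block is made up for illustration.

```shell
#!/bin/sh
# top_handle_holders [N]
# Hypothetical helper: reads `lsof -n` output on standard input and prints
# the N PIDs with the most entries as "COUNT PID COMMAND", highest first.
# N defaults to 10.
top_handle_holders() {
    limit=${1:-10}
    # NR > 1 skips the header line; column 1 is COMMAND, column 2 is PID.
    awk 'NR > 1 { count[$2]++; name[$2] = $1 }
         END { for (pid in count) print count[pid], pid, name[pid] }' |
        sort -k1,1nr -k2,2n |
        head -n "$limit"
}

# Demonstration with made-up lsof output (not taken from a real host).
top_handle_holders 5 <<'EOF'
COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
java 101 omm 1u REG 8,1 0 1 /a
java 101 omm 2u REG 8,1 0 2 /b
sshd 202 root 3u IPv4 1 0t0 TCP *:22
java 101 omm 3u REG 8,1 0 3 /c
EOF
# prints:
# 3 101 java
# 1 202 sshd
```

On the host for which the alarm is generated, the helper would be fed real data, for example lsof -n | top_handle_holders 10.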

Increase the number of file handles.

  7. On FusionInsight Manager, expand the row where the alarm is located in the real-time alarm list and obtain the IP address of the host for which the alarm is generated.
  8. Log in to the host for which the alarm is generated as user root.
  9. Contact the system administrator to increase the number of system file handles.
  10. Run the cat /proc/sys/fs/file-nr command to view the number of used file handles and the maximum number of file handles. In the output, the first value is the number of used handles and the third value is the maximum number. Check whether the usage exceeds the threshold.

      # cat /proc/sys/fs/file-nr
      12704 0 640000

    • If yes, go to 9.
    • If no, go to 11.

  11. Wait 5 minutes and check whether the alarm is cleared.

    • If yes, no further action is required.
    • If no, go to 12.
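The check of /proc/sys/fs/file-nr described above can be turned into a percentage with a small shell sketch. The function names file_handle_usage and usage_exceeds are hypothetical, and the 80 passed as the threshold is the default value mentioned in the Description; adjust it if the threshold was changed.

```shell
#!/bin/sh
# file_handle_usage [FILE]
# Hypothetical helper: prints used handles as a percentage of the maximum,
# with one decimal place. FILE defaults to /proc/sys/fs/file-nr; pass "-"
# to read the three values from standard input instead.
file_handle_usage() {
    # Field 1 is the number of used handles, field 3 is the maximum.
    awk '$3 > 0 { printf "%.1f\n", $1 * 100 / $3 }' "${1:-/proc/sys/fs/file-nr}"
}

# usage_exceeds FILE THRESHOLD
# Prints "yes" if the usage percentage is greater than THRESHOLD, else "no".
usage_exceeds() {
    awk -v t="$2" '$3 > 0 { print (($1 * 100 / $3 > t) ? "yes" : "no") }' "$1"
}

# Demonstration with the sample values shown in the procedure.
printf '12704 0 640000\n' | file_handle_usage -    # prints 2.0
printf '12704 0 640000\n' | usage_exceeds - 80     # prints no
```

On the host itself, file_handle_usage can be run without arguments, and usage_exceeds /proc/sys/fs/file-nr 80 performs the comparison. On Linux, the maximum in the third field corresponds to the fs.file-max kernel parameter; the appropriate value for a given node is left to the system administrator.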

Check whether the system environment is abnormal.

  12. Contact the system administrator to check whether the operating system is abnormal.

    • If yes, rectify the operating system fault and go to 13.
    • If no, go to 14.

  13. Wait 5 minutes and check whether the alarm is cleared.

    • If yes, no further action is required.
    • If no, go to 14.

Collect fault information.

  14. On the FusionInsight Manager home page of the active cluster, choose O&M > Log > Download.
  15. Select OMS for Service and click OK.
  16. Set Host to the node for which the alarm is generated and the active OMS node.
  17. In the upper right corner, set Start Date and End Date for log collection to 30 minutes before and after the alarm generation time, respectively. Then, click Download.
  18. Contact the O&M personnel and send them the collected logs.

Alarm Clearing

After the fault is rectified, the system automatically clears this alarm.

Related Information

None