Updated on 2024-11-29 GMT+08:00

ALM-12053 Host File Handle Usage Exceeds the Threshold

Alarm Description

The system checks the host file handle usage every 30 seconds and compares it with the threshold. This alarm is generated when the usage exceeds the threshold in a number of consecutive checks (the Trigger Count, 5 by default).

To change the threshold, choose O&M > Alarm > Thresholds > Name of the desired cluster > Host > Host Status > Host File Handle Usage.

When the Trigger Count is 1, this alarm is cleared when the host file handle usage is less than or equal to the threshold. When the Trigger Count is greater than 1, this alarm is cleared when the host file handle usage is less than or equal to 90% of the threshold.
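The trigger and clear rules above can be modeled as a small state machine. The Python sketch below is illustrative only and is not how FusionInsight Manager is implemented. It assumes that each 30-second sample arrives as a usage percentage, and the class name `HandleUsageAlarm` is hypothetical. This section does not say whether clearing also requires consecutive samples, so the sketch clears on the first qualifying sample.

```python
class HandleUsageAlarm:
    """Hypothetical model of the ALM-12053 trigger and clear rules."""

    def __init__(self, threshold: float, trigger_count: int = 5):
        self.threshold = threshold          # percentage, for example 80.0
        self.trigger_count = trigger_count  # consecutive exceedances needed to raise
        self.consecutive = 0
        self.active = False

    def clear_level(self) -> float:
        # Trigger Count 1: clear at or below the threshold itself.
        # Trigger Count > 1: clear only at or below 90% of the threshold.
        return self.threshold if self.trigger_count == 1 else self.threshold * 0.9

    def observe(self, usage: float) -> bool:
        """Feed one usage sample; return whether the alarm is active afterwards."""
        if self.active:
            # Assumption: a single sample at or below the clear level clears the alarm.
            if usage <= self.clear_level():
                self.active = False
                self.consecutive = 0
            return self.active
        if usage > self.threshold:
            self.consecutive += 1
            if self.consecutive >= self.trigger_count:
                self.active = True
        else:
            self.consecutive = 0  # any sample within the threshold resets the streak
        return self.active


if __name__ == "__main__":
    alarm = HandleUsageAlarm(threshold=80.0)
    for sample in [85, 85, 85, 85, 85, 75, 71]:
        print(sample, alarm.observe(sample))
```

With a threshold of 80% and a Trigger Count of 5, the alarm is raised on the fifth consecutive sample above 80%. A sample of 75% keeps it active because 75% is still above 72% (90% of 80%).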

Alarm Attributes

Alarm ID: 12053
Alarm Severity: Critical (default threshold: 95%); Major (default threshold: 80%)
Alarm Type: Environment
Service Type: FusionInsight Manager
Auto Cleared: Yes

Alarm Parameters

Location Information

  • Source: Specifies the cluster or system for which the alarm is generated.
  • ServiceName: Specifies the service for which the alarm is generated.
  • RoleName: Specifies the role for which the alarm is generated.
  • HostName: Specifies the host for which the alarm is generated.

Additional Information

  • Trigger Condition: Specifies the threshold for triggering the alarm.

Impact on the System

Service failure: When the host file handle usage exceeds the threshold, system applications may be unable to perform I/O operations such as opening files or establishing network connections. As a result, programs may run abnormally, which can cause jobs to fail.

Possible Causes

  • An application process is abnormal. For example, it opens files or sockets but does not close them.
  • The maximum number of file handles is insufficient for the current service requirements.
  • The operating system is abnormal.

Handling Procedure

Check information about files opened in processes.

  1. On FusionInsight Manager, view the details of this alarm in the real-time alarm list and obtain the IP address of the host for which the alarm is generated.
  2. Log in to the host for which the alarm is generated as user root.
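  3. Run the lsof -n|awk '{print $2}'|sort|uniq -c|sort -nr|more command to count the open files of each process and identify the processes that occupy excessive file handles. In the output, the first column is the number of open files and the second column is the process ID.
  4. Check whether the processes with a large number of open files are abnormal, for example, whether they leave files or sockets open without closing them.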

    • If yes, go to 5.
    • If no, go to 7.

  5. Release the file handles occupied by the abnormal processes.
  6. Wait for 5 minutes and check whether the alarm is cleared.

    • If yes, no further action is required.
    • If no, go to 7.
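The per-process check in steps 3 and 4 can also be approximated by reading `/proc` directly. The Python sketch below is an assumption-laden alternative to the lsof command, not a replacement for it: it counts only the descriptors listed under `/proc/<pid>/fd`, while lsof also reports memory-mapped files, working directories, and similar entries, so the totals will differ. The function names are hypothetical. A high or steadily growing socket count is one hint of unclosed sockets, not proof of a fault. Run it as root to see every process.

```python
from __future__ import annotations

import os


def fd_summary(pid: int) -> tuple[int, int]:
    """Return (open descriptors, socket descriptors) for one process.

    Each entry in /proc/<pid>/fd is a symlink whose target looks like
    '/path/to/file', 'socket:[12345]', 'pipe:[6789]', and so on.
    """
    fd_dir = f"/proc/{pid}/fd"
    total = sockets = 0
    for name in os.listdir(fd_dir):
        try:
            target = os.readlink(os.path.join(fd_dir, name))
        except OSError:
            continue  # descriptor was closed while scanning
        total += 1
        if target.startswith("socket:"):
            sockets += 1
    return total, sockets


def top_processes(limit: int = 10) -> list[tuple[int, int, int]]:
    """Return up to `limit` (pid, descriptors, sockets) rows, largest first."""
    rows = []
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue
        try:
            total, sockets = fd_summary(int(entry))
        except OSError:
            continue  # process exited, or no permission to read its fd directory
        rows.append((int(entry), total, sockets))
    rows.sort(key=lambda row: row[1], reverse=True)
    return rows[:limit]


if __name__ == "__main__":
    for pid, total, sockets in top_processes():
        print(f"PID {pid:>7}: {total:>6} descriptors, {sockets:>6} sockets")
```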

Increase the number of file handles.

  7. On FusionInsight Manager, view the details of this alarm in the real-time alarm list and obtain the IP address of the host for which the alarm is generated.
  8. Log in to the host for which the alarm is generated as user root.
  9. Contact the system administrator to increase the maximum number of system file handles.
  10. Run the cat /proc/sys/fs/file-nr command to view the number of allocated file handles and the maximum number of file handles. The first value is the number of allocated handles and the third value is the maximum. For example, the following output shows 12704 of 640000 handles in use, about 2%:

      # cat /proc/sys/fs/file-nr
      12704 0 640000

      Check whether the usage still exceeds the threshold.

    • If yes, go to 9.
    • If no, go to 11.

  11. Wait for 5 minutes and check whether the alarm is cleared.

    • If yes, no further action is required.
    • If no, go to 12.
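The check on /proc/sys/fs/file-nr can be scripted. The Python sketch below assumes that usage is the first field divided by the third field, as the step above describes. The function names and the 80% comparison value (the default Major threshold listed in the alarm attributes) are illustrative choices, not part of FusionInsight Manager.

```python
from __future__ import annotations


def parse_file_nr(text: str) -> tuple[int, int]:
    """Return (allocated handles, maximum handles) from /proc/sys/fs/file-nr content.

    The file holds three whitespace-separated fields: allocated handles,
    allocated-but-unused handles, and the system-wide maximum.
    """
    allocated, _unused, maximum = (int(v) for v in text.split())
    return allocated, maximum


def handle_usage_percent(text: str) -> float:
    allocated, maximum = parse_file_nr(text)
    return 100.0 * allocated / maximum


def read_host_usage(path: str = "/proc/sys/fs/file-nr") -> float:
    with open(path) as f:
        return handle_usage_percent(f.read())


if __name__ == "__main__":
    usage = read_host_usage()
    threshold = 80.0  # assumed comparison value: default Major threshold
    state = "exceeds" if usage > threshold else "is within"
    print(f"Host file handle usage {usage:.2f}% {state} the {threshold}% threshold")
```

For the sample output 12704 0 640000, handle_usage_percent returns 1.985, well below either default threshold.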

Check whether the system environment is abnormal.

  12. Contact the system administrator to check whether the operating system is abnormal.

    • If yes, rectify the fault and go to 13.
    • If no, go to 14.

  13. Wait for 5 minutes and check whether the alarm is cleared.

    • If yes, no further action is required.
    • If no, go to 14.

Collect fault information.

  14. On the FusionInsight Manager home page of the active cluster, choose O&M > Log > Download.
  15. In Service, select OMS and click OK.
  16. Set Host to the node for which the alarm is generated and the active OMS node.
  17. Click the edit icon in the upper right corner and set Start Date and End Date for log collection to 30 minutes before and 30 minutes after the alarm generation time, respectively. Then, click Download.
  18. Contact O&M engineers and send them the collected logs.
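The log collection window in the steps above is the alarm generation time plus or minus 30 minutes. The Python sketch below computes it. The function name is hypothetical, and the sketch assumes naive datetimes in the same time zone that FusionInsight Manager displays. The example alarm time is invented to show a window that crosses midnight, where Start Date falls on the previous day.

```python
from __future__ import annotations

from datetime import datetime, timedelta


def log_collection_window(alarm_time: datetime, margin_minutes: int = 30) -> tuple[datetime, datetime]:
    """Return (Start Date, End Date) spanning margin_minutes on each side of the alarm."""
    margin = timedelta(minutes=margin_minutes)
    return alarm_time - margin, alarm_time + margin


if __name__ == "__main__":
    start, end = log_collection_window(datetime(2024, 11, 29, 0, 10))
    print(start, end)  # 2024-11-28 23:40:00 2024-11-29 00:40:00
```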

Alarm Clearance

After the fault is rectified, the system automatically clears this alarm.

Related Information

None.