ALM-12053 Host File Handle Usage Exceeds the Threshold
Alarm Description
The system checks the file handle usage every 30 seconds and compares the actual usage with the threshold. This alarm is generated when the host file handle usage exceeds the threshold a specified number of consecutive times (5 by default).
To change the threshold, choose O&M > Alarm > Thresholds > Name of the desired cluster > Host > Host Status > Host File Handle Usage.
When the Trigger Count is 1, this alarm is cleared when the host file handle usage is less than or equal to the threshold. When the Trigger Count is greater than 1, this alarm is cleared when the host file handle usage is less than or equal to 90% of the threshold.
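The clearing rule above can be sketched as a small shell function. This is only an illustration of the rule as described here, not FusionInsight code: the name alarm_state and its arguments are hypothetical, and integer percentages are assumed.

```shell
# Sketch only: alarm_state is a hypothetical helper, not a FusionInsight command.
# Arguments: current usage (%), threshold (%), Trigger Count (all integers).
# Prints "cleared" if an active alarm would clear, otherwise "active".
alarm_state() {
    usage=$1
    threshold=$2
    trigger_count=$3
    if [ "$trigger_count" -gt 1 ]; then
        # Clear at or below 90% of the threshold.
        # usage <= 0.9 * threshold is rewritten as usage * 10 <= threshold * 9
        # so that only integer arithmetic is needed.
        if [ $((usage * 10)) -le $((threshold * 9)) ]; then
            echo cleared
        else
            echo active
        fi
    elif [ "$usage" -le "$threshold" ]; then
        echo cleared
    else
        echo active
    fi
}
```

For example, with a threshold of 80% and a Trigger Count of 5, alarm_state 75 80 5 prints active because 75% is still above 72% (90% of 80%), while alarm_state 72 80 5 prints cleared.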
Alarm Attributes
Alarm ID | Alarm Severity | Alarm Type | Service Type | Auto Cleared |
---|---|---|---|---|
12053 | Critical (default threshold: 95%); Major (default threshold: 80%) | Environment | FusionInsight Manager | Yes |
Alarm Parameters
Type | Parameter | Description |
---|---|---|
Location Information | Source | Specifies the cluster or system for which the alarm is generated. |
Location Information | ServiceName | Specifies the service for which the alarm is generated. |
Location Information | RoleName | Specifies the role for which the alarm is generated. |
Location Information | HostName | Specifies the host for which the alarm is generated. |
Additional Information | Trigger Condition | Specifies the threshold for triggering the alarm. |
Impact on the System
Service failure: When the host file handle usage exceeds the threshold, system applications cannot perform I/O operations such as opening files and network operations. As a result, programs run abnormally, which may cause jobs to fail.
Possible Causes
- The application process is abnormal. For example, opened files or sockets are not closed.
- The number of file handles cannot meet the current service requirements.
- The system is abnormal.
Handling Procedure
Check information about files opened in processes.
1. On FusionInsight Manager, click in the row where the alarm is located in the real-time alarm list and obtain the IP address of the host for which the alarm is generated.
2. Log in to the host for which the alarm is generated as user root.
3. Run the lsof -n|awk '{print $2}'|sort|uniq -c|sort -nr|more command to check which processes occupy excessive file handles.
4. Check whether the processes that have a large number of open files are normal. For example, check whether any files or sockets have not been closed.
5. Release the file handles occupied by the abnormal processes.
6. Wait for 5 minutes, and check whether the alarm is cleared.
   - If yes, no further action is required.
   - If no, go to 7.
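The lsof pipeline used in this procedure prints only a count and a PID per line. As a hedged variant, the sketch below also keeps the command name next to each PID and skips the lsof header row. The function name count_handles_by_pid is hypothetical, and the sketch assumes the default lsof column layout, in which the first column is COMMAND and the second is PID. Note that lsof also lists entries such as the working directory and memory-mapped files, so the counts are row counts rather than exact file descriptor counts.

```shell
# Sketch only: count_handles_by_pid is a hypothetical helper.
# Reads lsof-style output on stdin (column 1 = COMMAND, column 2 = PID)
# and prints "count pid command", largest count first.
count_handles_by_pid() {
    awk 'NR > 1 {                 # NR > 1 skips the header row
             count[$2]++          # rows seen for this PID
             name[$2] = $1        # remember the command name for this PID
         }
         END {
             for (pid in count) print count[pid], pid, name[pid]
         }' |
    sort -k1,1nr -k2,2n           # by count (descending), then by PID
}
```

On the affected host, it could be used as lsof -n | count_handles_by_pid | head -n 10 to list the ten processes with the most open entries.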
Increase the number of file handles.
7. On FusionInsight Manager, click in the row where the alarm is located in the real-time alarm list and obtain the IP address of the host for which the alarm is generated.
8. Log in to the host for which the alarm is generated as user root.
9. Contact the system administrator to increase the number of system file handles.
10. Run the cat /proc/sys/fs/file-nr command to view the number of used handles and the maximum number of file handles. The first value is the number of used handles, and the third value is the maximum number. Check whether the usage exceeds the threshold.
11. Wait for 5 minutes, and check whether the alarm is cleared.
    - If yes, no further action is required.
    - If no, go to 12.
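The comparison against the threshold can be done by dividing the first value of /proc/sys/fs/file-nr by the third. The following is a minimal sketch of that arithmetic. The function name file_handle_usage is hypothetical, and the percentage is truncated to a whole number, which may differ slightly from how FusionInsight Manager rounds the value.

```shell
# Sketch only: file_handle_usage is a hypothetical helper.
# Argument (optional): a file in /proc/sys/fs/file-nr format
# ("used unused max"); defaults to /proc/sys/fs/file-nr itself.
file_handle_usage() {
    src=${1:-/proc/sys/fs/file-nr}
    awk '$3 > 0 {
             # First field = used handles, third field = maximum handles.
             printf "used=%d max=%d usage=%d%%\n", $1, $3, int($1 * 100 / $3)
         }' "$src"
}
```

Running file_handle_usage with no argument on the affected host prints the current values. The reported percentage can then be compared with the thresholds configured for this alarm (by default, 80% for Major and 95% for Critical).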
Check whether the system environment is abnormal.
12. Contact the system administrator to check whether the operating system is abnormal.
13. Wait for 5 minutes, and check whether the alarm is cleared.
    - If yes, no further action is required.
    - If no, go to 14.
Collect fault information.
14. On the FusionInsight Manager home page of the active cluster, choose O&M > Log > Download.
15. Select OMS for Service and click OK.
16. Set Host to the node for which the alarm is generated and the active OMS node.
17. Click the edit button in the upper right corner, and set Start Date and End Date for log collection to 30 minutes before and 30 minutes after the alarm generation time, respectively. Then, click Download.
18. Contact the O&M engineers and send the collected log information.
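The log collection window described above is simple arithmetic on the alarm generation time. As a small sketch, assuming the alarm time is available as Unix epoch seconds, the hypothetical helper log_window below prints the start and end of the window:

```shell
# Sketch only: log_window is a hypothetical helper.
# Argument: alarm generation time as Unix epoch seconds.
# Prints the start and end of the collection window as epoch seconds.
log_window() {
    alarm_epoch=$1
    margin=$((30 * 60))   # 30 minutes in seconds
    echo "$((alarm_epoch - margin)) $((alarm_epoch + margin))"
}
```

On systems with GNU date, each value can be converted to a readable timestamp with date -d @VALUE before entering it as Start Date or End Date.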
Alarm Clearance
After the fault is rectified, the system automatically clears this alarm.
Related Information
None.