ALM-12017 Insufficient Disk Capacity
Alarm Description
The system checks the host disk usage every 30 seconds and compares the actual disk usage with the threshold. Disk usage has a configurable threshold (90% by default), and this alarm is generated when the host disk usage exceeds it.
When the Trigger Count is 1, this alarm is cleared when the usage of a host disk partition is less than or equal to the threshold. When the Trigger Count is greater than 1, this alarm is cleared when the usage of a host disk partition is less than or equal to 90% of the threshold.
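The clearing rule above is a form of hysteresis: with a Trigger Count greater than 1, usage must fall well below the threshold before the alarm clears. The following shell function is a minimal sketch that replays a series of usage samples (integer percentages) against a threshold. The function name `evaluate_alarm` is hypothetical. It also assumes that Trigger Count is the number of consecutive checks above the threshold needed to raise the alarm, which this document does not define.

```shell
#!/usr/bin/env bash
# Sketch of the raise/clear rule described above (hypothetical helper).
# Assumption (not stated in this document): the alarm is raised once usage has
# exceeded the threshold in TRIGGER_COUNT consecutive checks.
# Usage: evaluate_alarm THRESHOLD TRIGGER_COUNT USAGE...   (integer percentages)
evaluate_alarm() {
  local threshold=$1 trigger_count=$2
  shift 2
  local active=0 over=0 usage
  for usage in "$@"; do
    if (( usage > threshold )); then
      over=$(( over + 1 ))
      if (( over >= trigger_count )); then
        active=1
      fi
    else
      over=0
      if (( active == 1 )); then
        if (( trigger_count == 1 )); then
          active=0                      # Trigger Count 1: clear at <= threshold
        elif (( usage * 10 <= threshold * 9 )); then
          active=0                      # Trigger Count > 1: clear at <= 90% of threshold
        fi
      fi
    fi
    if (( active == 1 )); then
      echo "${usage}% -> raised"
    else
      echo "${usage}% -> clear"
    fi
  done
}
```

For example, `evaluate_alarm 90 2 95 95 88 80` prints clear, raised, raised, clear: 88% is at or below the 90% threshold but above 81% (90% of the threshold), so the alarm stays raised until usage drops to 80%.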
Alarm Attributes
Alarm ID | Alarm Severity | Auto Cleared |
---|---|---|
12017 | Major | Yes |
Alarm Parameters
Parameter | Description |
---|---|
Source | Specifies the cluster or system for which the alarm is generated. |
ServiceName | Specifies the service for which the alarm is generated. |
RoleName | Specifies the role for which the alarm is generated. |
HostName | Specifies the host for which the alarm is generated. |
PartitionName | Specifies the device partition for which the alarm is generated. |
Trigger Condition | Specifies the threshold for triggering the alarm. If the current indicator value exceeds this threshold, the alarm is generated. |
Impact on the System
If disk capacity is insufficient, jobs that need to modify or use data on the disk may fail.
Possible Causes
- The alarm threshold is incorrect.
- The disk configuration cannot meet service requirements, and disk usage has reached the upper limit.
Handling Procedure
Check whether the threshold is set properly.
1. Log in to FusionInsight Manager, choose O&M > Alarm > Thresholds > Host > Disk > Disk Usage and check whether the threshold (configurable, 90% by default) is appropriate.
2. Locate the target threshold rule and click Modify in the Operation column to change the alarm threshold based on the current disk usage.
Figure 1 Setting an alarm threshold
3. After 2 minutes, check whether the alarm is cleared.
   - If yes, no further action is required.
   - If no, go to Step 4.
Check whether the disk usage reaches the upper limit.
4. In the alarm list on FusionInsight Manager, click the expand icon in the row where the alarm is located to view the host name and disk partition information in the alarm details.
5. Log in to the node for which the alarm is generated as user root.
6. Check the system disk partition usage. Based on the disk partition name obtained in Step 4, check whether the disk is mounted to one of the following directories: /, /opt, /tmp, /var, /var/log, or /srv/BigData (customizable).
   df -lmPT | awk '$2 != "iso9660"' | grep '^/dev/' | awk '{"readlink -m "$1 | getline real }{$1=real; print $0}' | sort -u -k 1,1
7. Check the system disk partition usage. Determine the role of the disk based on the disk partition name obtained in Step 4.
   df -lmPT | awk '$2 != "iso9660"' | grep '^/dev/' | awk '{"readlink -m "$1 | getline real }{$1=real; print $0}' | sort -u -k 1,1
8. Check whether the service that the disk belongs to is HDFS, Yarn, Kafka, Supervisor, or another service that requires disk storage.
   - If yes, expand the cluster capacity by referring to Scaling Out an MRS Cluster and go to Step 9.
   - If no, go to Step 12.
9. After 2 minutes, check whether the alarm is cleared.
   - If yes, no further action is required.
   - If no, go to Step 12.
10. Check whether the node contains a file larger than 500 MB, for example, a large file written to the disk by mistake:
   find / -xdev -size +500M -exec ls -l {} \;
11. Handle the large file and check whether the alarm is cleared 2 minutes later.
   - If yes, no further action is required.
   - If no, go to Step 12.
12. Contact the system administrator to expand the disk capacity.
13. After 2 minutes, check whether the alarm is cleared.
   - If yes, no further action is required.
   - If no, go to Step 14.
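The partition check and the large-file search above can be wrapped in two small shell functions. This is a minimal sketch, assuming GNU awk-compatible `awk` and GNU findutils (for `-printf`) on the alarmed node. The names `check_partitions` and `list_large_files` and the 90% default are illustrative, not part of FusionInsight Manager. Unlike the `df` command in the procedure, `check_partitions` does not resolve device symlinks with `readlink`; it only flags `/dev/` partitions (iso9660 excluded) whose Capacity column exceeds the threshold.

```shell
#!/usr/bin/env bash
# Sketch only: helper names and defaults are hypothetical.

# check_partitions [THRESHOLD]
# Reads `df -lmPT` output on stdin and prints "device mount-point usage%" for
# every /dev/ partition (iso9660 excluded) whose usage exceeds THRESHOLD percent.
# Mount points containing spaces are not handled.
check_partitions() {
  local threshold="${1:-90}"
  awk -v t="$threshold" '
    $1 ~ /^\/dev\// && $2 != "iso9660" {
      use = $6
      sub(/%$/, "", use)              # "95%" -> "95"
      if (use + 0 > t + 0)
        printf "%s %s %s%%\n", $1, $7, use
    }'
}

# list_large_files [ROOT]
# Same size and file-system selection as `find / -xdev -size +500M`, restricted
# to regular files and printed as "size-in-bytes path", largest first.
list_large_files() {
  local root="${1:-/}"
  find "$root" -xdev -type f -size +500M -printf '%s %p\n' 2>/dev/null | sort -rn
}
```

For example, `df -lmPT | check_partitions 90` lists the partitions above 90%, and `list_large_files /` lists candidate files on the root file system, largest first.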
Collect fault information.
14. On FusionInsight Manager, choose O&M > Log > Download.
15. Select OMS for Service and click OK.
16. Click the time setting icon in the upper-right corner, set Start Date and End Date for log collection to 10 minutes before and 10 minutes after the alarm generation time, respectively, and click Download.
17. Contact the O&M personnel and send them the collected log information.
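Working out the Start Date and End Date for log collection is simple date arithmetic. The following is a minimal sketch, assuming GNU `date` and an alarm generation time given as `YYYY-MM-DD HH:MM:SS` in the same time zone that FusionInsight Manager displays; `log_window` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch (assumes GNU date): print the log collection window, i.e. the alarm
# generation time minus and plus 10 minutes, for Start Date and End Date.
# Usage: log_window "YYYY-MM-DD HH:MM:SS" [OFFSET_SECONDS]
log_window() {
  local alarm_time=$1 offset=${2:-600}    # 600 s = 10 minutes
  local epoch
  epoch=$(date -d "$alarm_time" +%s) || return 1
  echo "Start Date: $(date -d "@$(( epoch - offset ))" '+%Y-%m-%d %H:%M:%S')"
  echo "End Date: $(date -d "@$(( epoch + offset ))" '+%Y-%m-%d %H:%M:%S')"
}
```

Converting to epoch seconds first keeps the arithmetic correct across midnight: for an alarm at 2024-05-01 23:55:00, the window runs from 2024-05-01 23:45:00 to 2024-05-02 00:05:00.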
Alarm Clearance
After the fault is rectified, the system automatically clears this alarm.
Related Information
None