ALM-14023 Percentage of Total Reserved Disk Space for Replicas Exceeds the Threshold
Description
The system checks the percentage of total reserved disk space for replicas every 30 seconds and compares it with the threshold (90% by default). The percentage is calculated as Total reserved disk space for replicas/(Total reserved disk space for replicas + Total remaining disk space). This alarm is generated when the percentage exceeds the threshold in a number of consecutive checks equal to Trigger Count.
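The formula above can be written out as a worked example. The symbols R (total reserved disk space for replicas) and F (total remaining disk space) are introduced here only for illustration, and the figures are hypothetical:

```latex
P = \frac{R}{R + F} \times 100\%, \qquad
R = 460\ \text{GB},\; F = 40\ \text{GB}
\;\Rightarrow\;
P = \frac{460}{460 + 40} \times 100\% = 92\% > 90\%
```

With the default threshold of 90%, a sample like this counts as one breach toward Trigger Count.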
The alarm is cleared in either of the following scenarios:
- Trigger Count is 1 and the percentage of total reserved disk space for replicas is less than or equal to the threshold.
- Trigger Count is greater than 1 and the percentage of total reserved disk space for replicas is less than or equal to 90% of the threshold.
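The generation and clearing rules can be sketched as a small state machine. This is a minimal illustration under stated assumptions and is not FusionInsight's actual implementation. The class name `ReplicaReserveAlarm` is hypothetical, and each call to `check` stands for one 30-second sample. The sketch also assumes that a sample which does not exceed the threshold resets the run of consecutive breaches.

```python
from dataclasses import dataclass


@dataclass
class ReplicaReserveAlarm:
    """Hypothetical model of the ALM-14023 generate/clear rules."""

    threshold: float = 90.0  # percent; 90% is the documented default
    trigger_count: int = 1   # consecutive breaches required to raise the alarm
    active: bool = False
    breaches: int = 0        # length of the current run of breaching samples

    def check(self, reserved: float, remaining: float) -> bool:
        """Process one 30-second sample and return whether the alarm is active."""
        total = reserved + remaining
        if total <= 0:
            raise ValueError("reserved + remaining must be positive")
        pct = 100.0 * reserved / total
        if pct > self.threshold:
            self.breaches += 1
            if self.breaches >= self.trigger_count:
                self.active = True
        else:
            # Assumption: any non-breaching sample breaks the consecutive run.
            self.breaches = 0
            # Clear level: the threshold itself when Trigger Count is 1,
            # otherwise 90% of the threshold.
            clear_level = (
                self.threshold if self.trigger_count == 1 else 0.9 * self.threshold
            )
            if self.active and pct <= clear_level:
                self.active = False
        return self.active
```

When Trigger Count is greater than 1, the gap between the raise level and the lower clear level works as hysteresis. Presumably this keeps the alarm from flapping when usage hovers around the threshold.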
Attribute
Alarm ID | Alarm Severity | Automatically Cleared
---|---|---
14023 | Minor | Yes
Parameters
Name | Meaning
---|---
Source | Specifies the cluster for which the alarm is generated.
ServiceName | Specifies the service for which the alarm is generated.
RoleName | Specifies the role for which the alarm is generated.
NameServiceName | Specifies the NameService service for which the alarm is generated.
Trigger condition | Specifies the threshold for triggering the alarm. If the current indicator value exceeds this threshold, the alarm is generated.
Impact on the System
HDFS write performance is affected. If all remaining DataNode space is reserved for replicas, writes to HDFS fail.
Possible Causes
- The alarm threshold is improperly configured.
- The disk space configured for the HDFS cluster is insufficient.
- The service workload accessing HDFS is too heavy, which overloads the DataNodes.
Procedure
Check whether the alarm threshold is appropriate.
1. On the FusionInsight Manager portal, choose O&M > Alarm > Thresholds > Name of the desired cluster > HDFS > Disk > Percentage of Reserved Space for Replicas of Unused Space to check whether the alarm threshold is appropriate. (The default threshold is 90%. Users can change it as required.)
2. Choose O&M > Alarm > Thresholds > Name of the desired cluster > HDFS > Disk > Percentage of Reserved Space for Replicas of Unused Space, click Modify, and change the threshold based on the actual usage.
Figure 1 Modify Thresholds
3. Wait 5 minutes and check whether the alarm is cleared.
   - If yes, no further action is required.
   - If no, go to 4.
Check whether an alarm indicating insufficient disk space is generated.
4. On the FusionInsight Manager portal, check whether ALM-14001 HDFS Disk Usage Exceeds the Threshold or ALM-14002 DataNode Disk Usage Exceeds the Threshold exists on the O&M > Alarm > Alarms page.
5. Handle the alarm by referring to the instructions in ALM-14001 HDFS Disk Usage Exceeds the Threshold or ALM-14002 DataNode Disk Usage Exceeds the Threshold, and check whether that alarm is cleared.
6. Wait 5 minutes and check whether this alarm is cleared.
   - If yes, no further action is required.
   - If no, go to 7.
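To cross-check DataNode disk space from a client node, you can scan the output of the standard Hadoop command `hdfs dfsadmin -report` for DataNodes with little remaining space. The sketch below only parses text that is passed to it. It assumes the usual per-node layout, in which a `Name:` line comes before that node's `DFS Remaining%:` line. The function names and the 10% cutoff are hypothetical. This output reports remaining DFS space, which relates to ALM-14001 and ALM-14002. It does not show the reserved-for-replicas figure that this alarm monitors.

```python
from __future__ import annotations

import re


def remaining_pct_by_node(report: str) -> dict[str, float]:
    """Map each DataNode 'Name:' value to its 'DFS Remaining%' figure.

    Assumes the per-node layout of `hdfs dfsadmin -report` output.
    """
    result: dict[str, float] = {}
    current: str | None = None  # DataNode block currently being read
    for raw in report.splitlines():
        line = raw.strip()
        if line.startswith("Name:"):
            current = line[len("Name:"):].strip()
        elif line.startswith("DFS Remaining%:") and current is not None:
            match = re.search(r"([\d.]+)%", line)
            if match:
                result[current] = float(match.group(1))
            current = None
    return result


def low_space_nodes(report: str, min_remaining_pct: float = 10.0) -> list[str]:
    """Return DataNodes below a hypothetical remaining-space cutoff (percent)."""
    pct = remaining_pct_by_node(report)
    return sorted(node for node, value in pct.items() if value < min_remaining_pct)
```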
Expand the DataNode capacity.
7. Expand the DataNode capacity.
8. Wait 5 minutes and check whether the alarm is cleared.
   - If yes, no further action is required.
   - If no, go to 9.
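Before you expand capacity, you can rearrange the formula from the Description to estimate how much free space must be added for the alarm to clear. This sketch assumes that all added capacity counts as remaining disk space and that the reserved space stays constant. The helper name and the GB units are hypothetical.

```python
import math


def extra_space_needed(
    reserved_gb: float,
    remaining_gb: float,
    threshold_pct: float = 90.0,
    trigger_count: int = 1,
) -> int:
    """Estimate the whole GB of free space to add so the alarm can clear."""
    # Clear level from the Description: the threshold itself when Trigger Count
    # is 1, otherwise 90% of the threshold.
    clear_pct = threshold_pct if trigger_count == 1 else 0.9 * threshold_pct
    # R / (R + F + X) <= clear_pct / 100  rearranges to
    # X >= R * 100 / clear_pct - R - F
    needed = reserved_gb * 100.0 / clear_pct - reserved_gb - remaining_gb
    return max(0, math.ceil(needed))
```

For example, with 460 GB reserved and 40 GB remaining, clearing at 90% needs about 12 GB more. If Trigger Count is greater than 1, the clear level drops to 81%, and about 68 GB more is needed.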
Collect fault information.
9. On the FusionInsight Manager portal, choose O&M > Log > Download.
10. In Service, select HDFS in the required cluster.
11. Click in the upper right corner, and set Start Date and End Date for log collection to 20 minutes before and 20 minutes after the alarm generation time, respectively. Then, click Download.
12. Contact O&M personnel and send them the collected logs.
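The log collection window described above is the alarm generation time plus or minus 20 minutes. The following minimal sketch computes the Start Date and End Date values. The function name is hypothetical.

```python
from __future__ import annotations

from datetime import datetime, timedelta


def log_collection_window(
    alarm_time: datetime, margin_minutes: int = 20
) -> tuple[datetime, datetime]:
    """Return (start, end) spanning margin_minutes on each side of the alarm."""
    margin = timedelta(minutes=margin_minutes)
    return alarm_time - margin, alarm_time + margin
```

For an alarm raised at 10:05, this gives a window from 09:45 to 10:25 on the same day.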
Alarm Clearing
After the fault is rectified, the system automatically clears this alarm.
Related Information
None