Updated on 2024-11-29 GMT+08:00

ALM-14023 Percentage of Total Reserved Disk Space for Replicas Exceeds the Threshold

Alarm Description

Every 30 seconds, the system calculates the percentage of total reserved disk space for replicas, defined as Total reserved disk space for replicas / (Total reserved disk space for replicas + Total remaining disk space), and compares it with the threshold (90% by default). This alarm is generated when the percentage exceeds the threshold for the number of consecutive checks specified by Trigger Count.

The alarm is cleared in either of the following scenarios:
- Trigger Count is 1 and the percentage of total reserved disk space for replicas is less than or equal to the threshold.
- Trigger Count is greater than 1 and the percentage of total reserved disk space for replicas is less than or equal to 90% of the threshold.
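
The generation and clearance rules above can be sketched in a few lines of Python. This is only an illustration of the logic as described in this section, not the FusionInsight Manager implementation. The names `reserved_percentage`, `AlarmState`, and `observe` are hypothetical, and returning 0% when no space is reported is an assumption.

```python
from dataclasses import dataclass


def reserved_percentage(reserved_bytes: float, remaining_bytes: float) -> float:
    """Reserved-for-replicas share of (reserved + remaining), in percent."""
    total = reserved_bytes + remaining_bytes
    if total == 0:
        return 0.0  # assumption: report 0% instead of dividing by zero
    return 100.0 * reserved_bytes / total


@dataclass
class AlarmState:
    threshold: float = 90.0  # percent; default threshold from this section
    trigger_count: int = 1   # consecutive breaches required to raise the alarm
    breaches: int = 0
    active: bool = False

    def clear_level(self) -> float:
        # Trigger Count = 1: clear at the threshold itself.
        # Trigger Count > 1: clear at 90% of the threshold.
        return self.threshold if self.trigger_count == 1 else 0.9 * self.threshold

    def observe(self, percentage: float) -> bool:
        """Feed one periodic sample and return whether the alarm is active."""
        if percentage > self.threshold:
            self.breaches += 1
            if self.breaches >= self.trigger_count:
                self.active = True
        else:
            self.breaches = 0  # the run of consecutive breaches is broken
            if self.active and percentage <= self.clear_level():
                self.active = False
        return self.active
```

With `trigger_count=3` and the default threshold, three consecutive samples above 90% raise the alarm, and it stays raised until a sample falls to 81% or lower.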

Alarm Attributes

Alarm ID	Alarm Severity	Alarm Type	Service Type	Auto Cleared
14023	Major (default threshold: 95%); Minor (default threshold: 90%)	Quality of service	HDFS	Yes

Alarm Parameters

Type	Parameter	Description
Location Information	Source	Specifies the cluster for which the alarm is generated.
	ServiceName	Specifies the service for which the alarm is generated.
	RoleName	Specifies the role for which the alarm is generated.
	HostName	Specifies the host for which the alarm is generated.
	NameServiceName	Specifies the NameService for which the alarm is generated.
Additional Information	Trigger Condition	Specifies the threshold triggering the alarm. If the current indicator value exceeds this threshold, the alarm is generated.

Impact on the System

The performance of writing data to HDFS is affected. If all remaining DataNode space is reserved for replicas, writing HDFS data fails.

Possible Causes

The alarm threshold is improperly configured.
The disk space configured for the HDFS cluster is insufficient.
The volume of services accessing HDFS is too large, so the DataNodes are overloaded.

Handling Procedure

Check whether the alarm threshold is appropriate.

1. On the FusionInsight Manager portal, choose O&M > Alarm > Thresholds > Name of the desired cluster > HDFS > Disk > Percentage of Reserved Space for Replicas of Unused Space and check whether the alarm threshold is appropriate. (The default threshold is 90%. You can change it as required.)
- If yes, go to 4.
- If no, go to 2.
2. Choose O&M > Alarm > Thresholds > Name of the desired cluster > HDFS > Disk > Percentage of Reserved Space for Replicas of Unused Space, click Modify, and change the threshold based on the actual usage.
3. Wait 5 minutes and check whether the alarm is cleared.
- If yes, no further action is required.
- If no, go to 4.

Check whether an alarm indicating insufficient disk space is generated.

4. On the FusionInsight Manager portal, check whether ALM-14001 HDFS Disk Usage Exceeds the Threshold or ALM-14002 DataNode Disk Usage Exceeds the Threshold exists on the O&M > Alarm > Alarms page.
- If yes, go to 5.
- If no, go to 7.
5. Handle the alarm by referring to instructions in ALM-14001 HDFS Disk Usage Exceeds the Threshold or ALM-14002 DataNode Disk Usage Exceeds the Threshold and check whether the alarm is cleared.
- If yes, go to 6.
- If no, go to 7.
6. Wait 5 minutes and check whether the alarm is cleared.
- If yes, no further action is required.
- If no, go to 7.

Expand the DataNode capacity.

7. Expand the DataNode capacity.
8. Wait 5 minutes and check whether the alarm is cleared.
- If yes, no further action is required.
- If no, go to 9.
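
As a rough aid for sizing the expansion, the percentage formula from the Alarm Description can be inverted to estimate how much additional remaining space would bring the percentage down to a target value. The sketch below is a back-of-the-envelope calculation only. The function name `extra_remaining_needed` is hypothetical, and it assumes that the reserved space stays constant and that all added capacity shows up as remaining space, which may not hold on a busy cluster.

```python
def extra_remaining_needed(reserved: float, remaining: float, target_pct: float) -> float:
    """Additional remaining space (same unit as the inputs) needed so that
    reserved / (reserved + remaining) is at most target_pct percent.

    Assumes reserved space does not change while capacity is added.
    """
    if not 0 < target_pct <= 100:
        raise ValueError("target_pct must be in (0, 100]")
    # From reserved / (reserved + remaining) <= t it follows that
    # remaining >= reserved * (1 - t) / t, here with t expressed in percent.
    required_remaining = reserved * (100.0 - target_pct) / target_pct
    return max(0.0, required_remaining - remaining)
```

For example, with 90 GB reserved and 5 GB remaining, reaching a 90% target requires about 5 GB more remaining space. When Trigger Count is greater than 1, the alarm clears only at 90% of the threshold, so a target of 81% is the relevant value for the default threshold.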

Collect fault information.

9. On the FusionInsight Manager portal, choose O&M > Log > Download.
10. In Service, select HDFS for the required cluster.
11. Click the icon in the upper right corner, and set Start Date and End Date for log collection to 20 minutes before and 20 minutes after the alarm generation time, respectively. Then, click Download.
12. Contact the O&M engineers and send the collected logs.
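
The log collection window described above is simple date arithmetic. The helper below is a small sketch for computing the Start Date and End Date from the alarm generation time. The name `log_collection_window` is hypothetical, and the values still have to be entered manually in the portal.

```python
from datetime import datetime, timedelta


def log_collection_window(alarm_time: datetime,
                          margin: timedelta = timedelta(minutes=20)):
    """Return (start, end): the margin before and after the alarm time."""
    return alarm_time - margin, alarm_time + margin


# Example: an alarm generated at 10:00 gives a window of 09:40 to 10:20.
start, end = log_collection_window(datetime(2024, 11, 29, 10, 0))
print(start.isoformat(), end.isoformat())
```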