ALM-16001 Hive Warehouse Space Usage Exceeds the Threshold

Alarm Description

The system checks the Hive warehouse space usage every 30 seconds. You can view Percentage of HDFS Space Used by Hive to the Available Space on the Hive service monitoring page. This alarm is generated when the Hive warehouse space usage exceeds the specified threshold.

To change the threshold, choose O&M. In the navigation pane on the left, click Alarm > Thresholds > Name of the desired cluster > Hive > Percentage of HDFS Space Used by Hive to the Available Space.

When the number of trigger times is 1, this alarm is cleared if the Hive warehouse space usage is less than or equal to the threshold. When the number of trigger times is greater than 1, this alarm is cleared if the Hive warehouse space usage is less than or equal to 90% of the threshold.
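
The clearing rule above amounts to a simple hysteresis check. The following is a minimal sketch of that rule, not the product's implementation. The function name `alarm_should_clear` and its parameters are hypothetical, and both usage and threshold are assumed to be percentages.

```python
def alarm_should_clear(usage_pct: float, threshold_pct: float, trigger_count: int) -> bool:
    """Return True if an active alarm would clear under the rule described above.

    Hypothetical helper: the name and signature are illustrative only.
    """
    if trigger_count < 1:
        raise ValueError("trigger_count must be at least 1")
    if trigger_count == 1:
        # Trigger count of 1: clear once usage is at or below the threshold.
        return usage_pct <= threshold_pct
    # Trigger count above 1: usage must drop to 90% of the threshold or lower.
    return usage_pct <= 0.9 * threshold_pct
```

For example, with a threshold of 85% and a trigger count greater than 1, a usage of 80% would not clear the alarm, because 90% of the threshold is 76.5%.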

The MRS cluster administrator can reduce the Hive warehouse space usage by increasing the warehouse capacity or by releasing used space.

Alarm Attributes

Alarm ID	Alarm Severity	Alarm Type	Service Type	Auto Cleared
16001	MRS 3.3.0 and earlier: Minor (default threshold: 85%); MRS 3.3.0 and later: Critical (default threshold: 95%) or Major (default threshold: 85%)	Quality of service	Hive	Yes

Alarm Parameters

Type	Parameter	Description
Location Information	Source	Specifies the cluster for which the alarm was generated.
	ServiceName	Specifies the service for which the alarm was generated.
	RoleName	Specifies the role for which the alarm was generated.
	HostName	Specifies the host for which the alarm was generated.
Additional Information	Trigger Condition	Specifies the alarm triggering condition.

Impact on the System

The system cannot write data properly. Some data may be lost.

Possible Causes

The upper limit of the HDFS capacity available for Hive is too small.
The HDFS space is insufficient.
Some DataNodes are faulty.

Handling Procedure

Extend system capacity.

Analyze the cluster HDFS space usage and increase the HDFS capacity for Hive.

1. Log in to FusionInsight Manager, choose Cluster > Name of the desired cluster > Services > Hive > Configuration, select All Configurations, search for hive.metastore.warehouse.size.percent, and increase its value. Let A be the value of this configuration item, B the total HDFS storage space, C the threshold, and D the HDFS space used by Hive. Choose A so that A × B × C > D. You can view the total HDFS storage space on the HDFS NameNode page and the HDFS space used by Hive on the Hive monitoring page.
2. Check whether the alarm is cleared.
- If yes, no further action is required.
- If no, go to 3.
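
The sizing rule above can be checked with a few lines of arithmetic. This is a minimal sketch under stated assumptions: the function names are hypothetical, A and C are expressed as fractions (for example, 0.85 for 85%), and B and D use the same unit. Whether the product expects the configuration value as a fraction or a percentage is not stated here, so convert as needed.

```python
def satisfies_sizing_rule(a: float, b: float, c: float, d: float) -> bool:
    """Check the rule A x B x C > D.

    a: value of hive.metastore.warehouse.size.percent, as a fraction (assumption)
    b: total HDFS storage space
    c: alarm threshold, as a fraction (assumption)
    d: HDFS space used by Hive, in the same unit as b
    """
    return a * b * c > d


def lower_bound_for_a(b: float, c: float, d: float) -> float:
    """Return D / (B x C); A must be strictly greater than this value."""
    if b <= 0 or c <= 0:
        raise ValueError("b and c must be positive")
    return d / (b * c)


# Illustrative numbers only: 100 TB total, 85% threshold, 34 TB used by Hive.
bound = lower_bound_for_a(b=100.0, c=0.85, d=34.0)
print(f"A must be greater than {bound:.2f}")
```

With these illustrative numbers, any value of A above 0.4 (40%) satisfies the rule.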

Perform capacity expansion for the system.

3. Expand the system.
4. Check whether the alarm is cleared.
- If yes, no further action is required.
- If no, go to 5.

Check whether the data node is normal.

5. Log in to FusionInsight Manager and choose O&M > Alarm > Alarms.
6. Check whether any of the following alarms is reported: ALM-12006 NodeAgent Process Is Abnormal, ALM-12007 Process Fault, or ALM-14002 DataNode Disk Usage Exceeds the Threshold.
- If yes, go to 7.
- If no, go to 9.
7. Clear the reported alarm by following the steps provided in ALM-12006 NodeAgent Process Is Abnormal, ALM-12007 Process Fault, or ALM-14002 DataNode Disk Usage Exceeds the Threshold.
8. Check whether the alarm is cleared.
- If yes, no further action is required.
- If no, go to 9.

Collect fault information.

9. On FusionInsight Manager, choose O&M. In the navigation pane on the left, choose Log > Download.
10. Expand the Service drop-down list, and select Hive for the target cluster.
11. Click the edit icon in the upper right corner, and set Start Date and End Date for log collection to 10 minutes before and 10 minutes after the alarm generation time, respectively. Then, click Download.
12. Contact O&M engineers and provide the collected logs.
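
The log collection window in the steps above can be computed from the alarm generation time. The following is a minimal sketch: the helper `log_collection_window` is hypothetical and only illustrates the 10-minute margin on each side. The values still need to be entered manually as Start Date and End Date on the download page.

```python
from datetime import datetime, timedelta


def log_collection_window(alarm_time: datetime, margin_minutes: int = 10):
    """Return (start, end) covering margin_minutes before and after alarm_time.

    Hypothetical helper for working out Start Date and End Date.
    """
    margin = timedelta(minutes=margin_minutes)
    return alarm_time - margin, alarm_time + margin


# Illustrative alarm time only.
start, end = log_collection_window(datetime(2024, 1, 1, 12, 0, 0))
print("Start Date:", start.isoformat(sep=" "))
print("End Date:  ", end.isoformat(sep=" "))
```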