ALM-16001 Hive Warehouse Space Usage Exceeds the Threshold

Description

The system checks the Hive warehouse space usage every 30 seconds. The metric "Percentage of HDFS Space Used by Hive to the Available Space" can be viewed on the Hive service monitoring page. This alarm is generated when the Hive warehouse space usage exceeds the specified threshold (85% by default).

To change the threshold, choose O&M > Alarm > Thresholds > Name of the desired cluster > Hive > Percentage of HDFS Space Used by Hive to the Available Space.

If Trigger Count is 1, this alarm is cleared when the Hive warehouse space usage drops to or below the threshold. If Trigger Count is greater than 1, this alarm is cleared only when the Hive warehouse space usage drops to or below 90% of the threshold.
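The raise and clear rules above can be illustrated with a small state machine. The following Python sketch is hypothetical and is not how FusionInsight Manager implements the alarm: the class name is invented for illustration, usage and threshold are percentages, and it assumes that a Trigger Count of N means the threshold must be exceeded in N consecutive 30-second checks before the alarm is raised (this section states only the clearing rule for Trigger Count).

```python
class HiveWarehouseUsageAlarm:
    """Hypothetical model of the ALM-16001 raise/clear behavior.

    ASSUMPTION: with trigger_count = N, the alarm is raised after N
    consecutive checks above the threshold. The clear rule follows the
    alarm description: usage <= threshold when N == 1, otherwise
    usage <= 90% of the threshold.
    """

    def __init__(self, threshold: float = 85.0, trigger_count: int = 1) -> None:
        self.threshold = threshold          # percent, 85 by default
        self.trigger_count = trigger_count
        self.active = False
        self._consecutive = 0               # consecutive checks above the threshold

    def clear_level(self) -> float:
        # A Trigger Count above 1 clears only at 90% of the threshold.
        if self.trigger_count == 1:
            return self.threshold
        return self.threshold * 0.9

    def check(self, usage: float) -> bool:
        """Process one sample (one 30-second check); return whether the alarm is active."""
        if usage > self.threshold:
            self._consecutive += 1
        else:
            self._consecutive = 0

        if not self.active and self._consecutive >= self.trigger_count:
            self.active = True
        elif self.active and usage <= self.clear_level():
            self.active = False
        return self.active


# Example: threshold 85%, Trigger Count 2, so the clear level is 76.5%.
alarm = HiveWarehouseUsageAlarm(threshold=85.0, trigger_count=2)
print([alarm.check(s) for s in (86, 87, 80, 76)])  # prints [False, True, True, False]
```

The gap between the raise level (85%) and the clear level (76.5%) keeps the alarm from flapping when usage hovers around the threshold.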

The MRS cluster administrator can reduce the Hive warehouse space usage by expanding the warehouse capacity or releasing some used space.

Attribute

Alarm ID	Alarm Severity	Automatically Cleared
16001	Minor	Yes

Parameters

Name	Meaning
Source	Specifies the cluster for which the alarm is generated.
ServiceName	Specifies the service for which the alarm is generated.
RoleName	Specifies the role for which the alarm is generated.
HostName	Specifies the host for which the alarm is generated.
Trigger condition	Specifies the threshold for triggering the alarm. If the current metric value exceeds this threshold, the alarm is generated.

Impact on the System

The system cannot write data properly. Some data may be lost.

Possible Causes

The upper limit of the HDFS capacity available for Hive is too small.
The HDFS space is insufficient.
Some DataNodes are faulty.

Procedure

Expand the system configuration.

1. Analyze the cluster HDFS space usage and increase the HDFS capacity available to Hive.

Log in to FusionInsight Manager, choose Cluster > Name of the desired cluster > Services > Hive > Configuration, select All Configurations, search for hive.metastore.warehouse.size.percent, and increase its value. Let A be the value of this parameter, B the total HDFS storage space, C the alarm threshold, and D the HDFS space used by Hive. Set A so that A × B × C > D, that is, so that the alarm threshold applied to the HDFS space available to Hive is larger than the space Hive currently uses. You can view the total HDFS storage space on the HDFS NameNode page and the HDFS space used by Hive on the Hive monitoring page.
2. Check whether the alarm is cleared.
- If yes, no further action is required.
- If no, go to Step 3.
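Rearranging A × B × C > D gives A > D / (B × C), which is the smallest value hive.metastore.warehouse.size.percent may take. The following Python sketch computes that bound. It assumes that A and C are expressed as fractions (0.85 for 85%) and that B and D use the same unit; if the parameter is entered as a percentage, multiply the bound by 100. The function names and example figures are hypothetical.

```python
def min_warehouse_size_fraction(total_hdfs: float,
                                threshold_fraction: float,
                                hive_used: float) -> float:
    """Return D / (B * C); A must be strictly greater than this value.

    ASSUMPTION: A and C are fractions (0.85 means 85%), and
    total_hdfs (B) and hive_used (D) use the same unit, e.g. TB.
    """
    if total_hdfs <= 0 or threshold_fraction <= 0:
        raise ValueError("B and C must be positive")
    return hive_used / (total_hdfs * threshold_fraction)


def satisfies_capacity_rule(a: float, b: float, c: float, d: float) -> bool:
    """Check the rule A x B x C > D directly."""
    return a * b * c > d


# Hypothetical example: B = 100 TB, C = 85%, D = 60 TB (units cancel out).
bound = min_warehouse_size_fraction(100, 0.85, 60)
print(f"A must exceed {bound:.4f} ({bound * 100:.2f}%)")  # prints: A must exceed 0.7059 (70.59%)
```

If the bound comes out at or above 1 (100%), raising the parameter alone presumably cannot satisfy the rule, and expanding the system is the remaining option.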

Expand the system.

3. Expand the system to increase the HDFS storage capacity of the cluster.
4. Check whether the alarm is cleared.
- If yes, no further action is required.
- If no, go to Step 5.

Check whether the DataNodes are normal.

5. On the FusionInsight Manager portal, choose O&M > Alarm > Alarms.
6. Check whether any of the following alarms exist: "ALM-12006 Node Fault", "ALM-12007 Process Fault", or "ALM-14002 DataNode Disk Usage Exceeds the Threshold".
- If yes, go to Step 7.
- If no, go to Step 9.
7. Handle each reported alarm by following its handling procedure: ALM-12006 Node Fault, ALM-12007 Process Fault, or ALM-14002 DataNode Disk Usage Exceeds the Threshold.
8. Check whether this alarm (ALM-16001) is cleared.
- If yes, no further action is required.
- If no, go to Step 9.
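The branching in the DataNode check can be sketched as a simple lookup. In this hypothetical Python example, the list of active alarm IDs is assumed to be copied by hand from the Alarms page; the function names are invented for illustration, and the code does not call any FusionInsight Manager API.

```python
# Related alarms that may indicate faulty DataNodes (IDs from this procedure).
RELATED_ALARMS = {
    "12006": "Node Fault",
    "12007": "Process Fault",
    "14002": "DataNode Disk Usage Exceeds the Threshold",
}


def related_alarms(active_alarm_ids: list[str]) -> list[str]:
    """Return the related alarm IDs found among the active alarms, sorted."""
    return sorted(set(active_alarm_ids) & set(RELATED_ALARMS))


def next_step(active_alarm_ids: list[str]) -> int:
    # Related alarms present: go to Step 7 and handle them; otherwise go to Step 9.
    return 7 if related_alarms(active_alarm_ids) else 9


# Hypothetical active alarm list.
active = ["16001", "12007"]
for alarm_id in related_alarms(active):
    print(f"Handle ALM-{alarm_id} {RELATED_ALARMS[alarm_id]}")
print("Next: Step", next_step(active))
```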

Collect fault information.

9. On the FusionInsight Manager portal, choose O&M > Log > Download.
10. In the Service list, select Hive for the desired cluster.
11. In the upper right corner, click the icon for setting the collection time range, and set Start Date and End Date to 10 minutes before and 10 minutes after the alarm generation time, respectively. Then, click Download.
12. Contact O&M personnel and send them the collected logs.
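The log collection window is simply the alarm generation time plus or minus 10 minutes. A minimal Python sketch using the standard datetime module follows; the function name and the example timestamp are hypothetical.

```python
from datetime import datetime, timedelta


def log_collection_window(alarm_time: datetime, margin_minutes: int = 10) -> tuple[datetime, datetime]:
    """Return (start, end) covering margin_minutes before and after the alarm."""
    margin = timedelta(minutes=margin_minutes)
    return alarm_time - margin, alarm_time + margin


# Hypothetical alarm generation time.
start, end = log_collection_window(datetime(2024, 5, 1, 14, 5))
print(start.strftime("%Y-%m-%d %H:%M"), "to", end.strftime("%Y-%m-%d %H:%M"))
# prints: 2024-05-01 13:55 to 2024-05-01 14:15
```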