ALM-16001 Hive Warehouse Space Usage Exceeds the Threshold (For MRS 2.x or Earlier)

Description

The system checks the Hive warehouse space usage every 30 seconds. The indicator Percentage of HDFS Space Used by Hive to the Available Space can be viewed on the Hive service monitoring page. This alarm is generated when the Hive warehouse space usage exceeds the specified threshold (85% by default).

This alarm is cleared when the Hive warehouse space usage is less than or equal to the threshold. To reduce the usage, expand the warehouse capacity or release used space.
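The raise and clear conditions above reduce to a comparison against the threshold: strictly greater raises the alarm, less than or equal clears it. The following Python sketch models one periodic check. It is illustrative only and not the MRS implementation; the names AlarmState and evaluate_check are hypothetical, and it assumes usage is expressed as a fraction of the space available to Hive (0.85 means 85%).

```python
from dataclasses import dataclass

# Assumption: usage and threshold are fractions (0.85 == 85%).
# Only the 85% default comes from the alarm description; the rest is illustrative.
DEFAULT_THRESHOLD = 0.85


@dataclass
class AlarmState:
    """Hypothetical holder for whether ALM-16001 is currently active."""
    active: bool = False


def evaluate_check(usage: float, state: AlarmState,
                   threshold: float = DEFAULT_THRESHOLD) -> str:
    """Model one periodic check and return "raise", "clear", or "none"."""
    if usage > threshold and not state.active:
        # Usage strictly above the threshold raises the alarm.
        state.active = True
        return "raise"
    if usage <= threshold and state.active:
        # Usage at or below the threshold clears it automatically (Auto Clear: Yes).
        state.active = False
        return "clear"
    # No state change: still high while raised, or still low while clear.
    return "none"


if __name__ == "__main__":
    state = AlarmState()
    samples = [0.80, 0.86, 0.90, 0.85]
    print([evaluate_check(u, state) for u in samples])
```

In this model a reading exactly at the threshold (0.85) clears an active alarm, matching "less than or equal to the threshold".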

Attribute

Alarm ID	Alarm Severity	Auto Clear
16001	Major	Yes

Parameters

Parameter	Description
ServiceName	Specifies the service for which the alarm is generated.
RoleName	Specifies the role for which the alarm is generated.
HostName	Specifies the host for which the alarm is generated.
Trigger condition	Generates an alarm when the actual indicator value exceeds the specified threshold.

Impact on the System

The system fails to write data, which causes data loss.

Possible Causes

The upper limit of the HDFS capacity available for Hive is too small.
The system disk space is insufficient.
Some DataNodes are faulty.

Procedure

1. Expand the system configuration.
  a. Analyze the cluster HDFS capacity usage and increase the upper limit of the HDFS capacity available for Hive.
     Go to the MRS cluster details page, choose Components > Hive > Service Configuration, set Type to All, search for hive.metastore.warehouse.size.percent, and increase the value of this parameter. Let A be the value of this parameter, B the total HDFS storage space, C the alarm threshold, and D the HDFS space used by Hive, with A and C expressed as fractions (for example, 85% = 0.85). Set A so that A × B × C > D. You can view the total HDFS storage space on the HDFS monitoring page and the HDFS space used by Hive on the Hive monitoring page.
  b. Check whether the alarm is cleared.
    - If yes, no further action is required.
    - If no, go to 2.a.
2. Expand the system.
  a. Add nodes to the cluster.
  b. Check whether the alarm is cleared.
    - If yes, no further action is required.
    - If no, go to 3.a.
3. Check whether the DataNodes are normal.
  a. Go to the cluster details page and choose Alarms.
  b. Check whether ALM-12006 Node Fault, ALM-12007 Process Fault, or ALM-14002 DataNode Disk Usage Exceeds the Threshold exists.
    - If yes, go to 3.c.
    - If no, go to 4.
  c. Clear the alarm by following the steps provided in ALM-12006 Node Fault, ALM-12007 Process Fault, or ALM-14002 DataNode Disk Usage Exceeds the Threshold.
  d. Check whether the alarm is cleared.
    - If yes, no further action is required.
    - If no, go to 4.
4. Collect fault information.
  a. On MRS Manager, choose System > Export Log.
  b. Contact the O&M engineers and send the collected logs.
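The sizing rule A × B × C > D used when increasing hive.metastore.warehouse.size.percent can be rearranged as A > D / (B × C). The Python sketch below performs that arithmetic. The function names and the sample figures (100 TB of total HDFS storage, an 85% threshold, 51 TB used by Hive) are hypothetical, and it assumes A and C are given as fractions while B and D use the same unit.

```python
def min_warehouse_fraction(total_hdfs: float, threshold: float, hive_used: float) -> float:
    """Return D / (B * C); A must be strictly greater than this for A * B * C > D."""
    if total_hdfs <= 0:
        raise ValueError("total HDFS storage space must be positive")
    if not 0 < threshold <= 1:
        raise ValueError("threshold must be a fraction in (0, 1]")
    return hive_used / (total_hdfs * threshold)


def meets_sizing_rule(a: float, total_hdfs: float, threshold: float, hive_used: float) -> bool:
    """Check the rule A * B * C > D for a candidate value A (hypothetical helper)."""
    return a * total_hdfs * threshold > hive_used


if __name__ == "__main__":
    # Illustrative numbers only: B = 100 TB, C = 0.85, D = 51 TB.
    lower = min_warehouse_fraction(100, 0.85, 51)
    print(f"A must exceed {lower:.2f}")  # A must exceed 0.60
    print(meets_sizing_rule(0.70, 100, 0.85, 51))  # True
```

Because the rule is a strict inequality, choose a value above the computed bound, with some headroom for data growth.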