ALM-16008 Non-Heap Memory Usage of the Hive Process Exceeds the Threshold

Alarm Description

The system checks the Hive service status every 30 seconds. The alarm is generated when the non-heap memory usage of a Hive service exceeds the threshold.

Users can choose O&M > Alarm > Thresholds > Name of the desired cluster > Hive to change the threshold.

The alarm is cleared when the non-heap memory usage is less than or equal to the threshold.

Alarm Attributes

Alarm ID	Alarm Severity	Alarm Type	Service Type	Auto Cleared
16008	Versions earlier than MRS 3.3.0: Major (default threshold: 95%); MRS 3.3.0 and later versions: Critical (default threshold: 95%), Major (default threshold: 85%)	Quality of service	Hive	Yes
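
The relationship between usage, threshold, and severity can be illustrated with a short sketch. It assumes only the default thresholds listed in this table and the rule that the alarm is raised when usage exceeds the threshold. The function name and the `legacy` flag are hypothetical, and how the system actually samples or smooths the metric is not described here.

```python
from typing import Optional

# Default thresholds from the Alarm Attributes table
# (percent of the maximum non-heap memory).
MAJOR_THRESHOLD = 85      # Major, MRS 3.3.0 and later
CRITICAL_THRESHOLD = 95   # Critical in MRS 3.3.0 and later; Major in earlier versions


def classify_usage(used_mb: float, max_mb: float, legacy: bool = False) -> Optional[str]:
    """Return the severity that would be raised, or None if usage is within limits.

    legacy=True models versions earlier than MRS 3.3.0 (hypothetical flag).
    """
    if max_mb <= 0:
        raise ValueError("max_mb must be positive")

    def exceeds(threshold: int) -> bool:
        # Strictly greater: usage equal to the threshold clears the alarm.
        return used_mb * 100 > threshold * max_mb

    if legacy:
        return "Major" if exceeds(CRITICAL_THRESHOLD) else None
    if exceeds(CRITICAL_THRESHOLD):
        return "Critical"
    if exceeds(MAJOR_THRESHOLD):
        return "Major"
    return None
```

For example, 450 MB used out of a 512 MB maximum is about 88%, which maps to Major in MRS 3.3.0 and later but raises nothing in earlier versions.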

Alarm Parameters

Type	Parameter	Description
Location Information	Source	Specifies the cluster for which the alarm is generated.
	ServiceName	Specifies the service name for which the alarm is generated.
	RoleName	Specifies the role name for which the alarm is generated.
	HostName	Specifies the object (host ID) for which the alarm is generated.
Additional Information	Trigger Condition	Specifies the threshold for triggering the alarm.

Impact on the System

When the non-heap memory usage of Hive is too high, the performance of Hive tasks is affected. In addition, a memory overflow may occur and make the Hive service unavailable.

Possible Causes

The non-heap memory of the Hive instance on the node is overused or the non-heap memory is inappropriately allocated. As a result, the usage exceeds the threshold.

Handling Procedure

Check non-heap memory usage.

1. On the FusionInsight Manager portal, choose O&M > Alarm > Alarms and select the alarm whose Alarm ID is 16008. Then check the role name in Location and confirm the IP address of the instance.
- If the role for which the alarm is generated is HiveServer, go to 2.
- If the role for which the alarm is generated is MetaStore, go to 3.
2. On the FusionInsight Manager portal, choose Cluster > Name of the desired cluster > Services > Hive > Instance and click the HiveServer instance for which the alarm is generated to go to the Dashboard page. Click the drop-down menu in the Chart area, choose Customize > CPU and Memory, select HiveServer Memory Usage Statistics, and click OK. Check whether the used non-heap memory of HiveServer reaches the threshold (default value: 95%) of the maximum non-heap memory specified for HiveServer.
- If yes, go to 4.
- If no, go to 7.
3. On the FusionInsight Manager portal, choose Cluster > Name of the desired cluster > Services > Hive > Instance and click the MetaStore instance for which the alarm is generated to go to the Dashboard page. Click the drop-down menu in the Chart area, choose Customize > CPU and Memory, select MetaStore Memory Usage Statistics, and click OK. Check whether the used non-heap memory of MetaStore reaches the threshold (default value: 95%) of the maximum non-heap memory specified for MetaStore.
- If yes, go to 4.
- If no, go to 7.
4. On the FusionInsight Manager portal, choose Cluster > Name of the desired cluster > Services > Hive > Configurations > All Configurations. Choose HiveServer/MetaStore > JVM. Adjust the value of -XX:MaxMetaspaceSize in HIVE_GC_OPTS/METASTORE_GC_OPTS according to the following rules. Click Save.
Suggestions for GC parameter settings for HiveServer:
- It is recommended that you set the value of -XX:MaxMetaspaceSize to 1/8 of the value of -Xmx. For example, if -Xmx is set to 2 GB, set -XX:MaxMetaspaceSize to 256 MB. If -Xmx is set to 4 GB, set -XX:MaxMetaspaceSize to 512 MB.
Suggestions for GC parameter settings for MetaStore:
- It is recommended that you set the value of -XX:MaxMetaspaceSize to 1/8 of the value of -Xmx. For example, if -Xmx is set to 2 GB, set -XX:MaxMetaspaceSize to 256 MB. If -Xmx is set to 4 GB, set -XX:MaxMetaspaceSize to 512 MB.
5. Click More > Restart Service to restart the service.
6. Check whether the alarm is cleared.
- If yes, no further action is required.
- If no, go to 7.
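
The 1/8 sizing rule above can be worked out mechanically from an existing options string. The following is a minimal sketch, not part of the product: the function names are hypothetical, the sample options string is made up for illustration, and it assumes -Xmx is written with a k, m, or g suffix.

```python
import re

# Size of each suffix in KB; an -Xmx value without a suffix (bytes) is not handled.
_UNIT_KB = {"k": 1, "m": 1024, "g": 1024 * 1024}


def xmx_to_mb(gc_opts: str) -> int:
    """Extract the -Xmx value from a JVM options string and return it in MB."""
    match = re.search(r"-Xmx(\d+)([kKmMgG])\b", gc_opts)
    if match is None:
        raise ValueError("no -Xmx setting with a k/m/g suffix found")
    value, unit = int(match.group(1)), match.group(2).lower()
    return value * _UNIT_KB[unit] // 1024


def recommended_metaspace(gc_opts: str) -> str:
    """Return the -XX:MaxMetaspaceSize flag sized at 1/8 of -Xmx."""
    return f"-XX:MaxMetaspaceSize={xmx_to_mb(gc_opts) // 8}M"


if __name__ == "__main__":
    # Hypothetical value of HIVE_GC_OPTS, for illustration only.
    sample = "-Xms2G -Xmx2G -XX:MaxMetaspaceSize=128M"
    print(recommended_metaspace(sample))
```

The result still has to be entered manually in HIVE_GC_OPTS or METASTORE_GC_OPTS on FusionInsight Manager, followed by a service restart as described in the steps above.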

Collect fault information.

7. On the FusionInsight Manager portal, choose O&M > Log > Download.
8. Select Hive for the required cluster from Service.
9. Click the edit icon in the upper right corner, and set Start Date and End Date for log collection to 10 minutes before and 10 minutes after the alarm generation time, respectively. Then, click Download.
10. Contact the O&M engineers and send the collected logs.
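
The log collection window described above is simple date arithmetic. As a small sketch (the function name is hypothetical, and the alarm time would be read from the alarm details), the Start Date and End Date can be derived like this:

```python
from datetime import datetime, timedelta
from typing import Tuple


def log_window(alarm_time: datetime, margin_minutes: int = 10) -> Tuple[datetime, datetime]:
    """Return (start, end) covering margin_minutes before and after the alarm time."""
    margin = timedelta(minutes=margin_minutes)
    return alarm_time - margin, alarm_time + margin


if __name__ == "__main__":
    # Example alarm generation time, chosen for illustration only.
    start, end = log_window(datetime(2024, 1, 1, 0, 5))
    print("Start Date:", start.isoformat(sep=" "))
    print("End Date:  ", end.isoformat(sep=" "))
```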