Updated on 2024-09-23 GMT+08:00

ALM-16007 Hive GC Time Exceeds the Threshold

Description

The system checks the garbage collection (GC) time of the Hive service every 60 seconds. This alarm is generated when the detected GC time exceeds the threshold, that is, exceeds 12 seconds for three consecutive checks. To change the threshold, choose O&M > Alarm > Thresholds > Name of the desired cluster > Hive. This alarm is cleared when the Hive GC time is shorter than or equal to the threshold.
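The trigger and clearing rule described above can be sketched in a few lines. This only illustrates the "three consecutive checks above 12 seconds" logic and is not how FusionInsight Manager implements it. The function name `alarm_states` and the sample values are hypothetical.

```python
def alarm_states(gc_seconds, threshold=12.0, consecutive=3):
    """Return the alarm state (True = raised) after each 60-second check.

    Assumption: the alarm is raised once `consecutive` checks in a row are
    above the threshold, and is cleared by the first check that is at or
    below the threshold.
    """
    states = []
    over = 0        # number of consecutive checks above the threshold
    raised = False
    for value in gc_seconds:
        if value > threshold:
            over += 1
            if over >= consecutive:
                raised = True
        else:
            over = 0
            raised = False
        states.append(raised)
    return states


# Hypothetical GC times (seconds), one per 60-second check.
samples = [13.0, 14.5, 12.0, 13.1, 15.2, 16.0, 9.8]
print(alarm_states(samples))
# [False, False, False, False, False, True, False]
```

A reading of exactly 12.0 seconds does not count as exceeding the threshold, so it resets the consecutive counter in this sketch.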

Attribute

Alarm ID    Alarm Severity    Automatically Cleared
16007       Major             Yes

Parameters

Name                 Meaning
Source               Specifies the cluster for which the alarm is generated.
ServiceName          Specifies the service name for which the alarm is generated.
RoleName             Specifies the role name for which the alarm is generated.
HostName             Specifies the object (host ID) for which the alarm is generated.
Trigger Condition    Specifies the threshold triggering the alarm. If the current indicator value exceeds this threshold, the alarm is generated.

Impact on the System

If the GC time exceeds the threshold, Hive data reads and writes are affected, task execution may slow down, and services may restart unexpectedly.

Possible Causes

The memory of Hive instances is overused or the heap memory is inappropriately allocated. As a result, GCs occur frequently.

Procedure

Check the GC time.

  1. On the FusionInsight Manager portal, choose O&M > Alarm > Alarms and select the alarm whose Alarm ID is 16007. Then check the role name in Location and confirm the IP address of the instance.

    • If the role for which the alarm is generated is HiveServer, go to 2.
    • If the role for which the alarm is generated is MetaStore, go to 3.

  2. On the FusionInsight Manager portal, choose Cluster > Name of the desired cluster > Services > Hive > Instance and click the HiveServer instance for which the alarm is generated to go to the Dashboard page. Click the drop-down menu in the Chart area, choose Customize > GC, select Garbage Collection (GC) Time of HiveServer, and click OK. Check whether the GC time is longer than 12 seconds.

    • If yes, go to 4.
    • If no, go to 7.
    Figure 1 Garbage Collection (GC) Time of HiveServer

  3. On the FusionInsight Manager portal, choose Cluster > Name of the desired cluster > Services > Hive > Instance and click the MetaStore instance for which the alarm is generated to go to the Dashboard page. Click the drop-down menu in the Chart area, choose Customize > GC, select Garbage Collection (GC) Time of MetaStore, and click OK. Check whether the GC time is longer than 12 seconds.

    • If yes, go to 4.
    • If no, go to 7.
    Figure 2 Garbage Collection (GC) Time of MetaStore

Check the current JVM configuration.

  4. On the FusionInsight Manager portal, choose Cluster > Name of the desired cluster > Services > Hive > Configurations > All Configurations. Choose HiveServer/MetaStore > JVM. Adjust the value of -Xmx in HIVE_GC_OPTS/METASTORE_GC_OPTS according to the following rules. Click Save.

    Suggestions for GC parameter settings for the HiveServer:
    • When the Hive GC time exceeds the threshold, change the value of -Xmx to twice the default value. For example, if -Xmx is set to 2 GB by default, change the value of -Xmx to 4 GB.
    • You are advised to set -Xms so that the ratio of -Xms to -Xmx is 1:2, to avoid performance problems when the JVM dynamically adjusts the heap size.
    Suggestions for GC parameter settings for the MetaStore:
    • When the MetaStore GC time exceeds the threshold, change the value of -Xmx to twice the default value. For example, if -Xmx is set to 2 GB by default, change the value of -Xmx to 4 GB.
    • You are advised to set -Xms so that the ratio of -Xms to -Xmx is 1:2, to avoid performance problems when the JVM dynamically adjusts the heap size.

  5. Click More > Restart Service to restart the service.

    During Hive service restart, instances cannot provide services for external systems, and the SQL tasks that are being executed on the instances may fail.

  6. Check whether the alarm is cleared.

    • If yes, no further action is required.
    • If no, go to 7.
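The heap sizing suggestions in this part of the procedure (double -Xmx, then set -Xms to half of the new -Xmx) can be worked through with a small helper. This is a minimal sketch. The function `suggest_gc_opts` and the sample option string are hypothetical, and it treats the current -Xmx value as the default value. The real contents of HIVE_GC_OPTS or METASTORE_GC_OPTS in a cluster may differ.

```python
import re

# Multipliers that convert a JVM size suffix to kilobytes.
_UNITS_KB = {"k": 1, "m": 1024, "g": 1024 * 1024}


def _to_kb(size):
    """Parse a JVM heap size such as '2G' or '1536M' into kilobytes."""
    match = re.fullmatch(r"(\d+)([kKmMgG])", size)
    if not match:
        raise ValueError(f"unsupported heap size: {size!r}")
    return int(match.group(1)) * _UNITS_KB[match.group(2).lower()]


def _format_kb(kb):
    """Format kilobytes with the largest suffix that divides evenly."""
    for suffix, factor in (("G", 1024 * 1024), ("M", 1024)):
        if kb % factor == 0:
            return f"{kb // factor}{suffix}"
    return f"{kb}K"


def suggest_gc_opts(gc_opts):
    """Double -Xmx and set -Xms to half of the new -Xmx (ratio 1:2)."""
    xmx = re.search(r"-Xmx(\S+)", gc_opts)
    if xmx is None:
        raise ValueError("no -Xmx option found")
    new_xmx_kb = 2 * _to_kb(xmx.group(1))
    new_xms = "-Xms" + _format_kb(new_xmx_kb // 2)
    updated = re.sub(r"-Xmx\S+", "-Xmx" + _format_kb(new_xmx_kb), gc_opts)
    if re.search(r"-Xms\S+", updated):
        return re.sub(r"-Xms\S+", new_xms, updated)
    return new_xms + " " + updated  # add -Xms if it was not set at all


# Hypothetical option string, for illustration only.
print(suggest_gc_opts("-Xms2G -Xmx2G -XX:+UseG1GC"))
# -Xms2G -Xmx4G -XX:+UseG1GC
```

Other JVM flags in the string are left untouched. The result still has to be entered and saved in the configuration page, followed by a service restart.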

Collect fault information.

  7. On the FusionInsight Manager portal of active and standby clusters, choose O&M > Log > Download.
  8. In Service, select Hive for the required cluster.
  9. Click the icon in the upper right corner, and set Start Date and End Date for log collection to 10 minutes before and 10 minutes after the alarm generation time, respectively. Then, click Download.
  10. Contact the O&M personnel and send the collected logs.
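The Start Date and End Date for log collection can be derived from the alarm generation time as follows. This is only a convenience sketch. The helper `log_window` and the sample alarm time are hypothetical, and the values still have to be entered manually in the log download page.

```python
from datetime import datetime, timedelta


def log_window(alarm_time, margin_minutes=10):
    """Return (start, end): margin_minutes before and after the alarm time."""
    margin = timedelta(minutes=margin_minutes)
    return alarm_time - margin, alarm_time + margin


# Hypothetical alarm generation time, for illustration only.
generated = datetime(2024, 9, 23, 14, 5, 0)
start, end = log_window(generated)
print(start.isoformat(sep=" "), "->", end.isoformat(sep=" "))
# 2024-09-23 13:55:00 -> 2024-09-23 14:15:00
```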

Alarm Clearing

After the fault is rectified, the system automatically clears this alarm.

Related Information

None