Updated on 2024-09-23 GMT+08:00

ALM-16047 HiveServer Has Been Deregistered from ZooKeeper

Alarm Description

The system checks the Hive service every 60 seconds. This alarm is generated when Hive registration information on ZooKeeper is lost or Hive cannot connect to ZooKeeper.
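
The check described above can be pictured as comparing the HiveServer instances that should be running with the registrations actually present in ZooKeeper. The following Python sketch is illustrative only. The function names are hypothetical, and it assumes that the child znodes follow the Apache Hive service-discovery naming pattern serverUri=host:port;version=...;sequence=..., which may differ in your cluster. It works on a list of znode names obtained by any means and does not contact ZooKeeper itself.

```python
def registered_hosts(znode_children):
    """Extract host names from znode child names.

    Assumed name format (may differ per cluster):
    'serverUri=host:port;version=...;sequence=...'
    """
    hosts = set()
    for name in znode_children:
        # Split 'key=value' pairs separated by ';' into a dict.
        fields = dict(
            part.split("=", 1) for part in name.split(";") if "=" in part
        )
        uri = fields.get("serverUri")
        if uri:
            # Drop the trailing ':port' if present.
            hosts.add(uri.rsplit(":", 1)[0])
    return hosts


def deregistered_hosts(expected_hosts, znode_children):
    """Return the expected hosts that have no registration, sorted."""
    return sorted(set(expected_hosts) - registered_hosts(znode_children))
```

For example, passing the expected hosts node1 and node2 together with a single registration for node1 returns a list that contains only node2, which is the instance the alarm would point to.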

Alarm Attributes

Alarm ID    Alarm Severity    Auto Cleared
16047       Major             Yes

Alarm Parameters

Parameter    Description
Source       Specifies the cluster for which the alarm was generated.
ServiceName  Specifies the service for which the alarm was generated.
RoleName     Specifies the role for which the alarm was generated.
HostName     Specifies the host for which the alarm was generated.

Impact on the System

When a Hive client sets up a new connection, it cannot select the HiveServer node that has been deregistered from ZooKeeper. If all HiveServer nodes have been deregistered from ZooKeeper, the HiveServer service will be unavailable.
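
How lost registrations affect new connections can be sketched as follows. This is a simplified model, not the Hive client's actual code: it assumes that a client picks one instance at random from the currently registered set, and pick_hiveserver is a hypothetical name.

```python
import random


def pick_hiveserver(registered, rng=random):
    """Pick one registered HiveServer address for a new connection.

    'registered' is the collection of addresses currently present in
    ZooKeeper. A deregistered instance is simply absent from it, so it
    can never be selected.
    """
    candidates = sorted(registered)
    if not candidates:
        # All instances are deregistered: no new connection is possible.
        raise RuntimeError("no HiveServer instance is registered in ZooKeeper")
    return rng.choice(candidates)
```

With one instance deregistered, new connections are spread over the remaining instances only. With an empty set, every new connection attempt fails, which corresponds to the service being unavailable.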

Possible Causes

  • The ZooKeeper instance is abnormal.
  • Some Hive configurations are incorrect.

Handling Procedure

Check the ZooKeeper service status.

  1. On FusionInsight Manager, choose O&M > Alarm > Alarms and check whether ALM-12007 Process Fault exists in the alarm list.

    • If yes, go to 2.
    • If no, go to 5.

  2. In Location of ALM-12007 Process Fault, check whether the service name is ZooKeeper.

    • If yes, go to 3.
    • If no, go to 5.

  3. Rectify the fault by following steps provided in ALM-12007 Process Fault.
  4. In the alarm list, check whether this alarm is cleared.

    • If yes, no further action is required.
    • If no, go to 5.

Check whether the Hive configurations are correctly modified.

  5. On FusionInsight Manager, choose Audit. On the Audit page, click Advanced Search, click the icon on the right of Operation Type, select Save configuration, click OK, and then click Search.
  6. In the search results, find the records whose Service column shows Hive or ZooKeeper and review their historical configuration changes. Table 1 lists some configurations that may affect the connection between Hive and ZooKeeper.

    Table 1 Configurations related to connection between Hive and ZooKeeper

    Service    Parameter                              Description
    Hive       HIVE_GC_OPTS                           HiveServer memory configuration. If the configuration is abnormal, HiveServer may restart repeatedly. In this case, check the health status of the instance processes.
    Hive       hive.zookeeper.quorum                  IP addresses of the ZooKeeper nodes that Hive connects to.
    Hive       hive.zookeeper.client.port             ZooKeeper client port that Hive connects to.
    Hive       hive.zookeeper.session.timeout         Timeout interval of the session between Hive and ZooKeeper.
    Hive       hive.zookeeper.connection.timeout      Timeout interval for Hive to connect to ZooKeeper.
    Hive       hive.zookeeper.connection.max.retries  Maximum number of retries for Hive to connect to ZooKeeper.
    ZooKeeper  clientPort                             Port on which ZooKeeper accepts client connections.
    ZooKeeper  ssl.enabled                            Whether SSL is enabled for ZooKeeper connections.
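
As an illustration of what to look for when reviewing these parameters, the following Python sketch flags two common inconsistencies: an empty hive.zookeeper.quorum and a Hive-side port that does not match the ZooKeeper clientPort. It is a sketch under stated assumptions, not an official check. The function name is hypothetical, the configurations are assumed to be available as plain dictionaries, and the port comparison does not apply if your quorum entries already carry their own ports.

```python
def check_hive_zk_config(hive_conf, zk_conf):
    """Return a list of problem descriptions; an empty list means none found.

    hive_conf and zk_conf are plain dicts of parameter name -> value
    (an assumption; obtain them however suits your environment).
    """
    problems = []

    quorum = str(hive_conf.get("hive.zookeeper.quorum", "")).strip()
    if not quorum:
        problems.append("hive.zookeeper.quorum is empty")

    hive_port = str(hive_conf.get("hive.zookeeper.client.port", "")).strip()
    zk_port = str(zk_conf.get("clientPort", "")).strip()
    if hive_port != zk_port:
        problems.append(
            f"port mismatch: Hive uses {hive_port!r}, "
            f"ZooKeeper clientPort is {zk_port!r}"
        )
    return problems
```

An empty result does not prove that the configuration is correct. Timeout and retry values, ssl.enabled, and HIVE_GC_OPTS still need to be compared manually with their values before the change.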

Restart related instances.

  7. Log in to FusionInsight Manager. Choose O&M > Alarm > Alarms, click the drop-down arrow in the row that contains the alarm, and in Location view the role and the IP address of the node for which the alarm was generated.
  8. Choose Cluster, click the name of the desired cluster, and choose Services > Hive > Instance. On the page that is displayed, select the instance at the IP address for which the alarm was generated, click More, and select Restart Instance.

    While a Hive instance is restarting, it cannot provide services for external systems, and SQL tasks that are being executed on the instance may fail.

  9. Wait 5 minutes and check whether the alarm is cleared.

    • If yes, no further action is required.
    • If no, go to 10.
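
If the wait in the step above is scripted rather than done by hand, it could look like the following Python sketch. It is an assumption-laden illustration: wait_until_cleared and is_cleared are hypothetical names, and is_cleared stands for whatever check you use to see whether the alarm is still in the alarm list. The sketch polls at intervals instead of checking once after 5 minutes.

```python
import time


def wait_until_cleared(is_cleared, timeout_s=300, interval_s=30, sleep=time.sleep):
    """Poll is_cleared() until it returns True or timeout_s has elapsed.

    is_cleared: caller-supplied function returning True once the alarm
    is no longer present (hypothetical; not part of any product API).
    Returns True if the alarm cleared in time, otherwise False.
    """
    waited = 0
    while waited < timeout_s:
        if is_cleared():
            return True
        sleep(interval_s)
        waited += interval_s
    # One last check at the end of the waiting period.
    return bool(is_cleared())
```

A return value of False corresponds to the "If no" branch above, that is, proceeding to collect fault information.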

Collect fault information.

  10. On FusionInsight Manager, choose O&M. In the navigation pane on the left, choose Log > Download.
  11. Expand the Service drop-down list, and select Hive for the target cluster.
  12. Click the icon in the upper right corner, and set Start Date and End Date for log collection to 10 minutes before and 10 minutes after the alarm generation time, respectively. Then, click Download.
  13. Contact O&M personnel and provide the collected logs.
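
The log collection window used in the steps above is simple date arithmetic. The following Python sketch shows it for a made-up alarm time. The function name log_window is hypothetical, and the timestamp is an example value only.

```python
from datetime import datetime, timedelta


def log_window(alarm_time, margin=timedelta(minutes=10)):
    """Return (start, end) covering 'margin' before and after alarm_time."""
    return alarm_time - margin, alarm_time + margin


# Example with an illustrative alarm generation time.
start, end = log_window(datetime(2024, 9, 23, 14, 30))
print(start, end)
```

For an alarm generated at 14:30, this gives a Start Date of 14:20 and an End Date of 14:40 on the same day.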

Alarm Clearance

This alarm is automatically cleared after the fault is rectified.

Related Information

None