ALM-16047 HiveServer Has Been Deregistered from ZooKeeper
Alarm Description
The system checks the Hive service every 60 seconds. This alarm is generated when Hive registration information on ZooKeeper is lost or Hive cannot connect to ZooKeeper.
Alarm Attributes
Alarm ID | Alarm Severity | Auto Cleared |
---|---|---|
16047 | Major | Yes |
Alarm Parameters
Parameter | Description |
---|---|
Source | Specifies the cluster for which the alarm was generated. |
ServiceName | Specifies the service for which the alarm was generated. |
RoleName | Specifies the role for which the alarm was generated. |
HostName | Specifies the host for which the alarm was generated. |
Impact on the System
When a Hive client sets up a new connection, it cannot select the HiveServer node that has been deregistered from ZooKeeper. If all HiveServer nodes have been deregistered from ZooKeeper, the HiveServer service will be unavailable.
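The impact described above can be illustrated with a minimal sketch of how a client might choose among the HiveServer nodes still registered in ZooKeeper. It assumes the registration entry names follow the Apache Hive dynamic service discovery format (`serverUri=host:port;version=...;sequence=...`); the format used in a given cluster may differ. The function names, the sample addresses, and the random selection policy are hypothetical and chosen only for illustration.

```python
import random


def parse_registration(znode_name):
    """Split a 'key=value;key=value' entry name into a dict (assumed format)."""
    fields = {}
    for part in znode_name.split(";"):
        key, sep, value = part.partition("=")
        if sep:
            fields[key] = value
    return fields


def registered_servers(znode_names):
    """Return the host:port URIs of the HiveServer nodes that are still registered."""
    uris = []
    for name in znode_names:
        uri = parse_registration(name).get("serverUri")
        if uri:
            uris.append(uri)
    return uris


def pick_server(znode_names, rng=random):
    """Pick one registered HiveServer for a new connection (random policy is an assumption)."""
    servers = registered_servers(znode_names)
    if not servers:
        # Corresponds to the case where all HiveServer nodes are deregistered.
        raise RuntimeError("no HiveServer is registered in ZooKeeper")
    return rng.choice(servers)
```

A deregistered node never appears in the list passed to `pick_server`, so new connections cannot reach it; an empty list raises an error, which mirrors the service becoming unavailable.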
Possible Causes
- The ZooKeeper instance is abnormal.
- Some Hive configurations are incorrect.
Handling Procedure
Check the ZooKeeper service status.
- On FusionInsight Manager, choose O&M > Alarm > Alarms and check whether ALM-12007 Process Fault exists in the alarm list.
- In Location of ALM-12007 Process Fault, check whether the service name is ZooKeeper.
- Rectify the fault by following the steps provided in ALM-12007 Process Fault.
- In the alarm list, check whether this alarm is cleared.
- If yes, no further action is required.
- If no, go to the first step under "Check whether the Hive configurations are correctly modified."
Check whether the Hive configurations are correctly modified.
- On FusionInsight Manager, choose Audit. On the Audit page, click Advanced Search, click the icon to the right of Operation Type, select Save configuration, click OK, and click Search.
- In the search result, check the historical configurations of Hive- and ZooKeeper-related services in the Service column. Table 1 lists some configurations that may affect the connection between Hive and ZooKeeper.
Table 1 Configurations related to connection between Hive and ZooKeeper
Service | Parameter | Description |
---|---|---|
Hive | HIVE_GC_OPTS | HiveServer memory configuration. If the configuration is abnormal, HiveServer may restart repeatedly. In this case, you need to check the health status of the instance processes. |
Hive | hive.zookeeper.quorum | IP address of the node accommodating ZooKeeper that is connected to Hive. |
Hive | hive.zookeeper.client.port | Port of the ZooKeeper client connected to Hive. |
Hive | hive.zookeeper.session.timeout | Timeout interval of the session set up between Hive and ZooKeeper. |
Hive | hive.zookeeper.connection.timeout | Timeout interval for Hive to connect to ZooKeeper. |
Hive | hive.zookeeper.connection.max.retries | Maximum number of retries for Hive to connect to ZooKeeper. |
ZooKeeper | clientPort | Port number of the ZooKeeper client. |
ZooKeeper | ssl.enabled | Whether to enable SSL connections of ZooKeeper. |
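When reviewing the historical configurations, a quick consistency check between the Hive-side and ZooKeeper-side values can help narrow down a misconfiguration. The following is a minimal sketch, assuming the configurations have already been exported into plain dictionaries keyed by the parameter names in Table 1. The function name `check_zookeeper_settings` and the sample values are hypothetical, and the two checks shown are illustrative rather than exhaustive.

```python
def check_zookeeper_settings(hive_conf, zk_conf):
    """Return a list of human-readable problems found in the two config dicts."""
    problems = []

    # Hive needs at least one ZooKeeper address to connect to.
    quorum = str(hive_conf.get("hive.zookeeper.quorum", "")).strip()
    if not quorum:
        problems.append("hive.zookeeper.quorum is empty")

    # The port Hive uses should match the port ZooKeeper listens on.
    hive_port = str(hive_conf.get("hive.zookeeper.client.port", "")).strip()
    zk_port = str(zk_conf.get("clientPort", "")).strip()
    if not hive_port or not zk_port:
        problems.append("hive.zookeeper.client.port or clientPort is not set")
    elif hive_port != zk_port:
        problems.append(
            f"hive.zookeeper.client.port ({hive_port}) differs from clientPort ({zk_port})"
        )

    return problems
```

For example, calling the function with `{"hive.zookeeper.quorum": "10.0.0.1", "hive.zookeeper.client.port": "2181"}` and `{"clientPort": "2182"}` (sample values only) reports a single port mismatch.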
Restart related instances.
- Log in to FusionInsight Manager. Choose O&M > Alarm > Alarms, click the drop-down list in the row that contains the alarm, and view the role and the IP address of the node for which the alarm is generated in Location.
- Choose Cluster, click the name of the desired cluster, and choose Services > Hive > Instance. On the page that is displayed, select the instance at the IP address for which the alarm is generated, click More, and select Restart Instance.
During Hive instance restart, the instance cannot provide services for external systems. SQL tasks that are being executed on the instance may fail.
- Wait 5 minutes and check whether the alarm is cleared.
- If yes, no further action is required.
- If no, go to the first step under "Collect fault information."
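The wait-and-check step after the restart can be sketched as a simple polling loop. This is an illustration only: `is_alarm_cleared` is a hypothetical callable that you would supply (for example, one that reads the alarm list by whatever means is available to you), and the 30-second polling interval is an assumption. Only the 5-minute wait comes from the procedure above.

```python
import time


def wait_for_clearance(is_alarm_cleared, timeout_s=300, interval_s=30,
                       sleep=time.sleep, clock=time.monotonic):
    """Poll is_alarm_cleared() until it returns True or timeout_s elapses.

    Returns True if the alarm cleared within the timeout, otherwise False.
    sleep and clock are injectable so the loop can be exercised without waiting.
    """
    deadline = clock() + timeout_s
    while True:
        if is_alarm_cleared():
            return True
        remaining = deadline - clock()
        if remaining <= 0:
            return False
        # Never sleep past the deadline.
        sleep(min(interval_s, remaining))
```

If the function returns False, the alarm persisted for the full wait and the procedure continues with fault information collection.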
Collect fault information.
- On FusionInsight Manager, choose O&M. In the navigation pane on the left, choose Log > Download.
- Expand the Service drop-down list, and select Hive for the target cluster.
- Click the icon in the upper right corner, and set Start Date and End Date for log collection to 10 minutes before and 10 minutes after the alarm generation time, respectively. Then, click Download.
- Contact O&M personnel and provide the collected logs.
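The log collection window in the steps above is simple arithmetic on the alarm generation time. As a minimal sketch, the helper below computes the Start Date and End Date values; the function name `log_collection_window` and the sample timestamp are hypothetical.

```python
from datetime import datetime, timedelta


def log_collection_window(alarm_time, margin=timedelta(minutes=10)):
    """Return (start, end): margin before and margin after the alarm generation time."""
    return alarm_time - margin, alarm_time + margin


# Sample alarm generation time, for illustration only.
start, end = log_collection_window(datetime(2024, 1, 1, 12, 0, 0))
print(start.isoformat(sep=" "), "to", end.isoformat(sep=" "))
```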
Alarm Clearance
This alarm is automatically cleared after the fault is rectified.
Related Information
None