ALM-13000 ZooKeeper Service Unavailable (For MRS 2.x or Earlier)
Description
The system checks the ZooKeeper service status every 30 seconds. This alarm is generated when the ZooKeeper service is unavailable.
This alarm is cleared when the ZooKeeper service recovers.
Attribute
Alarm ID | Alarm Severity | Auto Clear |
---|---|---|
13000 | Critical | Yes |
Parameters
Parameter | Description |
---|---|
ServiceName | Specifies the service for which the alarm is generated. |
RoleName | Specifies the role for which the alarm is generated. |
HostName | Specifies the host for which the alarm is generated. |
Impact on the System
ZooKeeper cannot provide coordination services for upper-layer components, and components that depend on ZooKeeper may not run properly.
Possible Causes
- The ZooKeeper instance is abnormal.
- The disk capacity is insufficient.
- The network is faulty.
- A DNS service is installed on the ZooKeeper node.
Procedure
Check the ZooKeeper service instance status.
1. On the MRS cluster details page, choose Components > ZooKeeper > quorumpeer.
2. Check whether the ZooKeeper instances are normal.
3. Select the instances whose status is not good and choose More > Restart Instance.
4. Check whether the instance status is good after the restart.
5. On the Alarms tab page, check whether the alarm is cleared.
   - If yes, no further action is required.
   - If no, go to step 6 (Check disk status).
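An instance can also be probed directly from a cluster node. The following is a minimal sketch, not part of the MRS procedure. It assumes that the ZooKeeper client port is 2181, that nc (netcat) is installed on the node, and that the ruok four-letter command is enabled on the server (newer ZooKeeper releases restrict these commands through 4lw.commands.whitelist). The function names zk_reply_is_ok and zk_probe are hypothetical.

```shell
# Hypothetical helpers: probe one ZooKeeper instance with the "ruok" command.
# Assumptions: client port 2181, nc available, "ruok" allowed on the server.

# A healthy ZooKeeper server answers "ruok" with exactly "imok".
zk_reply_is_ok() {
    [ "$1" = "imok" ]
}

# Send "ruok" to host $1 on port $2 (default 2181) and report the result.
zk_probe() {
    local host="$1" port="${2:-2181}" reply
    reply="$(printf 'ruok' | nc -w 3 "$host" "$port" 2>/dev/null)"
    if zk_reply_is_ok "$reply"; then
        echo "$host:$port healthy"
    else
        echo "$host:$port NOT healthy (reply: '${reply:-none}')"
        return 1
    fi
}
```

To use it, call zk_probe with the host name of each quorumpeer instance listed on the console. An empty reply may mean that the instance is down or only that four-letter commands are disabled, so treat the result as a hint and confirm the status on the console.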
Check disk status.
6. On the MRS cluster details page, choose Components > ZooKeeper > quorumpeer, and check the host information of each node housing a ZooKeeper instance.
7. On the MRS cluster details page, click the Nodes tab and expand a node group.
8. In the Disk Usage column, check whether the disk space of any node housing a ZooKeeper instance is insufficient (disk usage exceeds 80%).
9. If the disk space is insufficient, expand the disk capacity. For details, see ALM-12017 Insufficient Disk Capacity (For MRS 2.x or Earlier).
10. On the Alarms tab page, check whether the alarm is cleared.
    - If yes, no further action is required.
    - If no, go to step 11 (Check network communication status).
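The same 80% threshold can be checked from a shell on the node. This is a minimal sketch that assumes the POSIX output format of df -P (usage in the fifth column, mount point in the sixth) and mount points without spaces. The function name disks_over_threshold is hypothetical.

```shell
# Hypothetical helper: read `df -P` output on stdin and print the mount points
# whose usage exceeds the threshold given as $1 (default 80, as in the step above).
disks_over_threshold() {
    local threshold="${1:-80}"
    awk -v limit="$threshold" '
        NR > 1 {                        # skip the df header line
            use = $5
            sub(/%/, "", use)           # "85%" -> "85"
            if (use + 0 > limit) print $6, $5
        }'
}
```

Run it as df -P | disks_over_threshold 80 on each node housing a ZooKeeper instance. No output means that no file system exceeds the threshold.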
Check network communication status.
11. On the Linux node housing the ZooKeeper instance, run the ping command to check whether the host names of the other nodes housing ZooKeeper instances can be pinged.
12. If a host name cannot be pinged, correct the IP addresses in /etc/hosts and add any missing mappings between host names and IP addresses.
13. Run the ping command again to check whether the host names of the other nodes housing ZooKeeper instances can be pinged.
14. On the Alarms tab page, check whether the alarm is cleared.
    - If yes, no further action is required.
    - If no, go to step 15 (Check the DNS).
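The ping and /etc/hosts checks can be combined in a small script. This is a sketch under stated assumptions: it uses the Linux iputils ping options -c and -W, and the function names hosts_lookup and check_zk_peers are hypothetical. The host names must be supplied by the operator.

```shell
# Hypothetical helper: print the IP address mapped to host name $1 in a
# hosts-format file $2 (default /etc/hosts), or nothing if there is no mapping.
hosts_lookup() {
    local name="$1" file="${2:-/etc/hosts}"
    awk -v name="$name" '
        { sub(/#.*/, "") }              # drop comments before matching
        { for (i = 2; i <= NF; i++) if ($i == name) { print $1; exit } }
    ' "$file"
}

# Hypothetical helper: for each host name argument, report a missing
# /etc/hosts mapping or a failed ping. Returns non-zero if any check fails.
check_zk_peers() {
    local host ip status=0
    for host in "$@"; do
        ip="$(hosts_lookup "$host")"
        if [ -z "$ip" ]; then
            echo "$host: no mapping in /etc/hosts"; status=1
        elif ping -c 3 -W 2 "$host" >/dev/null 2>&1; then
            echo "$host ($ip): reachable"
        else
            echo "$host ($ip): ping failed"; status=1
        fi
    done
    return "$status"
}
```

Call check_zk_peers with the host names of the other nodes housing ZooKeeper instances. Checking the mapping first separates a name resolution problem, which is fixed in /etc/hosts, from a real connectivity problem.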
Check the DNS.
15. Check whether the DNS is installed on the node housing the ZooKeeper instance: on that Linux node, run the cat /etc/resolv.conf command to check whether the file is empty.
16. Run the service named status command to check whether the DNS service is started.
17. If the DNS service is started, run the service named stop command to stop it. If "Shutting down name server BIND waiting for named to shut down (28s)" is displayed, the DNS service is stopped successfully. Then comment out the content (if any) in /etc/resolv.conf.
18. On the Alarms tab page, check whether the alarm is cleared.
    - If yes, no further action is required.
    - If no, go to step 19 (Collect fault information).
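Checking and commenting out /etc/resolv.conf can be scripted as follows. This is a minimal sketch, assuming GNU sed (for -i and -E) and that lines starting with # or ; are comments, as in the standard resolv.conf format. The function names active_resolv_lines and comment_out_resolv are hypothetical, and the backup file name is an illustrative choice.

```shell
# Hypothetical helper: count the lines of file $1 (default /etc/resolv.conf)
# that are neither blank nor comments. 0 means the file is effectively empty.
active_resolv_lines() {
    awk '!/^[ \t]*([#;]|$)/ { n++ } END { print n + 0 }' "${1:-/etc/resolv.conf}"
}

# Hypothetical helper: comment out every active line of file $1 in place,
# after saving a copy as <file>.bak so the change can be reverted.
comment_out_resolv() {
    local file="${1:-/etc/resolv.conf}"
    cp -p "$file" "$file.bak" &&
    sed -i -E 's/^([[:space:]]*[^#;[:space:]])/#\1/' "$file"
}
```

Run active_resolv_lines first. If it prints a value greater than 0 and the DNS service has been stopped, comment_out_resolv (run as root) disables the remaining entries.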
Collect fault information.
19. On MRS Manager, choose .
20. Contact the O&M engineers and send the collected logs.
Reference
None