Updated on 2024-01-17 GMT+08:00

ALM-13000 ZooKeeper Service Unavailable (For MRS 2.x or Earlier)

Description

The system checks the ZooKeeper service status every 30 seconds. This alarm is generated when the ZooKeeper service is unavailable.

This alarm is cleared when the ZooKeeper service recovers.

Attribute

Alarm ID    Alarm Severity    Auto Clear
13000       Critical          Yes

Parameters

Parameter      Description
ServiceName    Specifies the service for which the alarm is generated.
RoleName       Specifies the role for which the alarm is generated.
HostName       Specifies the host for which the alarm is generated.

Impact on the System

ZooKeeper cannot provide coordination services for upper-layer components, and components that depend on ZooKeeper may not run properly.

Possible Causes

  • The ZooKeeper instance is abnormal.
  • The disk capacity is insufficient.
  • The network is faulty.
  • A DNS server is installed on the ZooKeeper node.

Procedure

Check the ZooKeeper service instance status.

  1. On the MRS cluster details page, choose Components > ZooKeeper > quorumpeer.
  2. Check whether the ZooKeeper instances are normal.

    • If yes, go to 6.
    • If no, go to 3.

  3. Select instances whose status is not good and choose More > Restart Instance.
  4. Check whether the instance status is good after restart.

    • If yes, go to 5.
    • If no, go to 19.

  5. On the Alarms tab page, check whether the alarm is cleared.

    • If yes, no further action is required.
    • If no, go to 6.
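As a command-line complement to the console check in steps 1 to 4, the sketch below probes a ZooKeeper server with the standard "ruok" four-letter command, to which a healthy server replies "imok". It is not part of the documented procedure and rests on several assumptions: the nc utility is installed, the client port is 2181 (the ZooKeeper default, which may differ in your cluster), and the server accepts four-letter commands (newer ZooKeeper releases restrict them through the 4lw.commands.whitelist setting). The function name zk_ruok is hypothetical.

```shell
# Send the "ruok" four-letter command to a ZooKeeper server.
# Succeeds only if the server answers "imok".
# Usage: zk_ruok <host> [port]      (port 2181 is an assumed default)
zk_ruok() {
    # An unreachable host or a rejected command leaves the reply empty.
    reply=$(printf 'ruok' | nc -w 2 "$1" "${2:-2181}" 2>/dev/null) || reply=""
    [ "$reply" = "imok" ]
}

# Example (host name is a placeholder):
#   zk_ruok zk-node-1 && echo "instance responds" || echo "no valid reply"
```

A failed probe does not by itself prove the instance is faulty, because the command may simply be disabled. Treat the instance status shown on the console as authoritative.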

Check disk status.

  6. On the MRS cluster details page, choose Components > ZooKeeper > quorumpeer, and check the host information of each node housing the ZooKeeper instance.
  7. On the MRS cluster details page, click the Nodes tab and expand a node group.
  8. In the Disk Usage column, check whether any node housing a ZooKeeper instance has insufficient disk space (disk usage exceeds 80%).

    • If yes, go to 9.
    • If no, go to 11.

  9. Expand the disk capacity. For details, see ALM-12017 Insufficient Disk Capacity (For MRS 2.x or Earlier).
  10. On the Alarms tab page, check whether the alarm is cleared.

    • If yes, no further action is required.
    • If no, go to 11.
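The 80% threshold from step 8 can also be checked directly on a node. The following is a minimal sketch, assuming shell access to the node and the POSIX output format of df -P, where the fifth column is the usage percentage and the sixth is the mount point. The function name disks_over_threshold is hypothetical, and mount points containing spaces are not handled.

```shell
# Print each mount point whose usage exceeds a threshold (default 80%).
# Reads `df -P` style output on stdin, so the logic can be exercised
# without a live node.
disks_over_threshold() {
    threshold="${1:-80}"
    awk -v limit="$threshold" '
        NR > 1 {                     # skip the df header row
            use = $5
            sub(/%/, "", use)        # "85%" -> "85"
            if (use + 0 > limit) print $6 " " use "%"
        }'
}

# Typical use on a node housing a ZooKeeper instance:
#   df -P | disks_over_threshold 80
```

Empty output means no file system on the node is above the threshold.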

Check network communication status.

  11. On the Linux node housing the ZooKeeper instance, run the ping command to check whether the host names of other nodes housing the ZooKeeper instances can be pinged successfully.

    • If yes, go to 15.
    • If no, go to 12.

  12. Edit /etc/hosts: correct any wrong IP addresses and add any missing mappings between host names and IP addresses.
  13. Run the ping command again to check whether the host names of other nodes housing the ZooKeeper instances can be pinged successfully.

    • If yes, go to 14.
    • If no, go to 19.

  14. On the Alarms tab page, check whether the alarm is cleared.

    • If yes, no further action is required.
    • If no, go to 15.
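Steps 11 to 13 can be scripted as shown below. This is a minimal sketch, assuming a Linux ping that supports -c (count) and -W (timeout in seconds). The function names lookup_host and check_peers are hypothetical, and the host names you pass in are the other nodes housing ZooKeeper instances.

```shell
# Print the IP address mapped to a host name in a hosts-format file, if any.
# Usage: lookup_host <hostname> [hosts-file]   (defaults to /etc/hosts)
lookup_host() {
    awk -v name="$1" '
        {
            sub(/#.*/, "")           # drop comments; fields are re-split
            for (i = 2; i <= NF; i++)
                if ($i == name) { print $1; exit }
        }' "${2:-/etc/hosts}"
}

# Report peers that have no /etc/hosts mapping or do not answer a ping.
# Usage: check_peers <hostname>...
check_peers() {
    for host in "$@"; do
        if [ -z "$(lookup_host "$host")" ]; then
            echo "$host: no entry in /etc/hosts"
        elif ! ping -c 1 -W 2 "$host" >/dev/null 2>&1; then
            echo "$host: ping failed"
        fi
    done
}
```

Run check_peers with the host names of the other ZooKeeper nodes. Any line it prints points at a host that needs a corrected /etc/hosts entry (step 12) or further network investigation.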

Check the DNS.

  15. Check whether a DNS server is installed on the node housing the ZooKeeper instance: on that Linux node, run the cat /etc/resolv.conf command and check whether the file is empty.

    • If yes, go to 16.
    • If no, go to 19.

  16. Run the service named status command to check whether the DNS service is running.

    • If yes, go to 17.
    • If no, go to 19.

  17. Run the service named stop command to stop the DNS service. The service has stopped successfully if "Shutting down name server BIND waiting for named to shut down (28s)" is displayed. Then comment out the content (if any) in /etc/resolv.conf.
  18. On the Alarms tab page, check whether the alarm is cleared.

    • If yes, no further action is required.
    • If no, go to 19.

  19. Collect fault information.

    1. On MRS Manager, choose System > Export Log.
    2. Contact the O&M engineers and send the collected logs.
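The /etc/resolv.conf handling in steps 15 and 17 can be sketched as follows. The helper names has_active_lines and comment_out_active_lines are hypothetical, and the sketch assumes a sed that supports -i with a backup suffix and -E (as GNU sed on Linux does). "Active" here means a line that is neither blank nor already commented out.

```shell
# Succeeds if the file contains at least one active line
# (not blank and not starting with "#").
has_active_lines() {
    grep -Eqv '^[[:space:]]*(#|$)' "$1"
}

# Prefix every active line with "#", keeping the original as <file>.bak.
comment_out_active_lines() {
    sed -i.bak -E 's/^([[:space:]]*[^#[:space:]])/#\1/' "$1"
}

# Possible use after stopping the DNS service in step 17 (run as root):
#   service named stop
#   has_active_lines /etc/resolv.conf && comment_out_active_lines /etc/resolv.conf
```

Keeping the .bak copy makes it easy to restore the original resolver configuration if the change needs to be rolled back.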

Reference

None