Updated on 2024-09-23 GMT+08:00

ALM-12010 Manager Heartbeat Interruption Between the Active and Standby Nodes (For MRS 2.x or Earlier)

Description

This alarm is generated when the active Manager does not receive any heartbeat signal from the standby Manager within 7 seconds.

This alarm is cleared when the active Manager receives heartbeat signals from the standby Manager.
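The trigger and clear conditions can be illustrated with a small shell sketch. It is not how Manager implements heartbeat detection. The function name `heartbeat_alarm_state` is hypothetical, and treating an elapsed time of exactly 7 seconds as a timeout is an assumption.

```shell
#!/bin/sh
# Illustrative only: this is not Manager's implementation.
HEARTBEAT_TIMEOUT=7   # seconds, taken from the alarm description

# heartbeat_alarm_state LAST_HEARTBEAT NOW
# Both arguments are timestamps in epoch seconds.
heartbeat_alarm_state() {
    elapsed=$(( $2 - $1 ))
    if [ "$elapsed" -ge "$HEARTBEAT_TIMEOUT" ]; then
        # No heartbeat within the timeout window (boundary handling is assumed).
        echo "alarm"
    else
        echo "clear"
    fi
}

# Example: heartbeat_alarm_state "$last_heartbeat_ts" "$(date +%s)"
```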

Attribute

Alarm ID    Alarm Severity    Auto Clear
12010       Major             Yes

Parameters

Parameter                Description
ServiceName              Specifies the service for which the alarm is generated.
RoleName                 Specifies the role for which the alarm is generated.
HostName                 Specifies the host for which the alarm is generated.
Local Manager HA Name    Specifies a local Manager HA.
Peer Manager HA Name     Specifies a peer Manager HA.

Impact on the System

When the active Manager process is abnormal, an active/standby failover cannot be performed, and services are affected.

Possible Causes

The link between the active and standby Manager servers is abnormal.

Procedure

  1. Check whether the network between the active and standby Manager servers is normal.

    1. Go to the MRS cluster details page. In the alarm list on the alarm management tab page, click the row that contains the alarm. In the alarm details, view the address of the standby Manager server.
    2. Log in to the active management node.
    3. Run the following command to check whether the standby Manager is reachable:

      ping <heartbeat IP address of the standby Manager>

      • If yes, go to 2.
      • If no, go to 1.4.
    4. Contact the O&M personnel to check whether the network is faulty.
      • If yes, go to 1.5.
      • If no, go to 2.
    5. Rectify the network fault and check whether the alarm is cleared from the alarm list.
      • If yes, no further action is required.
      • If no, go to 2.

  2. Log in to each master node in the cluster, run the following commands to find all sedxxx files, and then delete the files that are found:

    find /srv/BigData/ -name "sed*"

    find /opt -name "sed*"

  3. Collect fault information.

    1. On MRS Manager, choose System > Export Log.
    2. Contact the O&M engineers and send the collected logs.
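The reachability check in step 1 and the file search in step 2 can be sketched as two shell helpers. This is a minimal sketch, not part of MRS. The function names `check_standby_reachable` and `list_sed_files`, the `PING_CMD` override, and the `STANDBY_HB_IP` variable are hypothetical. The `-c 3` ping count and the `-type f` filter are also assumptions added for the sketch.

```shell
#!/bin/sh
# Sketch only: helper names and variables below are hypothetical.

# PING_CMD can be overridden (for example, for a dry run); it defaults to ping.
PING_CMD="${PING_CMD:-ping}"

# check_standby_reachable HEARTBEAT_IP
# HEARTBEAT_IP is the standby Manager address shown in the alarm details.
check_standby_reachable() {
    if [ -z "$1" ]; then
        echo "usage: check_standby_reachable <heartbeat-ip>" >&2
        return 2
    fi
    # Send 3 echo requests and discard the ping output.
    if "$PING_CMD" -c 3 "$1" >/dev/null 2>&1; then
        echo "reachable: continue with step 2"
    else
        echo "unreachable: ask O&M personnel to check the network"
        return 1
    fi
}

# list_sed_files DIR...
# Prints regular files whose names start with "sed" under each directory.
list_sed_files() {
    for dir in "$@"; do
        # Skip directories that do not exist on this node.
        [ -d "$dir" ] || continue
        # Errors such as "permission denied" are discarded.
        find "$dir" -type f -name "sed*" 2>/dev/null
    done
}

# Example (run on the active management node and on each master node):
#   check_standby_reachable "$STANDBY_HB_IP"
#   list_sed_files /srv/BigData /opt
```

Review the list that `list_sed_files` prints before deleting anything, and remove only the files that you have confirmed are leftover sedxxx files.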

Reference

None