Updated on 2024-09-23 GMT+08:00

ALM-27003 DBService Heartbeat Interruption Between the Active and Standby Nodes

Description

This alarm is generated when the active or standby DBService node does not receive heartbeat messages from the peer node for 7 seconds.

This alarm is cleared when the heartbeat recovers.
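
The trigger and clear conditions above can be pictured with a minimal sketch. This is only an illustrative model, not DBService's actual implementation: the class name `HeartbeatMonitor`, its methods, and the choice to raise the alarm at exactly 7 seconds (rather than strictly after) are assumptions made for the example.

```python
from dataclasses import dataclass

HEARTBEAT_TIMEOUT_S = 7.0  # threshold stated in the alarm description


@dataclass
class HeartbeatMonitor:
    """Hypothetical model of the alarm condition (not DBService code)."""

    last_heartbeat: float      # time, in seconds, of the last peer heartbeat
    alarm_active: bool = False

    def on_heartbeat(self, now: float) -> None:
        # A heartbeat arrived from the peer: record it and clear the alarm.
        self.last_heartbeat = now
        self.alarm_active = False

    def check(self, now: float) -> bool:
        # Raise the alarm once no heartbeat has been seen for the timeout.
        # Assumption: the boundary (exactly 7 seconds) counts as a timeout.
        if now - self.last_heartbeat >= HEARTBEAT_TIMEOUT_S:
            self.alarm_active = True
        return self.alarm_active


monitor = HeartbeatMonitor(last_heartbeat=0.0)
print(monitor.check(now=5.0))   # still within the 7-second window
print(monitor.check(now=7.0))   # timeout reached, alarm raised
monitor.on_heartbeat(now=8.0)   # heartbeat recovers
print(monitor.check(now=9.0))   # alarm cleared
```

Timestamps are passed in explicitly so that the behavior can be followed without waiting in real time.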

Attribute

Alarm ID | Alarm Severity | Automatically Cleared
27003 | Major | Yes

Parameters

Name | Meaning
Source | Specifies the cluster for which the alarm is generated.
ServiceName | Specifies the service for which the alarm is generated.
RoleName | Specifies the role for which the alarm is generated.
HostName | Specifies the host for which the alarm is generated.
Local DBService HA Name | Specifies a local DBService HA.
Peer DBService HA Name | Specifies a peer DBService HA.

Impact on the System

During the DBService heartbeat interruption, only one node can provide the service. If this node is faulty, no standby node is available for failover and the service is unavailable.

Possible Causes

The link between the active and standby DBService nodes is abnormal.

Procedure

Check whether the network between the active DBService server and the standby DBService server is normal.

  1. In the alarm list on FusionInsight Manager, expand the row that contains this alarm and view the standby DBService server address.
  2. Log in to the active DBService server as user root.
  3. Run the ping command against the heartbeat IP address of the standby DBService server to check whether the standby server is reachable.

    • If yes, go to 6.
    • If no, go to 4.

  4. Contact the network administrator to check whether the network is faulty.

    • If yes, go to 5.
    • If no, go to 6.

  5. Rectify the network fault and check whether the alarm is cleared from the alarm list.

    • If yes, no further action is required.
    • If no, go to 6.
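
The reachability check and the branch that follows it can be sketched as a small script. This is a hedged example, not part of the product: the function names are hypothetical, the address 192.0.2.10 is a documentation placeholder for the standby DBService heartbeat IP address, and the `-c` and `-W` options assume the Linux iputils version of ping.

```python
import subprocess
from typing import Callable, List


def build_ping_command(ip: str, count: int = 3, timeout_s: int = 2) -> List[str]:
    # -c: number of echo requests; -W: seconds to wait for each reply
    # (options as provided by Linux iputils ping; an assumption here).
    return ["ping", "-c", str(count), "-W", str(timeout_s), ip]


def is_reachable(ip: str, run: Callable = subprocess.run) -> bool:
    # ping exits with 0 when at least one reply was received.
    result = run(
        build_ping_command(ip),
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def next_step(reachable: bool) -> int:
    # Mirrors the branch after the ping check:
    # reachable -> go to 6 (collect fault information),
    # unreachable -> go to 4 (contact the network administrator).
    return 6 if reachable else 4


if __name__ == "__main__":
    standby_ip = "192.0.2.10"  # placeholder; use the address from the alarm details
    print("Go to step", next_step(is_reachable(standby_ip)))
```

The `run` parameter exists only so that the check can be exercised without sending real packets; on the active DBService server the default `subprocess.run` is used.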

Collect fault information.

  6. On FusionInsight Manager, choose O&M > Log > Download.
  7. In Service, select the following items for the required cluster:

    • DBService
    • Controller
    • NodeAgent

  8. In the upper right corner, set Start Date and End Date for log collection to 10 minutes before and 10 minutes after the alarm generation time, respectively. Then, click Download.
  9. Contact the O&M personnel and send the collected logs.
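
The log collection window described above is the alarm generation time minus and plus 10 minutes. A minimal sketch of that arithmetic follows; the function name `log_window` and the sample alarm time are hypothetical and serve only to illustrate the calculation.

```python
from datetime import datetime, timedelta
from typing import Tuple


def log_window(alarm_time: datetime, margin_min: int = 10) -> Tuple[datetime, datetime]:
    # Start Date is margin_min minutes before the alarm generation time,
    # End Date is margin_min minutes after it.
    margin = timedelta(minutes=margin_min)
    return alarm_time - margin, alarm_time + margin


# Hypothetical alarm generation time, for illustration only.
start, end = log_window(datetime(2024, 9, 23, 14, 5, 0))
print("Start Date:", start)  # 2024-09-23 13:55:00
print("End Date:  ", end)    # 2024-09-23 14:15:00
```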

Alarm Clearing

After the fault is rectified, the system automatically clears this alarm.

Related Information

None