Updated on 2023-11-30 GMT+08:00

Alarms Indicating Heartbeat Interruptions Between Nodes Are Frequently Generated in the MRS Cluster

Symptom

The MRS cluster frequently reports alarms indicating that heartbeats between the active and standby Manager nodes, or between the active and standby DBService nodes, are interrupted, or that a node is faulty. As a result, Hive is occasionally unavailable, which affects upper-layer services.

Cause Analysis

  1. The alarms coincide with VM restarts: each alarm is generated because the VM hosting the node was restarted.

  2. OS analysis shows that the VM restarted because the node had no available memory. The memory exhaustion triggered oom-killer. When oom-killer was invoked, the process entered the disk sleep state, and the VM restarted as a result.

  3. A check of the processes occupying the memory shows that they are all normal service processes.

Conclusion: The VM memory cannot meet service requirements.
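
The OS analysis in step 2 can be partly reproduced by scanning the kernel log for oom-killer messages. The following Python snippet is a minimal sketch, not an MRS tool: the function name find_oom_events is hypothetical, and the two patterns assume the common Linux kernel message wording ("invoked oom-killer" and "Out of memory: Kill process" or "Killed process"), which can vary between kernel versions.

```python
import re

# Assumed wording of kernel OOM messages; adjust if your kernel logs differ.
INVOKED = re.compile(r"(?P<name>\S+) invoked oom-killer")
KILLED = re.compile(
    r"Out of memory: Kill(?:ed)? process (?P<pid>\d+) \((?P<name>[^)]+)\)"
)


def find_oom_events(lines):
    """Return (kind, process_name, pid_or_None) tuples found in kernel log lines."""
    events = []
    for line in lines:
        match = KILLED.search(line)
        if match:
            # A process was selected and killed by oom-killer.
            events.append(("killed", match.group("name"), int(match.group("pid"))))
            continue
        match = INVOKED.search(line)
        if match:
            # This process's allocation request triggered oom-killer.
            events.append(("invoked", match.group("name"), None))
    return events
```

The function accepts any iterable of text lines, so it can be fed an open log file or the captured output of dmesg. The location of the kernel log depends on the OS image (for example, /var/log/messages on some distributions), so the path is left to the caller.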

Procedure

  • Expand the memory of the affected nodes.
  • Disable services that are not required.
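
To judge whether a node still lacks memory headroom after either action, the available memory can be compared with the total. The following Python snippet is a sketch under stated assumptions: it parses text in the /proc/meminfo format, relies on the MemAvailable field (present on Linux kernel 3.14 and later), and uses a 10% threshold that is purely illustrative. The function names are hypothetical.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of field name -> integer value.

    Most fields are reported in kB; a few (such as HugePages_Total) are counts.
    """
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])
    return fields


def available_ratio(fields):
    """Fraction of total memory that the kernel estimates is available."""
    return fields["MemAvailable"] / fields["MemTotal"]


def is_memory_low(text, threshold=0.10):
    """Return True if available memory is below the (illustrative) threshold."""
    return available_ratio(parse_meminfo(text)) < threshold
```

On a node, the check could be run as is_memory_low(open("/proc/meminfo").read()). A node that repeatedly reports low available memory while running only required services is a candidate for memory expansion.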