Updated on 2022-09-14 GMT+08:00

Alarms Are Frequently Generated in the MRS Cluster

Issue

The MRS cluster frequently generates alarms, and Hive is occasionally unavailable.

Symptom

The cluster frequently reports alarms indicating that the heartbeat between the active and standby Manager nodes is interrupted, that the heartbeat between the active and standby DBService nodes is interrupted, and that a node is faulty. As a result, Hive is occasionally unavailable, which affects customer services.

Cause Analysis

  1. Each time the alarms are generated, the VM has been restarted. The VM restart is what triggers the alarms.

  2. OS analysis shows that the VM restarts because the node has no available memory. Memory exhaustion triggers oom-killer. When oom-killer is invoked, the process enters the disk sleep state. As a result, the VM restarts.

  3. The processes that occupy the memory were checked and found to be normal service processes.
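
The checks in the steps above can be sketched in Python. This is a minimal illustration, not a tool shipped with MRS: the function names are hypothetical, the log path /var/log/messages is an assumption that depends on the OS image, and the exact wording of oom-killer messages varies between kernel versions. The sketch scans kernel log text for oom-killer messages and ranks running processes by resident memory (VmRSS in /proc/<pid>/status).

```python
import re
from pathlib import Path

# Kernel messages written when the OOM killer runs. The wording differs by
# kernel version, so both "Kill process" and "Killed process" are matched.
INVOKED_RE = re.compile(r"(\S+) invoked oom-killer")
KILLED_RE = re.compile(r"Out of memory: Kill(?:ed)? process (\d+) \(([^)]+)\)")


def find_oom_events(log_text):
    """Return (kind, detail) tuples for OOM-killer lines found in log_text."""
    events = []
    for line in log_text.splitlines():
        m = KILLED_RE.search(line)
        if m:
            events.append(("killed", f"pid={m.group(1)} name={m.group(2)}"))
            continue
        m = INVOKED_RE.search(line)
        if m:
            events.append(("invoked", m.group(1)))
    return events


def parse_vmrss_kb(status_text):
    """Extract VmRSS in kB from the text of /proc/<pid>/status, or None."""
    m = re.search(r"^VmRSS:\s+(\d+)\s+kB", status_text, re.MULTILINE)
    return int(m.group(1)) if m else None


def top_memory_processes(limit=10, proc_root="/proc"):
    """List (rss_kb, pid, name) for the processes with the largest resident set."""
    rows = []
    for status in Path(proc_root).glob("[0-9]*/status"):
        try:
            text = status.read_text()
        except OSError:
            continue  # process exited or is not readable
        rss = parse_vmrss_kb(text)
        name = re.search(r"^Name:\s+(.+)$", text, re.MULTILINE)
        if rss is not None and name:
            rows.append((rss, int(status.parent.name), name.group(1)))
    return sorted(rows, reverse=True)[:limit]


if __name__ == "__main__":
    # Assumed log location; adjust for the OS image used by the node.
    try:
        log_text = Path("/var/log/messages").read_text(errors="replace")
    except OSError:
        log_text = ""
    for kind, detail in find_oom_events(log_text):
        print(kind, detail)
    for rss, pid, name in top_memory_processes():
        print(f"{rss:>12} kB  pid={pid}  {name}")
```

If the processes at the top of the list are all expected service processes, as in this case, the memory pressure comes from the normal workload rather than from a runaway process.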

Conclusion: The VM memory cannot meet service requirements.

Procedure

  • You are advised to expand the memory of the node so that it meets service requirements.
  • You are advised to disable unnecessary services to reduce memory usage on the node.
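
To judge how much headroom a node has before or after these changes, the available memory can be read from /proc/meminfo. The following is a minimal sketch under stated assumptions: the function names are hypothetical, the 10% threshold is an illustrative value and not an MRS recommendation, and the MemAvailable field is only present on Linux kernel 3.14 and later.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo text into a dict of field name -> integer value.

    Most fields are in kB; a few (for example HugePages_Total) are plain counts.
    """
    info = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts and parts[0].isdigit():
            info[key.strip()] = int(parts[0])
    return info


def available_fraction(info):
    """Fraction of total memory the kernel estimates is available (0.0 to 1.0)."""
    return info["MemAvailable"] / info["MemTotal"]


def needs_attention(info, min_fraction=0.10):
    # min_fraction is a hypothetical threshold chosen for illustration only.
    return available_fraction(info) < min_fraction


if __name__ == "__main__":
    try:
        with open("/proc/meminfo") as f:
            info = parse_meminfo(f.read())
        print(f"available: {available_fraction(info):.1%}")
        print("low memory" if needs_attention(info) else "ok")
    except (OSError, KeyError):
        print("memory information is not available on this system")
```

Running the check periodically while services are under normal load gives a rough indication of whether the expanded memory is sufficient.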