Alarms Are Frequently Generated in the MRS Cluster
Issue
The cluster frequently reports alarms indicating that the heartbeat between the active and standby Manager nodes is interrupted, the heartbeat between the active and standby DBService nodes is interrupted, and the node is faulty. As a result, Hive is occasionally unavailable.
Symptom
The cluster frequently reports three alarms: the heartbeat between the active and standby Manager nodes is interrupted, the heartbeat between the active and standby DBService nodes is interrupted, and the node is faulty. As a result, Hive is occasionally unavailable, which affects customer services.
Cause Analysis
- Each alarm coincides with a VM restart. The alarms are generated because the VM restarts.
- OS analysis shows that the VM restarts because the node has no available memory. Memory exhaustion triggers the oom-killer. When the oom-killer is invoked, the process enters the disk sleep (D) state. As a result, the VM restarts.
- A check of the processes that occupy the memory shows that they are all normal service processes.
Conclusion: The VM memory cannot meet service requirements.
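The checks in the cause analysis can be sketched in Python as follows. This is a minimal illustration, not an MRS tool: the function names are hypothetical, the log path must be supplied by the user (kernel messages are often in /var/log/messages, but that location is an assumption and varies by OS), and the match strings "invoked oom-killer" and "Out of memory" are the phrases the Linux kernel typically logs when the oom-killer runs.

```python
import re
import sys
from pathlib import Path

# Phrases the Linux kernel typically writes when the oom-killer runs.
OOM_PATTERN = re.compile(r"invoked oom-killer|Out of memory", re.IGNORECASE)


def find_oom_events(log_lines):
    """Return the log lines that mention an oom-killer invocation."""
    return [line.rstrip("\n") for line in log_lines if OOM_PATTERN.search(line)]


def top_memory_processes(limit=10, proc_root="/proc"):
    """Return up to `limit` (rss_kb, name, pid) tuples, largest RSS first."""
    usage = []
    for status in Path(proc_root).glob("[0-9]*/status"):
        try:
            text = status.read_text(errors="replace")
        except OSError:
            continue  # the process exited or is not readable
        fields = dict(
            line.split(":", 1) for line in text.splitlines() if ":" in line
        )
        rss = fields.get("VmRSS")
        if rss is None:
            continue  # kernel threads have no VmRSS entry
        name = fields.get("Name", "?").strip()
        usage.append((int(rss.split()[0]), name, int(status.parent.name)))
    return sorted(usage, reverse=True)[:limit]


if __name__ == "__main__":
    # Usage: python check_oom.py <path-to-kernel-log>  (script name is hypothetical)
    if len(sys.argv) > 1:
        with open(sys.argv[1], errors="replace") as log:
            for event in find_oom_events(log):
                print(event)
    for rss_kb, name, pid in top_memory_processes():
        print(f"{rss_kb:>12} kB  pid={pid:<8} {name}")
```

If the log shows oom-killer entries close to the alarm times and the largest processes are all expected service processes, the result is consistent with the conclusion above.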
Procedure
- Expand the memory of the node.
- Disable unnecessary services on the node to reduce memory usage.
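To judge whether a node still has enough headroom after either measure, the available memory can be read from /proc/meminfo. The following is a rough sketch under stated assumptions: the helper names and the 10% warning threshold are hypothetical and not MRS recommendations, and the MemAvailable field is only present on Linux kernel 3.14 or later.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo content into a dict of field name -> value in kB."""
    values = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts and parts[0].isdigit():
            values[name.strip()] = int(parts[0])
    return values


def available_ratio(meminfo):
    """Return MemAvailable / MemTotal as a fraction between 0 and 1."""
    return meminfo["MemAvailable"] / meminfo["MemTotal"]


if __name__ == "__main__":
    WARN_BELOW = 0.10  # hypothetical threshold, adjust to the workload
    with open("/proc/meminfo") as f:
        info = parse_meminfo(f.read())
    ratio = available_ratio(info)
    print(f"Available memory: {ratio:.1%} of {info['MemTotal']} kB")
    if ratio < WARN_BELOW:
        print("Warning: low available memory, the oom-killer may be triggered")
```

Running this periodically on the affected node gives an early indication of whether the expanded memory is sufficient for the services that remain enabled.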