ALM-27003 DBService Heartbeat Interruption Between the Active and Standby Nodes
Description
This alarm is generated when the active or standby DBService node does not receive heartbeat messages from the peer node for 7 seconds.
This alarm is cleared when the heartbeat recovers.
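The trigger and clear conditions above can be illustrated with a minimal sketch. This is not the DBService HA implementation: the `HeartbeatMonitor` class and its method names are hypothetical, and only the 7-second threshold comes from the description.

```python
from dataclasses import dataclass

# Threshold taken from the alarm description; everything else is illustrative.
HEARTBEAT_TIMEOUT_S = 7.0


@dataclass
class HeartbeatMonitor:
    """Hypothetical model of the alarm state for one peer node."""
    timeout_s: float = HEARTBEAT_TIMEOUT_S
    last_heartbeat: float = 0.0
    alarm_active: bool = False

    def on_heartbeat(self, now: float) -> None:
        # A heartbeat from the peer resets the timer and clears the alarm.
        self.last_heartbeat = now
        self.alarm_active = False

    def check(self, now: float) -> bool:
        # Raise the alarm once no heartbeat has arrived for timeout_s seconds.
        if now - self.last_heartbeat >= self.timeout_s:
            self.alarm_active = True
        return self.alarm_active
```

In this model, a monitor that last saw a heartbeat at t = 100 s reports no alarm at t = 106.9 s, raises it at t = 107 s, and clears it as soon as the next heartbeat arrives.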
Attribute
Alarm ID | Alarm Severity | Automatically Cleared |
---|---|---|
27003 | Major | Yes |
Parameters
Name | Meaning |
---|---|
Source | Specifies the cluster for which the alarm is generated. |
ServiceName | Specifies the service for which the alarm is generated. |
RoleName | Specifies the role for which the alarm is generated. |
HostName | Specifies the host for which the alarm is generated. |
Local DBService HA Name | Specifies a local DBService HA. |
Peer DBService HA Name | Specifies a peer DBService HA. |
Impact on the System
While the DBService heartbeat is interrupted, only one node can provide the service. If that node becomes faulty, no standby node is available for failover and the service becomes unavailable.
Possible Causes
The link between the active and standby DBService nodes is abnormal.
Procedure
Check whether the network between the active DBService server and the standby DBService server is normal.
- In the real-time alarm list on FusionInsight Manager, click the icon in the row that contains this alarm and view the standby DBService server address.
- Log in to the active DBService server as user root.
- Run the ping command with the heartbeat IP address of the standby DBService server to check whether that server is reachable.
- Contact the network administrator to check whether the network is faulty.
- Rectify the network fault and check whether the alarm is cleared from the alarm list.
  - If yes, no further action is required.
  - If no, proceed to "Collect fault information."
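The reachability check in the steps above can be scripted. The following is a minimal sketch that assumes a Linux host with the iputils `ping` utility; the function names, the default of three echo requests, and the 2-second reply timeout are illustrative choices, not values from this procedure.

```python
import ipaddress
import subprocess


def build_ping_command(heartbeat_ip: str, count: int = 3, timeout_s: int = 2) -> list:
    """Build a Linux (iputils) ping command for the given heartbeat IP."""
    # Raises ValueError if the string is not a valid IP address.
    ipaddress.ip_address(heartbeat_ip)
    # -c: number of echo requests to send; -W: seconds to wait for each reply.
    return ["ping", "-c", str(count), "-W", str(timeout_s), heartbeat_ip]


def is_reachable(heartbeat_ip: str) -> bool:
    """Return True if ping exits with status 0, meaning a reply was received."""
    result = subprocess.run(
        build_ping_command(heartbeat_ip),
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
        check=False,
    )
    return result.returncode == 0
```

Run on the active DBService server, `is_reachable("<standby heartbeat IP>")` returning `False` suggests that the heartbeat link should be examined by the network administrator.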
Collect fault information.
- On the FusionInsight Manager portal, choose O&M > Log > Download.
- In Service, select the following items for the required cluster:
  - DBService
  - Controller
  - NodeAgent
- Click the icon in the upper right corner, and set Start Date to 10 minutes before the alarm generation time and End Date to 10 minutes after it. Then, click Download.
- Contact the O&M personnel and send the collected logs.
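The Start Date and End Date for log collection can be derived from the alarm generation time. The sketch below shows the arithmetic only; the function name and the example timestamp are hypothetical, and the 10-minute margin comes from the step above.

```python
from datetime import datetime, timedelta


def log_collection_window(alarm_time: datetime, margin_minutes: int = 10):
    """Return (start, end) spanning margin_minutes before and after the alarm."""
    margin = timedelta(minutes=margin_minutes)
    return alarm_time - margin, alarm_time + margin


# Example: an alarm generated at 12:05:00 on 2024-01-15 (made-up timestamp).
start, end = log_collection_window(datetime(2024, 1, 15, 12, 5, 0))
print(start.isoformat(sep=" "), "to", end.isoformat(sep=" "))
```

For that example timestamp, the window runs from 11:55:00 to 12:15:00 on the same day.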
Alarm Clearing
After the fault is rectified, the system automatically clears this alarm.
Related Information
None