ALM-12091 Abnormal Disaster Resources
Alarm Description
HA checks the disaster resources of Manager every 86 seconds. This alarm is generated when HA detects that the disaster resources are abnormal in 10 consecutive checks.
This alarm is cleared when HA detects that the disaster resources become normal.
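The detection rule above implies a delay between the fault and the alarm. The following is a rough estimate, assuming the checks run at an even 86-second interval, the fault persists, the alarm is raised on the tenth consecutive abnormal result, and check execution time is negligible. The symbol δ is introduced here for the wait between the fault occurring and the first check that sees it:

```latex
t_{\text{alarm}} - t_{\text{fault}} \approx \delta + (10 - 1) \times 86\ \text{s}, \qquad 0 \le \delta < 86\ \text{s}
\;\Rightarrow\; 774\ \text{s} \le t_{\text{alarm}} - t_{\text{fault}} < 860\ \text{s} \quad (\approx 12.9 \text{ to } 14.3\ \text{min})
```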
The disaster resource type is single-active, so a resource exception triggers an active/standby Manager switchover. By the time this alarm is generated, the switchover has completed and new disaster resources have been started on the new active Manager, after which the alarm is cleared. The alarm therefore serves to notify users of the cause of the active/standby Manager switchover.
Alarm Attributes
Alarm ID | Alarm Severity | Auto Cleared |
---|---|---|
12091 | Major | Yes |
Alarm Parameters
Parameter | Description |
---|---|
Source | Specifies the cluster or system for which the alarm is generated. |
ServiceName | Specifies the service for which the alarm is generated. |
RoleName | Specifies the role for which the alarm is generated. |
HostName | Specifies the host for which the alarm is generated. |
Impact on the System
- The active/standby Manager switchover occurs.
- The disaster process restarts repeatedly, which may cause active/standby DR to be unavailable.
Possible Causes
The disaster process is abnormal.
Handling Procedure
Check whether the disaster process is normal.
- In the alarm list on FusionInsight Manager, locate the row that contains the alarm and expand it to view the name of the host for which the alarm is generated.
- Log in to the host for which the alarm is generated as user root.
- Run the su - omm command to switch to user omm.
- Run the sh ${BIGDATA_HOME}/om-server/OMS/workspace0/ha/module/hacom/script/status_ha.sh command to check whether the disaster resources managed by HA are in the expected state. In a single-node system, the disaster resource should be in the normal state. In a dual-node system, it should be in the normal state on the active node and in the stopped state on the standby node.
- Run the vi ${BIGDATA_LOG_HOME}/disaster/disaster.log command to check whether the HA disaster resource log contains the keyword ERROR. If it does, analyze the log to locate the cause of the resource exception and rectify it.
- Wait 5 minutes and check whether the alarm is automatically cleared.
- If yes, no further action is required.
- If no, collect fault information as described below.
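As a non-interactive alternative to opening the log in vi, the ERROR check in the steps above can be sketched as a small shell function. Only the log path comes from this page; the function name `show_disaster_errors` and its optional second argument (how many of the most recent matches to print, default 20) are hypothetical. The sketch assumes it runs as user omm, whose environment defines `BIGDATA_LOG_HOME`.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the most recent ERROR lines from the HA
# disaster resource log instead of browsing the file in vi.
#   $1  log path (default: ${BIGDATA_LOG_HOME}/disaster/disaster.log)
#   $2  number of most recent matches to show (default: 20)
show_disaster_errors() {
  local log="${1:-${BIGDATA_LOG_HOME}/disaster/disaster.log}"
  local count="${2:-20}"
  if [ ! -r "$log" ]; then
    echo "cannot read log: $log" >&2
    return 2
  fi
  # -n prefixes each match with its line number, so it can be opened
  # later at that position (for example, vi +123 <log>).
  # tail keeps only the latest matches, which usually relate to the alarm.
  grep -n "ERROR" "$log" | tail -n "$count"
}

# Usage (as user omm): show_disaster_errors
#                      show_disaster_errors /path/to/disaster.log 50
```

The match is case-sensitive, mirroring the keyword ERROR named in the step; lines that only contain lowercase "error" are not reported.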
Collect fault information.
- On FusionInsight Manager, choose O&M. In the navigation pane on the left, choose Log > Download.
- Expand the Service drop-down list, select Disaster for the target cluster, and click OK.
- Click the icon in the upper right corner, and set Start Date and End Date for log collection to 1 hour before and 1 hour after the alarm generation time, respectively. Then, click Download.
- Contact O&M personnel and provide the collected logs.
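The collection window in the steps above (1 hour on either side of the alarm generation time) can be computed with a minimal sketch like the one below. It assumes GNU coreutils `date` and an alarm time written as `YYYY-MM-DD HH:MM:SS` in the local time zone; the function name `log_window` is hypothetical, and the printed format may need adjusting to whatever the Manager date picker accepts.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print Start Date / End Date for log collection,
# 1 hour before and 1 hour after the alarm generation time.
# Assumes GNU date (the -d option and the @<epoch> input syntax).
log_window() {
  local alarm_time="$1"   # e.g. "2024-05-01 10:30:00", taken from the alarm list
  local epoch
  epoch=$(date -d "$alarm_time" +%s) || return 1
  # Do the arithmetic in epoch seconds; a string like "... -1 hour" can be
  # misread by GNU date as a time zone offset.
  echo "Start Date: $(date -d "@$((epoch - 3600))" '+%Y-%m-%d %H:%M:%S')"
  echo "End Date:   $(date -d "@$((epoch + 3600))" '+%Y-%m-%d %H:%M:%S')"
}

# Usage: log_window "2024-05-01 10:30:00"
```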
Alarm Clearance
This alarm is automatically cleared after the fault is rectified.
Related Information
None.