ALM-12091 Abnormal disaster Resources

Alarm Description

HA checks the disaster resources of Manager every 86 seconds. This alarm is generated when HA detects that the disaster resources have been abnormal for 10 consecutive times.

This alarm is cleared when HA detects that the disaster resources become normal.

Resource Type of disaster is Single-active. Active/Standby switchover will be triggered upon resource exceptions. When this alarm is generated, the active/standby switchover is complete and new disaster resources have been enabled on the new active Manager. In this case, this alarm is cleared. This alarm is used to notify users of the cause of the active/standby Manager switchover.

Alarm Attributes

Alarm ID	Alarm Severity	Auto Cleared
12091	Major	Yes

Alarm Parameters

Parameter	Description
Source	Specifies the cluster or system for which the alarm is generated.
ServiceName	Specifies the service for which the alarm is generated.
RoleName	Specifies the role for which the alarm is generated.
HostName	Specifies the host for which the alarm is generated.

Impact on the System

The active/standby Manager switchover occurs.
The disaster process restarts repeatedly, which may cause active/standby DR to be unavailable.

Possible Causes

The disaster process is abnormal.

Handling Procedure

Check whether the disaster process is normal.

In the alarm list on FusionInsight Manager, locate the row that contains the alarm, and click to view the name of the host for which the alarm is generated.
Log in to the host for which the alarm is generated as user root.
Run the su - omm command to switch to user omm.
Run the sh ${BIGDATA_HOME}/om-server/OMS/workspace0/ha/module/hacom/script/status_ha.sh command to check whether the status of the disaster resources managed by the HA is normal. In the single-node system, the disaster resource is in the normal state. In the dual-node system, the disaster resource is in the normal state on the active node and in the stopped state on the standby node.
- If yes, go to Step 7.
- If no, go to Step 5.
Run the vi ${BIGDATA_LOG_HOME}/disaster/disaster.log command to check whether the disaster resource log of HA contains the keyword ERROR. If yes, analyze the logs to locate the resource exception cause and fix the exception.
Wait 5 minutes and check whether the alarm is automatically cleared.
- If yes, no further action is required.
- If no, go to Step 7.

Collect fault information.

On FusionInsight Manager, choose O&M. In the navigation pane on the left, choose Log > Download.
Expand the Service drop-down list, select Disaster for the target cluster, and click OK.
Click in the upper right corner, and set Start Date and End Date for log collection to 1 hour ahead of and after the alarm generation time, respectively. Then, click Download.
Contact O&M personnel and provide the collected logs.