Updated on 2024-09-23 GMT+08:00

ALM-12006 Node Fault (For MRS 2.x or Earlier)

Description

Controller checks the NodeAgent status every 30 seconds. This alarm is generated when Controller fails to receive the status report of a NodeAgent three consecutive times.

This alarm is cleared when Controller can properly receive the status report of the NodeAgent.
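The generate-and-clear rule described above can be sketched as a small counter. This is a hypothetical model for illustration only, not MRS source code: the class name `NodeAgentMonitor` and its method are assumptions, and only the 30-second interval and the three-miss threshold come from the description.

```python
from dataclasses import dataclass

CHECK_INTERVAL_SECONDS = 30   # check period stated in the description
MISSED_REPORTS_TO_ALARM = 3   # consecutive misses stated in the description


@dataclass
class NodeAgentMonitor:
    """Hypothetical model of the Controller-side check for one NodeAgent."""
    missed: int = 0
    alarm_active: bool = False

    def record_check(self, report_received: bool) -> bool:
        """Record one periodic check and return whether the alarm is active."""
        if report_received:
            # A received status report resets the counter and clears the alarm.
            self.missed = 0
            self.alarm_active = False
        else:
            self.missed += 1
            if self.missed >= MISSED_REPORTS_TO_ALARM:
                self.alarm_active = True
        return self.alarm_active


monitor = NodeAgentMonitor()
# One report received, three consecutive misses, then a report again.
states = [monitor.record_check(ok) for ok in (True, False, False, False, True)]
print(states)  # prints [False, False, False, True, False]
```

The alarm becomes active only on the third consecutive miss, and a single received report clears it, which matches the Auto Clear behavior of this alarm.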

Attribute

| Alarm ID | Alarm Severity | Auto Clear |
| --- | --- | --- |
| 12006 | Critical | Yes |

Parameters

| Parameter | Description |
| --- | --- |
| ServiceName | Specifies the service for which the alarm is generated. |
| RoleName | Specifies the role for which the alarm is generated. |
| HostName | Specifies the host for which the alarm is generated. |

Impact on the System

Services on the node are unavailable.

Possible Causes

The network is disconnected, or the hardware is faulty.

Procedure

  1. Check whether the network is disconnected or the hardware is faulty.

    1. Go to the MRS cluster details page. In the alarm list on the alarm management tab page, click the row that contains the alarm. In the alarm details, view the host address of the alarm.
    2. Log in to the active management node.
    3. Run the following command to check whether the faulty node is reachable:

      ping IP address of the faulty host

      • If yes, go to 2.
      • If no, go to 1.4.
    4. Contact the O&M personnel to check whether the network is faulty.
      • If yes, go to 1.5.
      • If no, go to 1.6.
    5. Rectify the network fault and check whether the alarm is cleared from the alarm list.
      • If yes, no further action is required.
      • If no, go to 1.6.
    6. Contact the O&M personnel to check whether a hardware fault (for example, a CPU or memory fault) occurs on the node.
      • If yes, go to 1.7.
      • If no, go to 2.
    7. Repair the faulty components and restart the node. Check whether the alarm is cleared.
      • If yes, no further action is required.
      • If no, go to 2.
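The reachability check in step 1 can also be scripted. The following is a minimal sketch, assuming the active management node runs Linux with the iputils `ping` (where `-c` sets the packet count and `-W` the per-reply timeout in seconds). The function names and the address `192.0.2.10` are placeholders for illustration and are not part of MRS.

```python
import subprocess
from typing import List


def build_ping_command(host: str, count: int = 3, timeout_s: int = 2) -> List[str]:
    """Build a Linux iputils ping command line for the given host."""
    return ["ping", "-c", str(count), "-W", str(timeout_s), host]


def is_reachable(host: str, run=subprocess.run) -> bool:
    """Return True if ping exits with status 0, meaning a reply was received.

    The `run` parameter exists only so the check can be exercised
    without sending real packets; it defaults to subprocess.run.
    """
    result = run(
        build_ping_command(host),
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


# Example call (placeholder address; replace with the faulty host's IP):
#   if is_reachable("192.0.2.10"):
#       ...  # node answers ping: continue with collecting fault information
#   else:
#       ...  # node does not answer: ask O&M to check the network
```

A zero exit status only shows that the host answers ICMP echo requests; it does not prove that the NodeAgent process is healthy, so the remaining checks in the procedure still apply.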

  2. Collect fault information.

    1. On MRS Manager, choose System > Export Log.
    2. Contact the O&M personnel and send them the collected logs.

Reference

None