Updated on 2024-09-23 GMT+08:00

ALM-12007 Process Fault (For MRS 2.x or Earlier)

Description

The process health check module checks the process status every 5 seconds. This alarm is generated when the module detects that the process connection status is Bad three consecutive times.

This alarm is cleared when the process can be connected.
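The trigger and clear rule described above can be sketched as a small shell function. This is only an illustration of the counting logic, not the actual health check module: the function name `alarm_state`, the `THRESHOLD` variable, and the status string `Good` are assumptions made for the example (the source names only the `Bad` status).

```shell
#!/usr/bin/env bash
# Illustrative sketch only; names below are hypothetical, not part of MRS.
# Each argument is the result of one health check (one every 5 seconds).
# Prints the alarm state after the last check: ALARM or OK.
THRESHOLD=3   # consecutive Bad results required to raise the alarm

alarm_state() {
  local bad_streak=0 state="OK" status
  for status in "$@"; do
    if [ "$status" = "Bad" ]; then
      bad_streak=$((bad_streak + 1))
      if [ "$bad_streak" -ge "$THRESHOLD" ]; then
        state="ALARM"
      fi
    else
      # Any successful connection resets the streak and clears the alarm.
      bad_streak=0
      state="OK"
    fi
  done
  echo "$state"
}
```

For example, `alarm_state Bad Bad Bad` prints `ALARM`, while `alarm_state Bad Bad Bad Good` prints `OK` because the process became reachable again.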

Attribute

Alarm ID    Alarm Severity    Auto Clear
12007       Major             Yes

Parameters

Parameter      Description
ServiceName    Specifies the service for which the alarm is generated.
RoleName       Specifies the role for which the alarm is generated.
HostName       Specifies the host for which the alarm is generated.

Impact on the System

The service provided by the process is unavailable.

Possible Causes

  • The instance process is abnormal.
  • The disk space is insufficient.

Procedure

  1. Check whether the instance process is abnormal.

    1. Go to the MRS cluster details page. In the alarm list on the alarm management tab page, click the row that contains the alarm. In the alarm details, view the host name and service name of the alarm.
    2. On the Alarms page, check whether the alarm ALM-12006 Node Fault (For MRS 2.x or Earlier) is generated.

      • If yes, go to 1.3.
      • If no, go to 1.4.

    3. Handle the alarm by following the instructions in ALM-12006 Node Fault (For MRS 2.x or Earlier).
    4. Check whether the owner, user group, and permission of the installation directory of the role that reported the alarm are correct. The correct owner, user group, and permission are omm, ficommon, and 750, respectively.
      • If yes, go to 1.6.
      • If no, go to 1.5.
    5. Run the following commands to set the permission to 750 and the owner and user group to omm:ficommon:

      chmod 750 <folder_name>

      chown omm:ficommon <folder_name>

    6. Wait 5 minutes and check whether the ALM-12007 Process Fault alarm is cleared.
      • If yes, no further action is required.
      • If no, go to 2.1.
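The ownership and permission check in step 1 can also be done from a shell on the affected host. The following is a minimal sketch, assuming a Linux host with GNU `stat` (the `-c` format option); the function name `check_install_dir` is hypothetical, and the directory to pass is the same `<folder_name>` used in the commands above.

```shell
#!/usr/bin/env bash
# Illustrative sketch only; check_install_dir is a hypothetical helper.
# Assumes GNU coreutils stat (supports -c with %U, %G and %a).
# Usage: check_install_dir <folder_name> [expected]
#   expected defaults to "omm:ficommon 750" (owner:group and octal mode).
# Returns 0 if the directory matches, 1 on mismatch, 2 if stat fails.
check_install_dir() {
  local dir="$1"
  local expected="${2:-omm:ficommon 750}"
  local actual
  actual="$(stat -c '%U:%G %a' "$dir")" || return 2
  if [ "$actual" = "$expected" ]; then
    echo "OK"
  else
    echo "MISMATCH: found '$actual', expected '$expected'"
    return 1
  fi
}
```

If the function reports a mismatch, the chmod and chown commands in the step above restore the expected values.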

  2. Check whether the disk space is insufficient.

    1. On the MRS cluster details page, click the alarm management tab and check whether ALM-12017 Insufficient Disk Capacity is generated in the alarm list.
      • If yes, go to 2.2.
      • If no, go to 3.
    2. Handle the alarm by following the instructions in ALM-12017 Insufficient Disk Capacity (For MRS 2.x or Earlier).
    3. Wait 5 minutes and check whether the ALM-12017 Insufficient Disk Capacity alarm is cleared.

      • If yes, go to 2.4.
      • If no, go to 3.

    4. Wait 5 minutes and check whether the ALM-12007 Process Fault alarm is cleared.
      • If yes, no further action is required.
      • If no, go to 3.
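As a complement to checking for the disk capacity alarm in step 2, disk usage can be inspected directly on the host. This is a rough sketch, assuming a POSIX `df` and `awk`; the function names are hypothetical, and the threshold is a caller-supplied value because this page does not state the threshold used by ALM-12017.

```shell
#!/usr/bin/env bash
# Illustrative sketch only; function names are hypothetical.
# Prints the used-space percentage (without the % sign) of the
# file system that contains the given path, using POSIX df output.
disk_usage_percent() {
  df -P "$1" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# Succeeds (exit 0) if usage of the path's file system is at or above
# the given percentage; the threshold is chosen by the caller.
disk_usage_exceeds() {
  local used
  used="$(disk_usage_percent "$1")"
  [ "$used" -ge "$2" ]
}
```

For example, `disk_usage_exceeds <folder_name> 90` succeeds when the file system holding that directory is at least 90% full; the value 90 here is only an example.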

  3. Collect fault information.

    1. On MRS Manager, choose System > Export Log.
    2. Contact the O&M engineers and send the collected logs.

Reference

None