Updated on 2024-09-23 GMT+08:00

ALM-50222 Disk Status of a Specified Data Directory on BE Is Abnormal

Alarm Description

The system checks the disk status of a specified data directory on BE every 30 seconds. This alarm is generated when the disk status is not 1 (1 indicates the normal state and 0 indicates the abnormal state). This alarm is cleared when the disk status of the specified data directory on BE becomes normal.
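The status rule above can be sketched as a small filter. This is only an illustration: the source does not say which metric backs the check, so the Prometheus-style line format, the metric name `doris_be_disks_state`, and the helper name `abnormal_dirs` are assumptions made for the example.

```shell
# abnormal_dirs: read metric lines on stdin and print every data directory
# whose state value is not 1 (1 = normal, 0 = abnormal).
# ASSUMPTION: lines look like  doris_be_disks_state{path="/some/dir"} 1
abnormal_dirs() {
  awk '/^doris_be_disks_state[{]/ {
         # The last field is the state value.
         if ($NF != 1 && match($0, /path="[^"]*"/)) {
           # Strip the leading path=" (6 chars) and the trailing quote.
           print substr($0, RSTART + 6, RLENGTH - 7)
         }
       }'
}
```

Feeding it saved metric output lists only the directories that would trigger this alarm under the stated assumptions.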

Alarm Attributes

  • Alarm ID: 50222
  • Alarm Severity: Critical
  • Auto Cleared: Yes

Alarm Parameters

  • Source: Specifies the cluster or system for which the alarm is generated.
  • ServiceName: Specifies the service for which the alarm is generated.
  • RoleName: Specifies the role for which the alarm is generated.
  • HostName: Specifies the host for which the alarm is generated.
  • Trigger Condition: Specifies the threshold for triggering the alarm.

Impact on the System

Service data may be unavailable, and data queries on the Doris client may fail.

Possible Causes

  • The hard disk is faulty.
  • The disk permissions are set incorrectly.

Handling Procedure

Check whether a disk alarm is generated.

  1. On FusionInsight Manager, choose O&M > Alarm > Alarms and check whether ALM-12014 Partition Lost or ALM-12033 Slow Disk Fault exists.

    • If yes, go to 2.
    • If no, go to 4.

  2. Rectify the fault by referring to the handling procedure of ALM-12014 Partition Lost or ALM-12033 Slow Disk Fault. Then, check whether that disk alarm is cleared.

    • If yes, go to 3.
    • If no, go to 4.

  3. Wait 5 minutes and check whether this alarm (ALM-50222) is cleared.

    • If yes, no further action is required.
    • If no, go to 4.

Modify disk permissions.

  4. Choose O&M > Alarm > Alarms and view Location and Additional Information of the alarm to obtain the location of the faulty disk.
  5. Log in to the node for which the alarm is generated as user root. Go to the directory where the faulty disk is located, and run the ll command to check whether the permission for the faulty disk is 711 and whether the owner is omm.

    • If yes, go to 7.
    • If no, go to 6.

  6. Modify the owner and permission of the faulty disk. For example, if the faulty disk is data1, run the following commands:

    chown omm:wheel data1

    chmod 711 data1

  7. Choose Cluster > Services > Doris > Instances, select this BE instance, click More, and select Restart Instance. Wait 5 minutes and check whether the alarm is generated again.

    • If no, no further action is required.
    • If yes, go to 8.

    Note: During the BE instance restart, tasks running on the restarted BE node fail. Tasks on BE nodes that are not restarted are not affected.
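The owner and permission check in this procedure can also be scripted. The following is a minimal sketch, assuming GNU coreutils `stat` is available on the node; the helper name `check_data_dir` is hypothetical, and the defaults (owner omm, mode 711) are taken from the steps above.

```shell
# check_data_dir DIR [OWNER] [MODE]
# Prints "ok" when DIR has the expected owner and octal mode; otherwise
# prints the actual values and returns 1.
# ASSUMPTION: GNU stat, where -c '%U %a' prints owner name and octal mode.
check_data_dir() {
  dir=$1
  want_owner=${2:-omm}
  want_mode=${3:-711}
  actual=$(stat -c '%U %a' "$dir") || return 2
  if [ "$actual" = "$want_owner $want_mode" ]; then
    echo "ok"
  else
    echo "mismatch: $actual"
    return 1
  fi
}
```

For example, `check_data_dir data1` run from the directory that contains data1 reports whether the chown and chmod commands are needed.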

Collect fault information.

  8. On FusionInsight Manager, choose O&M. In the navigation pane on the left, choose Log > Download.
  9. Expand the Service drop-down list, and select Doris and OMS for the target cluster.
  10. Click the edit icon in the upper right corner, and set Start Date and End Date for log collection to 20 minutes before and 20 minutes after the alarm generation time, respectively. Then, click Download.
  11. Contact O&M personnel and provide the collected logs.
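The Start Date and End Date for log collection can be worked out from the alarm generation time. This is a minimal sketch, assuming GNU `date` and treating the timestamp as UTC for simplicity; the helper name `log_window` is hypothetical, and the values still have to be entered in FusionInsight Manager by hand.

```shell
# log_window "YYYY-MM-DD HH:MM:SS"
# Prints "<start> | <end>", 20 minutes before and after the given time.
# ASSUMPTION: GNU date (-d and @epoch syntax); input is interpreted as UTC.
log_window() {
  alarm_epoch=$(date -u -d "$1" +%s) || return 1
  margin=$((20 * 60))   # 20 minutes, in seconds
  start=$(date -u -d "@$((alarm_epoch - margin))" '+%Y-%m-%d %H:%M:%S')
  end=$(date -u -d "@$((alarm_epoch + margin))" '+%Y-%m-%d %H:%M:%S')
  echo "$start | $end"
}
```

The arithmetic is done on epoch seconds so that the window stays correct across hour and day boundaries.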

Alarm Clearance

This alarm is automatically cleared after the fault is rectified.

Related Information

None.