Updated on 2024-09-23 GMT+08:00

ALM-12014 Device Partition Lost (For MRS 2.x or Earlier)

Description

This alarm is generated when the system detects that a partition to which service directories are mounted is lost (because the device is removed or goes offline, or the partition is deleted). The system checks the partition status periodically.

Attribute

Alarm ID: 12014

Alarm Severity: Major

Auto Clear:

  • Yes: MRS 1.9.3.10 and later patch versions
  • No: MRS 2.x and earlier versions

Parameters

ServiceName: Specifies the service for which the alarm is generated.

RoleName: Specifies the role for which the alarm is generated.

HostName: Specifies the host for which the alarm is generated.

DirName: Specifies the directory for which the alarm is generated.

PartitionName: Specifies the device partition for which the alarm is generated.

Impact on the System

Service data cannot be written to the partition, and the affected service runs abnormally.

Possible Causes

  • The disk is removed.
  • The disk is offline, or a bad sector exists on the disk.

Procedure

  1. Go to the MRS cluster details page and choose Alarms.
  2. In the real-time alarm list, click the row that contains the alarm.
  3. In the Alarm Details area, obtain the values of HostName, PartitionName, and DirName from Location.
  4. Check whether the disk corresponding to PartitionName on HostName is inserted into the correct server slot.

    • If yes, go to 5.
    • If no, go to 6.

  5. Contact hardware engineers to remove the faulty disk.
  6. Use PuTTY to log in to the node indicated by HostName, where the alarm is reported, and check whether the /etc/fstab file contains a line with DirName.

    • If yes, go to 7.
    • If no, go to 8.

  7. Run the vi /etc/fstab command to edit the file and delete the line containing DirName.
  8. Contact hardware engineers to insert a new disk. For details, see the hardware product documentation of the relevant model. If the faulty disk is in a RAID group, configure the RAID group. For details, see the configuration methods of the relevant RAID controller card.
  9. Wait 20 to 30 minutes (the waiting time depends on the disk size), then run the mount command to check whether the disk has been mounted to the DirName directory.

    • If yes, go to 10 for MRS 1.9.3.10 or later patch versions. For other versions, manually clear the alarm; no further action is required.
    • If no, go to 11.

  10. Wait 2 minutes and check whether the alarm is automatically cleared.

    • If yes, no further action is required.
    • If no, go to 11.

  11. Collect fault information.

    1. On MRS Manager, choose System > Export Log.
    2. Contact the O&M engineers and send the collected logs.
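
The checks in steps 6 and 9 can also be scripted. The following is a minimal sketch, not part of the MRS tooling: the function names fstab_entries_for and is_mounted are hypothetical, and both functions accept an optional file argument so that they can be tried on a copy first. Two choices here are assumptions rather than the documented method: the sketch matches DirName against the mount point field exactly instead of looking for any line that contains it (so that a directory such as data1 does not also match data10), and it reads /proc/mounts on Linux instead of parsing the output of the mount command.

```shell
#!/bin/sh
# Hypothetical helpers for steps 6 and 9. Pass the DirName value from the
# alarm details as the first argument.

# Step 6: print the non-comment /etc/fstab lines whose mount point
# (second field) equals the given directory. Prints nothing if there is none.
fstab_entries_for() {
    dir_name=$1
    fstab_file=${2:-/etc/fstab}
    awk -v dir="$dir_name" '$1 !~ /^#/ && $2 == dir' "$fstab_file"
}

# Step 9: return 0 if the given directory appears as a mount point in the
# mount table (by default /proc/mounts, an assumption for Linux hosts),
# and 1 otherwise.
is_mounted() {
    dir_name=$1
    mounts_file=${2:-/proc/mounts}
    awk -v dir="$dir_name" '$2 == dir { found = 1 } END { exit !found }' "$mounts_file"
}
```

For example, after defining the functions in a shell on the node, fstab_entries_for "$DIR_NAME" lists the entries that step 7 would delete, and is_mounted "$DIR_NAME" && echo mounted reports whether the new disk has been mounted, where DIR_NAME is a variable you set to the DirName value. The sketch only reads files; editing /etc/fstab remains a manual step.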

Alarm Clearing

MRS 1.9.3.10 and later patch versions: After the fault is rectified, the system automatically clears the alarm.

MRS 2.x and earlier versions: After the fault is rectified, the system does not automatically clear the alarm. You need to clear the alarm manually.

Reference

None