Updated on 2024-11-29 GMT+08:00

ALM-12014 Device Partition Lost

Alarm Description

This alarm is generated when the system detects that a partition to which service directories are mounted is lost (because the device is removed or offline, or the partition is deleted). The system checks the partition status every 60 seconds.
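The periodic check described above can be approximated as follows. This is a minimal sketch, not FusionInsight Manager's actual implementation: the function name `find_lost_mounts`, the sample directory paths, and the use of `/proc/mounts`-style text as input are all assumptions made for illustration.

```python
def find_lost_mounts(expected_dirs, mounts_text):
    """Return the expected mount directories that are absent from mounts_text.

    mounts_text uses the /proc/mounts layout, one mount per line:
    "<device> <mount point> <fs type> <options> <dump> <pass>".
    """
    mounted = set()
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 2:
            mounted.add(fields[1])  # second field is the mount point
    return [d for d in expected_dirs if d not in mounted]


# Hypothetical sample data; real devices and paths depend on the cluster.
sample_mounts = """\
/dev/sda1 / ext4 rw,relatime 0 0
/dev/sdb1 /srv/BigData/data1 ext4 rw,relatime 0 0
"""
expected = ["/srv/BigData/data1", "/srv/BigData/data2"]

print(find_lost_mounts(expected, sample_mounts))  # ['/srv/BigData/data2']
```

On a real host, the text could be read from `/proc/mounts`, and the check could be repeated on a 60-second timer to mirror the interval stated above.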

Alarm Attributes

  • Alarm ID: 12014
  • Alarm Severity: Major
  • Alarm Type: Physical resource
  • Service Type: FusionInsight Manager
  • Auto Cleared: Yes (Versions earlier than MRS 3.3.0 do not support automatic clearance.)

Alarm Parameters

  • Location Information

    • Source: Specifies the cluster or system for which the alarm was generated.
    • ServiceName: Specifies the service for which the alarm was generated.
    • RoleName: Specifies the role for which the alarm was generated.
    • HostName: Specifies the host for which the alarm was generated.
    • MountDirectoryName: Specifies the directory for which the alarm was generated.
    • PartitionName: Specifies the device partition for which the alarm was generated.

  • Additional Information

    • Details: Specifies alarm details.
    • Disk ESN: Specifies the serial number of the disk in the device partition for which the alarm was generated.

Impact on the System

  • Data loss: The device partition is lost and the data stored in the partition is lost.
  • System breakdown: If the system disk is lost, the system deployed on the node cannot run properly. In some cases, the system may break down and fail to start.
  • Service failure: Read and write jobs on the lost device partition fail to run or run slowly.
  • Service interruption: Customers may need time to restore data and systems, and services cannot be provided.
  • Security risk: Important data may be stolen or disclosed, which severely affects customer services.

Possible Causes

  • The disk is removed.
  • The disk is offline, or a bad sector exists on the disk.

Handling Procedure

  1. Log in to FusionInsight Manager, choose O&M > Alarm > Alarms, and click the icon in the row that contains the alarm.
  2. Obtain the HostName, PartitionName, and MountDirectoryName from the Location area.
  3. Check whether the disk of PartitionName on HostName is inserted into the correct server slot.

    • If yes, go to 4.
    • If no, go to 5.

  4. Contact hardware engineers to remove the faulty disk.
  5. Log in to the host for which the alarm is generated as user root and check whether the /etc/fstab file has a line containing the mount directory name.

    • If yes, go to 6.
    • If no, go to 7.

  6. Run the vi /etc/fstab command to edit the file and delete the line containing the mount directory name.
  7. Contact hardware engineers to insert a new disk. For details, see the hardware product document of the relevant model. If the faulty disk is in a RAID group, configure the RAID group. For details, see the configuration methods of the relevant RAID controller card.
  8. Wait 20 to 30 minutes (the waiting time depends on the disk size), then run the mount command to check whether the disk has been mounted to the specified directory.

    • If yes, go to 9 for MRS 3.3.0 or later. For versions earlier than MRS 3.3.0, manually clear the alarm; no further action is required.
    • If no, go to 10.

  9. Wait 2 minutes and check whether the alarm is automatically cleared.

    • If yes, no further action is required.
    • If no, go to 10.
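The /etc/fstab check and cleanup in the procedure above can be sketched as follows. This is an illustrative sketch under stated assumptions, not a tool shipped with the product: the function names, the sample UUIDs, and the sample directory are hypothetical. It also applies a stricter rule than "a line containing the directory name", matching only entries whose mount point field equals the directory.

```python
def fstab_lines_for(mount_dir, fstab_text):
    """Return non-comment fstab lines whose mount point (2nd field) is mount_dir."""
    matches = []
    for line in fstab_text.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue  # skip blank lines and comments
        fields = stripped.split()
        if len(fields) >= 2 and fields[1].rstrip("/") == mount_dir.rstrip("/"):
            matches.append(line)
    return matches


def remove_fstab_entries(mount_dir, fstab_text):
    """Return fstab_text without the entries for mount_dir; comments are kept."""
    doomed = set(fstab_lines_for(mount_dir, fstab_text))
    kept = [line for line in fstab_text.splitlines() if line not in doomed]
    return "\n".join(kept) + "\n"


# Hypothetical sample content; a real /etc/fstab will differ.
sample_fstab = """\
# /etc/fstab (hypothetical sample)
UUID=1111-aaaa / ext4 defaults 0 1
UUID=2222-bbbb /srv/BigData/data2 ext4 defaults,noatime 0 0
"""

print(fstab_lines_for("/srv/BigData/data2", sample_fstab))
print(remove_fstab_entries("/srv/BigData/data2", sample_fstab))
```

The functions operate on text only and write nothing to disk. Back up /etc/fstab before applying any change to the real file.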

Collect fault information.

  10. On FusionInsight Manager, choose O&M. In the navigation pane on the left, choose Log > Download.
  11. Expand the Service drop-down list, select OmmServer for the target cluster, and click OK.
  12. Click the icon in the upper right corner, and set Start Date and End Date for log collection to 10 minutes before and 10 minutes after the alarm generation time, respectively. Then, click Download.
  13. Contact O&M engineers and provide the collected logs.
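The log collection window can be worked out from the alarm generation time as in the sketch below. The function name `log_window` and the sample timestamp are assumptions for illustration; only the 10-minute margin comes from the procedure.

```python
from datetime import datetime, timedelta


def log_window(alarm_time, margin_minutes=10):
    """Return (start, end): margin_minutes before and after alarm_time."""
    margin = timedelta(minutes=margin_minutes)
    return alarm_time - margin, alarm_time + margin


# Hypothetical alarm generation time.
alarm_time = datetime(2024, 11, 29, 14, 5, 0)
start, end = log_window(alarm_time)
print(start.isoformat(sep=" "), end.isoformat(sep=" "))
# 2024-11-29 13:55:00 2024-11-29 14:15:00
```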

Alarm Clearance

MRS 3.3.0 or later: After the fault is rectified, the system automatically clears the alarm.

Versions earlier than MRS 3.3.0: After the fault is rectified, the system does not automatically clear the alarm. You need to clear the alarm manually.

Related Information

None.