ALM-12014 Partition Lost

Description

The system checks the partition status every 60 seconds. This alarm is generated when the system detects that a partition to which service directories are mounted has been lost. This can happen because the device has been removed or has gone offline, or because the partition has been deleted.
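The periodic check can be illustrated with the following Python sketch. It is a hypothetical example, not the MRS implementation. It assumes the monitored service directories and their expected partitions are known in advance (the `service_dirs` mapping). It compares that mapping against the mount table that Linux exposes in /proc/mounts. Any directory whose expected partition is no longer mounted is reported as a candidate for this alarm.

```python
import time
from typing import Dict, List

# Interval stated in the alarm description; everything else here is hypothetical.
CHECK_INTERVAL_S = 60


def parse_mounts(mounts_text: str) -> Dict[str, str]:
    """Map mount point -> device, parsed from /proc/mounts-style text."""
    mounts: Dict[str, str] = {}
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 2:
            device, mount_point = fields[0], fields[1]
            mounts[mount_point] = device
    return mounts


def find_lost_dirs(service_dirs: Dict[str, str], mounts_text: str) -> List[str]:
    """Return service directories whose expected partition is not mounted.

    service_dirs maps DirName -> expected PartitionName (device path).
    A directory counts as lost if it is not a mount point at all, or if a
    different device is now mounted there.
    """
    mounts = parse_mounts(mounts_text)
    return [d for d, partition in service_dirs.items() if mounts.get(d) != partition]


def monitor(service_dirs: Dict[str, str]) -> None:
    """Hypothetical polling loop: re-read the mount table every 60 seconds."""
    while True:
        with open("/proc/mounts") as f:
            lost = find_lost_dirs(service_dirs, f.read())
        for d in lost:
            print(f"Partition for {d} is lost (ALM-12014 condition)")
        time.sleep(CHECK_INTERVAL_S)
```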

Attribute

Alarm ID	Alarm Severity	Auto Clear
12014	Major	Yes: MRS 3.3.0 and later, and MRS 3.1.0.0.10/3.1.5.0.3 and later patch versions; No: versions earlier than MRS 3.3.0

Parameters

Name	Meaning
Source	Specifies the cluster or system for which the alarm is generated.
ServiceName	Specifies the service for which the alarm is generated.
RoleName	Specifies the role for which the alarm is generated.
HostName	Specifies the host for which the alarm is generated.
DirName	Specifies the directory for which the alarm is generated.
PartitionName	Specifies the device partition for which the alarm is generated.

Impact on the System

Data loss: The device partition is lost, and so is the data stored on it.
System breakdown: If the system disk is lost, the system deployed on the node cannot run properly. In some cases, the system may break down and fail to start.
Service failure: Read and write jobs on the lost device partition fail or run slowly.
Service interruption: Services are unavailable while customers restore data and systems, which may take time.
Security risk: Important data may be stolen or disclosed, which severely affects customer services.

Possible Causes

The hard disk is removed.
The hard disk is offline, or a bad sector exists on the hard disk.

Procedure

1. On FusionInsight Manager, choose O&M > Alarm > Alarms, and click in the row where the alarm is located to view the alarm details.
2. Obtain HostName, PartitionName, and DirName from Location.
3. Check whether the disk of PartitionName on HostName is inserted into the correct server slot.
- If yes, go to 4.
- If no, go to 5.
4. Contact hardware engineers to remove the faulty disk.
5. Log in as user root to the HostName node for which the alarm is reported, and check whether the /etc/fstab file contains a line with DirName.
- If yes, go to 6.
- If no, go to 7.
6. Run the vi /etc/fstab command to edit the file and delete the line containing DirName.
7. Contact hardware engineers to insert a new disk. For details, see the hardware product documentation for the relevant model. If the faulty disk was in a RAID group, configure the RAID group as described in the configuration methods for the relevant RAID controller card.
8. Wait 20 to 30 minutes (the waiting time depends on the disk size), and then run the mount command to check whether the disk has been mounted to the DirName directory.
- If yes: for MRS 3.3.0 and later, and MRS 3.1.0.0.10/3.1.5.0.3 and later patch versions, go to 9. For clusters earlier than MRS 3.3.0, manually clear the alarm. No further action is required.
- If no, go to 10.
9. Wait about 2 minutes and check whether the alarm is cleared.
- If yes, no further action is required.
- If no, go to 10.
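Steps 5, 6, and 8 amount to inspecting /etc/fstab and the current mount table. The following Python helpers sketch those checks. The function names are hypothetical and are not part of any MRS tool. They operate on file contents passed in as strings, so you can try them without root access. They read mount information from /proc/mounts rather than parsing the output of the mount command.

```python
from typing import List, Optional


def _mount_point(line: str) -> Optional[str]:
    """Return the mount point (2nd field) of an fstab/mounts line, or None."""
    stripped = line.strip()
    if not stripped or stripped.startswith("#"):
        return None  # blank line or comment
    fields = stripped.split()
    return fields[1] if len(fields) >= 2 else None


def fstab_entries_for_dir(fstab_text: str, dir_name: str) -> List[str]:
    """Step 5: return the /etc/fstab lines whose mount point is dir_name."""
    return [line for line in fstab_text.splitlines() if _mount_point(line) == dir_name]


def remove_dir_from_fstab(fstab_text: str, dir_name: str) -> str:
    """Step 6: return fstab text with the entries for dir_name removed."""
    kept = [line for line in fstab_text.splitlines() if _mount_point(line) != dir_name]
    return "\n".join(kept) + "\n"


def is_mounted(mounts_text: str, dir_name: str) -> bool:
    """Step 8: True if dir_name is a mount point in /proc/mounts-style text."""
    return any(_mount_point(line) == dir_name for line in mounts_text.splitlines())
```

The helpers match DirName against the mount-point field instead of any substring of the line. As a result, removing an entry for a directory such as /srv/BigData/data1 (an example path) does not also remove /srv/BigData/data10. On a real node, back up /etc/fstab before writing the result of `remove_dir_from_fstab` back to it.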

Collect fault information.

10. On FusionInsight Manager, choose O&M > Log > Download.
11. Select OmmServer from the Services drop-down list and click OK.
12. Set Start Date for log collection to 10 minutes before the alarm generation time and End Date to 10 minutes after it, and click Download.
13. Contact O&M personnel and send them the collected logs.
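The log collection window in step 12 is simple date arithmetic. The following sketch computes Start Date and End Date from the alarm generation time. The function name is hypothetical, and the alarm time in the example is arbitrary. It was chosen to show that the window can cross midnight.

```python
from datetime import datetime, timedelta
from typing import Tuple

# Margin from step 12: 10 minutes on each side of the alarm generation time.
LOG_MARGIN = timedelta(minutes=10)


def log_collection_window(alarm_time: datetime) -> Tuple[datetime, datetime]:
    """Return (Start Date, End Date) around the alarm generation time."""
    return alarm_time - LOG_MARGIN, alarm_time + LOG_MARGIN


if __name__ == "__main__":
    # Example alarm raised at 23:55; the window ends on the following day.
    start, end = log_collection_window(datetime(2024, 5, 1, 23, 55))
    print(start.strftime("%Y-%m-%d %H:%M"), end.strftime("%Y-%m-%d %H:%M"))
```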

Alarm Clearing

MRS 3.3.0 and later, MRS 3.1.0.0.10/3.1.5.0.3 and later patch versions: After the fault is rectified, the system automatically clears this alarm.

Versions earlier than MRS 3.3.0: After the fault is rectified, the system does not automatically clear this alarm. You must clear it manually.
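The version rule above, which also appears in the Auto Clear attribute, can be encoded as follows. This sketch reflects one interpretation of the rule and is not an official check. It assumes versions are written as purely numeric dotted strings, as they appear in this document. It reads "MRS 3.1.0.0.10/3.1.5.0.3 and later patch versions" as applying only to later patches within the 3.1.0 and 3.1.5 release lines. Version strings with non-numeric suffixes would cause `int()` to raise ValueError.

```python
from typing import Tuple


def parse_version(version: str) -> Tuple[int, ...]:
    """Parse a dotted version such as '3.1.5.0.3' into a tuple of ints."""
    return tuple(int(part) for part in version.split("."))


def alarm_auto_clears(version: str) -> bool:
    """Hypothetical reading of the ALM-12014 Auto Clear rule."""
    v = parse_version(version)
    if v >= (3, 3, 0):
        return True  # MRS 3.3.0 and later
    # Assumed patch baselines for the 3.1.0 and 3.1.5 release lines.
    patch_baselines = {(3, 1, 0): (3, 1, 0, 0, 10), (3, 1, 5): (3, 1, 5, 0, 3)}
    baseline = patch_baselines.get(v[:3])
    return baseline is not None and v >= baseline
```

Python compares tuples element by element, so `(3, 1, 5)` sorts before `(3, 1, 5, 0, 3)`. This means an unpatched 3.1.5 cluster falls under manual clearing.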