Help Center/ MapReduce Service/ Troubleshooting/ Cluster Management/ Replacing a Disk in an MRS Cluster (Applicable to 2.x and Earlier)

Updated on 2024-12-09 GMT+08:00

View PDF

Replacing a Disk in an MRS Cluster (Applicable to 2.x and Earlier)

Issue

A disk is not accessible.

Symptom

A user created an MRS cluster with local disks. A disk of a core node in this cluster is damaged, resulting in file read failures.

Cause Analysis

The disk hardware is faulty.

Procedure

This procedure is applicable to analysis clusters earlier than MRS 3.x. If you need to replace disks for a streaming cluster or hybrid cluster, contact Huawei Cloud technical support.

Log in to MRS Manager.
Choose Hosts, click the name of the target host, click RegionServer in the Roles list, click More, and select Decommission.
Choose Hosts, click the name of the target host, click DataNode in the Roles list, click More, and select Decommission.
Choose Hosts, click the name of the target host, click NodeManager in the Roles list, click More, and select Decommission.

If this host still runs other instances, perform the similar operation to decommission the instances.
Run the vim /etc/fstab command to comment out the mount point of the faulty disk.

Figure 1 Commenting out the mount point of the faulty disk
If the old disk is still accessible, migrate user data on the old disk (for example, /srv/BigData/hadoop/data1/).
Log in to the MRS console.
On the cluster details page, click the Nodes tab.
Click the node whose disk is to be replaced to go to the ECS console. Click Stop to stop the node.
Contact Huawei Cloud technical support to replace the disk in the background.
On the ECS console, click Start to start the node where the disk has been replaced.
Run the fdisk -l command to view the new disk.
Run the cat /etc/fstab command to obtain the drive letter.

Figure 2 Obtaining the drive letter
Use the corresponding drive letter to format the new disk.

Example: mkfs.ext4 /dev/sdh
Run the following command to attach the new disk.

mount New disk Mount point

Example: mount /dev/sdh /srv/BigData/hadoop/data1
Run the following command to grant the omm user permission to the new disk:

chown omm:wheel Mount point

Example: chown -R omm:wheel /srv/BigData/hadoop/data1
Add the UUID of the new disk to the fstab file.
1. Run the blkid command to check the UUID of the new disk.
2. Open the /etc/fstab file and add the following information:
```
UUID=New disk UUID /srv/BigData/hadoop/data1 ext4 defaults,noatime,nodiratime 1 0
```
(Optional) Create a log directory.

mkdir -p /srv/BigData/Bigdata

chown omm:ficommon /srv/BigData/Bigdata

chmod 770 /srv/BigData/Bigdata

Run the following command to check whether symbolic links to Bigdata logs exist. If yes, skip this step.

ll /var/log
Log in to MRS Manager.
Choose Hosts, click the name of the target host, click RegionServer in the Roles list, click More, and select Recommission.
Choose Hosts, click the name of the target host, click DataNode in the Roles list, click More, and select Recommission.
Choose Hosts, click the name of the target host, click NodeManager in the Roles list, click More, and select Recommission.

If this host still runs other instances, perform the similar operation to recommission the instances.
Choose Services > HDFS. In the HDFS Summary area on the Service Status page, check whether Missing Blocks is 0.
- If Missing Blocks is 0, no further action is required.
- If Missing Blocks is not 0, contact Huawei Cloud technical support.