Replacing a Disk in an MRS Cluster (Applicable to 2.x and Earlier)
Issue
A disk is not accessible.
Symptom
A user created an MRS cluster with local disks. A disk of a core node in this cluster is damaged, resulting in file read failures.
Cause Analysis
The disk hardware is faulty.
Procedure
This procedure is applicable to analysis clusters earlier than MRS 3.x. If you need to replace disks for a streaming cluster or hybrid cluster, contact Huawei Cloud technical support.
- Log in to MRS Manager.
- Choose Hosts, click the name of the target host, click RegionServer in the Roles list, click More, and select Decommission.
- Choose Hosts, click the name of the target host, click DataNode in the Roles list, click More, and select Decommission.
- Choose Hosts, click the name of the target host, click NodeManager in the Roles list, click More, and select Decommission.
If this host still runs other instances, perform the similar operation to decommission the instances.
- Run the vim /etc/fstab command to comment out the mount point of the faulty disk.
Figure 1 Commenting out the mount point of the faulty disk
- If the old disk is still accessible, migrate user data on the old disk (for example, /srv/BigData/hadoop/data1/).
- Log in to the MRS console.
- On the cluster details page, click the Nodes tab.
- Click the node whose disk is to be replaced to go to the ECS console. Click Stop to stop the node.
- Contact Huawei Cloud technical support to replace the disk in the background.
- On the ECS console, click Start to start the node where the disk has been replaced.
- Run the fdisk -l command to view the new disk.
- Run the cat /etc/fstab command to obtain the drive letter.
Figure 2 Obtaining the drive letter
- Use the corresponding drive letter to format the new disk.
Example: mkfs.ext4 /dev/sdh
- Run the following command to attach the new disk.
mount New disk Mount point
Example: mount /dev/sdh /srv/BigData/hadoop/data1
- Run the following command to grant the omm user permission to the new disk:
chown omm:wheel Mount point
Example: chown -R omm:wheel /srv/BigData/hadoop/data1
- Add the UUID of the new disk to the fstab file.
- Run the blkid command to check the UUID of the new disk.
- Open the /etc/fstab file and add the following information:
UUID=New disk UUID /srv/BigData/hadoop/data1 ext4 defaults,noatime,nodiratime 1 0
- Run the blkid command to check the UUID of the new disk.
- (Optional) Create a log directory.
mkdir -p /srv/BigData/Bigdata
chown omm:ficommon /srv/BigData/Bigdata
chmod 770 /srv/BigData/Bigdata
Run the following command to check whether symbolic links to Bigdata logs exist. If yes, skip this step.
ll /var/log
- Log in to MRS Manager.
- Choose Hosts, click the name of the target host, click RegionServer in the Roles list, click More, and select Recommission.
- Choose Hosts, click the name of the target host, click DataNode in the Roles list, click More, and select Recommission.
- Choose Hosts, click the name of the target host, click NodeManager in the Roles list, click More, and select Recommission.
If this host still runs other instances, perform the similar operation to recommission the instances.
- Choose Services > HDFS. In the HDFS Summary area on the Service Status page, check whether Missing Blocks is 0.
- If Missing Blocks is 0, no further action is required.
- If Missing Blocks is not 0, contact Huawei Cloud technical support.
Feedback
Was this page helpful?
Provide feedbackThank you very much for your feedback. We will continue working to improve the documentation.See the reply and handling status in My Cloud VOC.
For any further questions, feel free to contact us through the chatbot.
Chatbot