Updated on 2024-12-09 GMT+08:00

Replacing a Disk in an MRS Cluster (Applicable to 2.x and Earlier)

Issue

A disk is not accessible.

Symptom

In an MRS cluster created with local disks, a disk on a core node is damaged, and files stored on it can no longer be read.

Cause Analysis

The disk hardware is faulty.

Procedure

This procedure applies only to analysis clusters of versions earlier than MRS 3.x. To replace disks in a streaming cluster or hybrid cluster, contact Huawei Cloud technical support.

  1. Log in to MRS Manager.
  2. Choose Hosts, click the name of the target host, click RegionServer in the Roles list, click More, and select Decommission.
  3. Choose Hosts, click the name of the target host, click DataNode in the Roles list, click More, and select Decommission.
  4. Choose Hosts, click the name of the target host, click NodeManager in the Roles list, click More, and select Decommission.

    If other instances are still running on this host, decommission them in the same way.

  5. On the target host, run the vim /etc/fstab command and comment out the line that mounts the faulty disk.

    Figure 1 Commenting out the mount point of the faulty disk
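As an alternative to editing the file by hand, the following is a minimal sketch of the same change. It assumes the faulty disk is identified by its mount point in the second field of the fstab line. The helper name comment_out_mount is hypothetical and is not part of MRS. The mount point must be passed exactly as it is written in the file.

```shell
# Sketch only: comment out the fstab entry for one mount point.
# comment_out_mount is a hypothetical helper, not an MRS tool.
comment_out_mount() {
    local fstab="$1" mount_point="$2"
    # Keep a backup copy (<fstab>.bak) before changing anything.
    cp -p "$fstab" "${fstab}.bak" || return 1
    # Prefix "#" to every uncommented line whose second field is the mount point.
    awk -v mp="$mount_point" '
        $1 !~ /^#/ && $2 == mp { print "#" $0; next }
        { print }
    ' "${fstab}.bak" > "$fstab"
}
```

For example, comment_out_mount /etc/fstab /srv/BigData/hadoop/data1 would comment out the entry for the data1 directory and leave the original file in /etc/fstab.bak.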

  6. If the old disk is still accessible, migrate the user data on it (for example, the data in /srv/BigData/hadoop/data1/) to another location.
  7. Log in to the MRS console.
  8. On the cluster details page, click the Nodes tab.
  9. Click the node whose disk is to be replaced to go to the ECS console. Click Stop to stop the node.
  10. Contact Huawei Cloud technical support to replace the disk in the background.
  11. On the ECS console, click Start to start the node where the disk has been replaced.
  12. Run the fdisk -l command to view the new disk.
  13. Run the cat /etc/fstab command to obtain the drive letter, that is, the device name such as /dev/sdh.

    Figure 2 Obtaining the drive letter

  14. Use the corresponding drive letter to format the new disk.

    Example: mkfs.ext4 /dev/sdh

  15. Run the following command to mount the new disk:

    mount <New disk> <Mount point>

    Example: mount /dev/sdh /srv/BigData/hadoop/data1

  16. Run the following command to make the omm user the owner of the mount point of the new disk:

    chown -R omm:wheel <Mount point>

    Example: chown -R omm:wheel /srv/BigData/hadoop/data1
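The format, mount, and ownership commands in steps 14 to 16 can be chained in one helper, sketched below under stated assumptions. The names run, prepare_new_disk, and DRY_RUN are hypothetical and introduced only for illustration. The guard against formatting a mounted device is an added precaution, not something the procedure requires. The device and mount point must be the ones identified in steps 12 and 13.

```shell
# Sketch only: format, mount, and hand over a replacement disk.
# run, prepare_new_disk, and DRY_RUN are hypothetical names.
run() {
    # With DRY_RUN=1, print the command instead of executing it.
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

prepare_new_disk() {
    local dev="$1" mount_point="$2"
    # Refuse to format a device that is listed as mounted in /proc/mounts.
    if awk -v d="$dev" '$1 == d { found = 1 } END { exit !found }' /proc/mounts 2>/dev/null; then
        echo "refusing: $dev is mounted" >&2
        return 1
    fi
    # Same commands as in steps 14 to 16; stop at the first failure.
    run mkfs.ext4 "$dev" &&
    run mount "$dev" "$mount_point" &&
    run chown -R omm:wheel "$mount_point"
}
```

Running DRY_RUN=1 prepare_new_disk /dev/sdh /srv/BigData/hadoop/data1 first prints the three commands without executing them, which makes it possible to confirm the device name before any data is destroyed.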

  17. Add the UUID of the new disk to the fstab file.

    1. Run the blkid command to check the UUID of the new disk.

    2. Open the /etc/fstab file and add the following line, replacing the placeholder with the UUID obtained from blkid:
      UUID=<UUID of the new disk> /srv/BigData/hadoop/data1 ext4 defaults,noatime,nodiratime 1 0
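The fstab update in step 17 can also be scripted. The sketch below uses a hypothetical helper, append_fstab_entry, that appends the line shown above unless an uncommented entry for the same mount point already exists. The UUID can be read with blkid -s UUID -o value followed by the device name. The device name /dev/sdh is taken from the earlier examples and may differ on your node.

```shell
# Sketch only: append the fstab entry for the new disk if it is not there yet.
# append_fstab_entry is a hypothetical helper, not an MRS tool.
append_fstab_entry() {
    local fstab="$1" uuid="$2" mount_point="$3"
    # An empty UUID usually means blkid did not find a file system.
    [ -n "$uuid" ] || { echo "empty UUID" >&2; return 1; }
    # Do nothing if an uncommented entry for this mount point already exists.
    if awk -v mp="$mount_point" '$1 !~ /^#/ && $2 == mp { found = 1 } END { exit !found }' "$fstab"; then
        echo "entry for $mount_point already present" >&2
        return 0
    fi
    printf 'UUID=%s %s ext4 defaults,noatime,nodiratime 1 0\n' "$uuid" "$mount_point" >> "$fstab"
}
```

A typical call would be append_fstab_entry /etc/fstab "$(blkid -s UUID -o value /dev/sdh)" /srv/BigData/hadoop/data1. The commented-out entry of the old disk does not count as an existing entry.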

  18. (Optional) Create a log directory.

    Run the following command to check whether symbolic links to Bigdata logs already exist. If they do, skip this step.

    ll /var/log

    If they do not exist, run the following commands:

    mkdir -p /srv/BigData/Bigdata

    chown omm:ficommon /srv/BigData/Bigdata

    chmod 770 /srv/BigData/Bigdata
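The symbolic-link check in step 18 can be narrowed to the relevant entries instead of reading the full listing. The sketch below assumes that the Bigdata log links in /var/log point to paths under /srv/BigData/Bigdata. That target is inferred from the directory created in this step and is not confirmed by the procedure. The helper name list_links_into is hypothetical.

```shell
# Sketch only: list symbolic links in a directory whose target starts with a prefix.
# list_links_into is a hypothetical helper; the target prefix is an assumption.
list_links_into() {
    local dir="$1" target_prefix="$2" link target
    find "$dir" -maxdepth 1 -type l | while IFS= read -r link; do
        target=$(readlink "$link")
        case "$target" in
            "$target_prefix"*) printf '%s -> %s\n' "$link" "$target" ;;
        esac
    done
}
```

For example, list_links_into /var/log /srv/BigData/Bigdata prints one line per matching link. Empty output suggests that no such links exist.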

  19. Log in to MRS Manager.
  20. Choose Hosts, click the name of the target host, click RegionServer in the Roles list, click More, and select Recommission.
  21. Choose Hosts, click the name of the target host, click DataNode in the Roles list, click More, and select Recommission.
  22. Choose Hosts, click the name of the target host, click NodeManager in the Roles list, click More, and select Recommission.

    If other instances on this host were decommissioned, recommission them in the same way.

  23. Choose Services > HDFS. In the HDFS Summary area on the Service Status page, check whether Missing Blocks is 0.

    • If Missing Blocks is 0, no further action is required.
    • If Missing Blocks is not 0, contact Huawei Cloud technical support.
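The Missing Blocks check can also be made from a node with an HDFS client by reading the report printed by hdfs dfsadmin -report. The sketch below is not part of the documented procedure. It assumes that the report contains a line of the form "Missing blocks: N", that the client environment is loaded, and that the current user is authorized to run the command. The helper name missing_blocks is hypothetical.

```shell
# Sketch only: extract the missing-block count from "hdfs dfsadmin -report" output.
# missing_blocks is a hypothetical helper; it reads the report from standard input.
missing_blocks() {
    # Match only the plain "Missing blocks:" line, not variants such as
    # "Missing blocks (with replication factor 1):".
    awk -F': *' '/^[[:space:]]*Missing blocks:/ { print $2; exit }'
}
```

For example, hdfs dfsadmin -report | missing_blocks prints the count. Any value other than 0 corresponds to the case in which technical support should be contacted.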