Updated on 2024-09-23 GMT+08:00

Enabling MRS Inter-Cluster Replication

DistCp replicates data stored in HDFS from one cluster to another. Before using DistCp, you must enable inter-cluster replication on both clusters involved in the copy.

Administrators can modify parameters on Manager to enable inter-cluster replication. They then create a backup task to copy data to the remote HDFS.

Impact on the System

Yarn must be restarted for inter-cluster replication to take effect, and it cannot be accessed during the restart.

Prerequisites

  • The hadoop.rpc.protection parameter of HDFS in the two clusters for data replication must use the same data transmission mode. The default value is privacy, indicating encrypted transmission. The value authentication indicates that transmission is not encrypted.
  • For clusters with Kerberos authentication enabled (security mode), mutual trust between the clusters must be configured.
  • Inbound rules have been added to the two security groups of each cluster that allow all access requests, over all protocols and ports, from all ECSs in the two security groups of the peer cluster.
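
The first prerequisite can be checked before any change is made on Manager. The following is a minimal sketch, assuming the configuration of each cluster is available as Hadoop-style XML text (for example, exported from a client core-site.xml). The function names rpc_protection and same_transmission_mode are hypothetical, and the fallback to privacy mirrors the default value stated above.

```python
import xml.etree.ElementTree as ET

# Default stated in the prerequisites above; used when the property is absent.
DEFAULT_PROTECTION = "privacy"


def rpc_protection(config_xml: str) -> str:
    """Return the hadoop.rpc.protection value found in a Hadoop XML config."""
    root = ET.fromstring(config_xml)
    for prop in root.iter("property"):
        if prop.findtext("name", "").strip() == "hadoop.rpc.protection":
            return prop.findtext("value", "").strip()
    return DEFAULT_PROTECTION


def same_transmission_mode(local_xml: str, peer_xml: str) -> bool:
    """True when both clusters use the same data transmission mode."""
    return rpc_protection(local_xml) == rpc_protection(peer_xml)
```

If same_transmission_mode returns False, align the parameter on one of the clusters before enabling replication.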

Enabling MRS Inter-Cluster Replication

  1. Log in to the Manager of one of the two clusters.

    • For MRS 2.x and earlier, choose Services > Yarn > Service Configuration and set Type to All.
    • For MRS 3.x and later, choose Cluster > Services > Yarn > Configurations, and click All Configurations.

  2. In the navigation pane on the left, choose Yarn > Distcp and set the following parameters:

    • For MRS 2.x and earlier versions, set dfs.namenode.rpc-address.haclusterX.remotenn1 to the service IP address and RPC port of one NameNode instance in the peer cluster, and set dfs.namenode.rpc-address.haclusterX.remotenn2 to the service IP address and RPC port of the other NameNode instance in the peer cluster. For example, enter 10.1.1.1:25000 and 10.1.1.2:25000.

      dfs.namenode.rpc-address.haclusterX.remotenn1 and dfs.namenode.rpc-address.haclusterX.remotenn2 do not distinguish active and standby NameNode instances. The default NameNode RPC port is 25000 and cannot be modified on Manager.

    • For MRS 3.x and later versions, modify dfs.namenode.rpc-address: set haclusterX.remotenn1 to the service IP address and RPC port of one NameNode instance in the peer cluster, and set haclusterX.remotenn2 to the service IP address and RPC port of the other NameNode instance in the peer cluster. For example, enter 10.1.1.1:8020 and 10.1.1.2:8020.

      haclusterX.remotenn1 and haclusterX.remotenn2 do not distinguish active and standby NameNodes. The default NameNode RPC port is 8020 and cannot be modified on Manager.

      If data of the current cluster needs to be backed up to the HDFS of multiple clusters, you can also configure the corresponding NameNode RPC addresses for haclusterX1, haclusterX2, haclusterX3, and haclusterX4.

  3. Save the configurations and restart YARN.

    • For MRS 2.x and earlier versions, click Save Configuration and select Restart the affected services or instances. Click OK to restart the YARN service.

      After the system displays "Operation succeeded", click Finish. The YARN service is restarted successfully.

    • For MRS 3.x and later versions, click Save. In the displayed dialog box, click OK. Click Dashboard, choose More > Restart Service, and enter the password of the user to restart the YARN service.

  4. Log in to the Manager of the other cluster and repeat steps 1 to 3.
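
The values entered in step 2 can be sanity-checked before they are saved. The following is a minimal sketch, assuming the service addresses are IP literals in IP:port form and using the full parameter names shown for MRS 2.x. The function names parse_rpc_address and remote_namenode_settings are hypothetical, and the code does not call any Manager API; it only validates the two addresses and assembles the name/value pairs.

```python
import ipaddress
from typing import Dict, Tuple


def parse_rpc_address(address: str) -> Tuple[str, int]:
    """Split 'IP:port' into its parts, raising ValueError if malformed."""
    host, sep, port_text = address.rpartition(":")
    if not sep or not host:
        raise ValueError(f"expected IP:port, got {address!r}")
    ipaddress.ip_address(host)  # raises ValueError for a malformed IP
    port = int(port_text)       # raises ValueError for a non-numeric port
    if not 1 <= port <= 65535:
        raise ValueError(f"port out of range in {address!r}")
    return host, port


def remote_namenode_settings(
    nn1: str, nn2: str, nameservice: str = "haclusterX"
) -> Dict[str, str]:
    """Build the remotenn1/remotenn2 settings for one peer cluster."""
    if parse_rpc_address(nn1) == parse_rpc_address(nn2):
        raise ValueError("remotenn1 and remotenn2 must be different NameNode instances")
    prefix = f"dfs.namenode.rpc-address.{nameservice}"
    return {f"{prefix}.remotenn1": nn1, f"{prefix}.remotenn2": nn2}
```

For example, remote_namenode_settings("10.1.1.1:8020", "10.1.1.2:8020") yields the two entries to enter on an MRS 3.x cluster; passing nameservice="haclusterX1" would produce the entries for an additional peer cluster.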