Updated on 2023-12-14 GMT+08:00

Enabling Cross-Cluster Replication

Scenario

DistCp is used to replicate the data stored in HDFS from a cluster to another cluster. DistCp depends on the cross-cluster replication function, which is disabled by default. You need to enable it for both clusters.

This section describes how to modify parameters on FusionInsight Manager to enable the cross-cluster replication function. After this function is enabled, you can create a backup task for backing up data to the remote HDFS (RemoteHDFS).

Impact on the System

Yarn needs to be restarted to enable the cross-cluster replication function and cannot be accessed during restart.

Prerequisites

  • The hadoop.rpc.protection parameter of HDFS in the two clusters for data replication must use the same data transmission mode. The default value is privacy, indicating encrypted transmission. The value authentication indicates that transmission is not encrypted.
  • For clusters in security mode, you need to configure mutual trust between clusters.

Procedure

  1. Log in to FusionInsight Manager of one of the two clusters.
  2. Choose Cluster > Services > Yarn and click Configurations then All Configurations.
  3. In the navigation pane, choose Yarn > Distcp.
  4. Modify dfs.namenode.rpc-address, set haclusterX.remotenn1 to the service IP address and RPC port of one NameNode instance of the peer cluster, and set haclusterX.remotenn2 to the service IP address and RPC port number of the other NameNode instance of the peer cluster.

    haclusterX.remotenn1 and haclusterX.remotenn2 do not distinguish active and standby NameNodes. The default NameNode RPC port is 8020 and cannot be modified on Manager.

    Examples of modified parameter values: 10.1.1.1:8020 and 10.1.1.2:8020.

    • If data of the current cluster needs to be backed up to the HDFS of multiple clusters, you can configure the corresponding NameNode RPC addresses to haclusterX1, haclusterX2, haclusterX3, and haclusterX4.

  5. Click Save. In the confirmation dialog box, click OK.
  6. Restart the Yarn service.
  7. Log in to FusionInsight Manager of the other cluster and repeat 2 to 6.