Copying Data
Data copy scenarios are classified as follows, depending on the regions of the source and destination clusters and the network connectivity between them:
Same Region
If the source cluster and destination cluster are in the same region, configure the network and set up a network transmission channel by following the instructions in Establishing a Data Transmission Channel. Then use the DistCp tool to run the following command to copy the HDFS, HBase, and Hive data files, as well as the Hive metadata backup files, from the source cluster to the destination cluster:
$HADOOP_HOME/bin/hadoop distcp -p <src> <dist>
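The parameters in the preceding command are described as follows:
- $HADOOP_HOME: installation directory of the Hadoop client in the destination cluster
- <src>: HDFS directory of the source cluster
- <dist>: HDFS directory of the destination cluster
- -p: preserves file status during the copy. When specified without any suffix, it preserves the replication number, block size, user, group, permission, checksum type, and timestamps.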
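As an illustration of how the placeholders might be filled in, the following is a minimal sketch, not a definitive procedure. The NameNode host names, port 8020, the directory paths, and the /opt/hadoop fallback are hypothetical values chosen for the example, and the RUN switch is a convention of this sketch rather than a DistCp feature. By default the script only prints the command so that it can be reviewed first.

```shell
#!/bin/sh
# Assumption: fall back to a hypothetical client directory if HADOOP_HOME is unset.
HADOOP_HOME="${HADOOP_HOME:-/opt/hadoop}"

# Hypothetical source and destination HDFS directories; replace with real values.
SRC="hdfs://source-nn.example.com:8020/user/hive/warehouse"
DEST="hdfs://dest-nn.example.com:8020/user/hive/warehouse"

# Assemble the DistCp command line for review.
CMD="$HADOOP_HOME/bin/hadoop distcp -p $SRC $DEST"
echo "$CMD"

# Run the copy only when RUN=1 is set (a switch local to this sketch).
if [ "${RUN:-0}" = "1" ]; then
  "$HADOOP_HOME/bin/hadoop" distcp -p "$SRC" "$DEST"
fi
```

Run the script once without RUN set to check the printed command, then run it again with RUN=1 on a node where the Hadoop client of the destination cluster is installed.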
Migrating Data from an Offline Cluster to the Cloud
You can use the following method to migrate data from an offline cluster to the cloud:
- Direct Connect
Create a Direct Connect connection between the source cluster and the destination cluster, enable network connectivity between the egress gateway of the offline cluster and the online VPC, and then use DistCp to copy the data as described in Same Region.