Migrating HDFS Data to OBS
Scenarios
In the Huawei Cloud big data solution with decoupled storage and compute, OBS serves as a unified data lake to provide storage. If your data is still stored in local HDFS, migrate HDFS data to OBS first.
You can use either of the following methods to migrate data: DistCp or CDM.
Migration Using DistCp
Hadoop DistCp (short for distributed copy) is a tool for large-scale copying within or between Hadoop clusters. It uses MapReduce to implement file distribution, error handling and recovery, and reporting. DistCp takes a list of files and directories as the input to its map tasks, and each task copies a subset of the files in the source list.
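Because the source list is partitioned across map tasks, the standard -m option caps how many map tasks (and thus simultaneous copies) run at once. A minimal sketch, with hypothetical inter-cluster paths:
# Paths are placeholders; -m 20 limits the copy to 20 map tasks
hadoop distcp -m 20 hdfs://cluster1/data/sample hdfs://cluster2/data/sample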
Configuration
Configure OBS access by referring to the hadoop-huaweicloud installation and configuration instructions in Connecting Hadoop to OBS.
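With hadoop-huaweicloud installed, the OBS connection parameters normally go into core-site.xml. As a sketch, they can usually also be supplied per command via Hadoop's generic -D options; the property names (fs.obs.access.key, fs.obs.secret.key, fs.obs.endpoint) come from hadoop-huaweicloud, and the values shown are placeholders:
# AK/SK and endpoint are placeholders; replace with your own credentials and region endpoint
hadoop distcp \
-Dfs.obs.access.key=YOUR_AK \
-Dfs.obs.secret.key=YOUR_SK \
-Dfs.obs.endpoint=obs.your-region.myhuaweicloud.com \
hdfs:///data/sample obs://obs-bigdata-posix-bucket/data/sample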
Example
- View the files and directories to migrate in an HDFS directory (/data/sample in this example):
hadoop fs -ls hdfs:///data/sample
- Migrate all files and directories inside /data/sample to the data/sample directory in OBS bucket obs-bigdata-posix-bucket:
hadoop distcp hdfs:///data/sample obs://obs-bigdata-posix-bucket/data/sample
- Verify the copied files:
hadoop fs -ls obs://obs-bigdata-posix-bucket/data/sample
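If the source directory keeps receiving writes during migration, the copy can be re-run incrementally. A minimal sketch using DistCp's standard -update option, which copies only files that are missing from or differ at the destination:
# Re-run after the initial copy to pick up new or changed files
hadoop distcp -update hdfs:///data/sample obs://obs-bigdata-posix-bucket/data/sample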
Migration Using CDM
Cloud Data Migration (CDM) enables batch data migration between homogeneous and heterogeneous data sources, allowing data to flow flexibly. Supported data sources include relational databases, data warehouses, NoSQL databases, and big data cloud services.
For details, see What Is CDM?