Updated on 2023-12-04 GMT+08:00

Exporting Metadata

To keep data properties and permissions in the destination cluster consistent with those in the source cluster, export the metadata of the source cluster so that it can be restored after data migration. The metadata to be exported includes the owner, group, and permission information of HDFS files, and the Hive table descriptions.

Exporting HDFS Metadata

The HDFS metadata to be exported includes file and folder permissions and owner/group information. Run the following command on the HDFS client to export the metadata:

$HADOOP_HOME/bin/hdfs dfs -ls -R <migrating_path> > /tmp/hdfs_meta.txt

The parameters in the preceding command are described as follows:

  • $HADOOP_HOME: installation directory of the Hadoop client in the source cluster
  • <migrating_path>: HDFS data directory to be migrated
  • /tmp/hdfs_meta.txt: local path for storing the exported metadata
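The exported listing can be reduced to the fields needed for restoration. The following is a minimal sketch, assuming the usual eight-column layout of hdfs dfs -ls -R output (permissions, replication, owner, group, size, date, time, path). The helper name parse_hdfs_meta is hypothetical and is not part of Hadoop.

```shell
#!/bin/sh
# Hypothetical helper: reduce an exported listing to
# "permissions<TAB>owner<TAB>group<TAB>path".
# Assumes the eight-column `hdfs dfs -ls -R` layout:
#   permissions replication owner group size date time path
# Limitation: runs of consecutive spaces inside a path collapse to one space.
parse_hdfs_meta() {
    awk 'NF >= 8 {
        path = $8
        # Re-join path components that were split on spaces.
        for (i = 9; i <= NF; i++) path = path " " $i
        printf "%s\t%s\t%s\t%s\n", $1, $3, $4, path
    }' "$@"
}
```

For example, parse_hdfs_meta /tmp/hdfs_meta.txt prints one tab-separated line per file or folder. Lines with fewer than eight columns are skipped.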

If the source cluster can communicate with the destination cluster and you run the hadoop distcp command as a super administrator to copy data, you can add the -p parameter so that DistCp restores the metadata of each file in the destination cluster while copying the data. In this case, skip this step.
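As an illustration of the -p parameter, the sketch below builds a DistCp command that explicitly preserves the user (u), group (g), and permission (p) attributes. The helper name distcp_preserve_cmd is hypothetical, and the source and destination URIs are supplied by the caller. The helper only prints the command so that it can be reviewed before being run.

```shell
#!/bin/sh
# Hypothetical helper: print (not run) a DistCp command that preserves
# ownership and permissions while copying.
#   $1: source URI, $2: destination URI
distcp_preserve_cmd() {
    src=$1
    dst=$2
    # -pugp: preserve user (u), group (g), and permission (p).
    printf 'hadoop distcp -pugp %s %s\n' "$src" "$dst"
}
```

Pass the source and destination HDFS URIs of the directory to be migrated, check the printed command, and then run it as a super administrator.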

Exporting Hive Metadata

Hive table data is stored in HDFS. The table data and the metadata of the table data are migrated together as HDFS directories. The metadata of Hive tables can be stored in different types of relational databases (such as MySQL, PostgreSQL, and Oracle), depending on the cluster configuration. The Hive metadata exported in this document is the Hive table description stored in the relational database.

Mainstream big data distributions support Sqoop installation. For on-premises big data clusters built on the community version, you can download and install the community version of Sqoop. Use Sqoop to remove the tight dependency between the exported metadata and the relational database: export the Hive metadata to HDFS, and then migrate it together with the table data for restoration. The procedure is as follows:

  1. Download the Sqoop tool and install it in the source cluster. For details, see http://sqoop.apache.org/.
  2. Download the JDBC driver of the relational database to the ${Sqoop_Home}/lib directory.
  3. Run the following command for each Hive metadata table to be exported. The exported data is stored in the /user/<user_name>/<table_name> directory on HDFS.

    $Sqoop_Home/bin/sqoop import --connect jdbc:<driver_type>://<ip>:<port>/<database> --table <table_name> --username <user> --password <passwd> -m 1

    The parameters in the preceding command are described as follows:

    • $Sqoop_Home: Sqoop installation directory
    • <driver_type>: Database type
    • <ip>: IP address of the database in the source cluster
    • <port>: Port number of the database in the source cluster
    • <database>: Name of the database that stores the Hive metadata
    • <table_name>: Name of the table to be exported
    • <user>: Username
    • <passwd>: User password

    Commands that carry authentication passwords pose security risks. Before running such a command, disable command history recording to prevent password leakage.
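The export step can be repeated for several metadata tables in a loop. The following is a minimal sketch under stated assumptions: it uses Sqoop's --password-file option instead of --password so that the password does not appear on the command line, the helper name export_metastore_tables and the SQOOP_CMD variable are hypothetical, and the table names passed in are chosen by the caller. By default the sketch is a dry run that only prints the commands.

```shell
#!/bin/sh
# Assumption: set SQOOP_CMD to the real binary, for example
# "$Sqoop_Home/bin/sqoop" (a path without spaces), to run the imports.
# The default only prints each command (dry run).
SQOOP_CMD=${SQOOP_CMD:-echo sqoop}

# Hypothetical helper: import each listed metadata table into HDFS.
#   $1: JDBC URL, $2: database username, $3: password file path,
#   remaining arguments: table names
export_metastore_tables() {
    jdbc_url=$1
    db_user=$2
    pw_file=$3
    shift 3
    for table in "$@"; do
        # SQOOP_CMD is intentionally unquoted so "echo sqoop" splits in two words.
        $SQOOP_CMD import --connect "$jdbc_url" --table "$table" \
            --username "$db_user" --password-file "$pw_file" -m 1
    done
}
```

Call the helper with the JDBC URL, the username, the password file, and the tables to export. Review the printed commands before setting SQOOP_CMD to the real Sqoop binary.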