Importing and Exporting Table/Partition Data in Hive

Scenario

In big data application scenarios, Hive data tables often need to be migrated to another cluster. You can run the Hive export and import commands to migrate table data: run the export command in the source cluster to export a Hive table to the HDFS of the target cluster, and then run the import command in the target cluster to import the exported data into the corresponding Hive table.

This section applies to MRS 3.2.0 or later.

The Hive table import and export function does not support encrypted tables, HBase external tables, Hudi tables, views, or materialized views.
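
In outline, the migration consists of one export statement run in the source cluster and one import statement run in the destination cluster. The following is a minimal sketch using the table name and path from the procedure below; haclusterX is the cluster name configured in Prerequisites:

  -- In the source cluster: export the table to the destination cluster's HDFS.
  export table export_test to 'hdfs://haclusterX/tmp/export';
  -- In the destination cluster: import the exported data into a table of the same name.
  import from '/tmp/export';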

Prerequisites

  • If Hive tables or partition data is imported or exported across clusters and Kerberos authentication is enabled for both the source and destination clusters, configure cross-cluster mutual trust.
  • To run the import or export command on tables or partitions created by other users, grant the corresponding table permissions to the user who performs the operation.
  • Enable the inter-cluster copy function in the source cluster and destination cluster.
  • Configure the HDFS service address parameter for the source cluster to access the destination cluster.

    Log in to FusionInsight Manager of the source cluster, click Cluster, choose Services > Hive, and click Configuration. On the displayed page, search for hdfs.site.customized.configs and add the following custom parameters:
    • dfs.namenode.rpc-address.haclusterX: set it to Service IP address of the active NameNode instance in the destination cluster:RPC port.
    • dfs.namenode.rpc-address.haclusterX1: set it to Service IP address of the standby NameNode instance in the destination cluster:RPC port.
    The default RPC port of NameNode is 25000. After saving the configuration, perform a rolling restart of the Hive service.
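
    For reference, the added parameters could look as follows. The IP addresses are hypothetical examples and must be replaced with the NameNode service IP addresses of your destination cluster:

    dfs.namenode.rpc-address.haclusterX = 192.168.1.10:25000
    dfs.namenode.rpc-address.haclusterX1 = 192.168.1.11:25000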

Procedure

  1. Log in to the node where the client is installed in the source cluster as the Hive client installation user.
  2. Run the following command to switch to the client installation directory, for example, /opt/client:

    cd /opt/client

  3. Run the following command to configure environment variables:

    source bigdata_env

  4. If Kerberos authentication is enabled for the cluster, run the following command to authenticate the user. Otherwise, skip this step.

    kinit Hive service user
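
    For example, if the Hive service user is hiveuser (a hypothetical user name), the command would be:

    kinit hiveuser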

  5. Run the following command to log in to the Hive client in the source cluster:

    beeline

  6. Run the following command to create the export_test table:

    create table export_test(id int);

  7. Run the following command to insert data into the export_test table:

    insert into export_test values(123);
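
    (Optional) You can run the following query to confirm that the data has been written; this check is not required for the migration:

    select * from export_test;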

  8. Repeat 1 to 4 in the destination cluster and run the following command to create an HDFS path for storing the exported export_test table:

    hdfs dfs -mkdir /tmp/export

  9. Run the following command to log in to the Hive client:

    beeline

  10. Import and export the export_test table.

    The Hive import and export commands can be used to migrate table data in the following modes. Select a proper data migration mode as required.

    • Mode 1: Table export and import
      1. Run the following command in the source cluster to export the metadata and service data of the export_test table to the directory created in 8:

        export table export_test to 'hdfs://haclusterX/tmp/export';

      2. Run the following command in the destination cluster to import the table data exported in 10.a to the export_test table:

        import from '/tmp/export';

    • Mode 2: Renaming a table during the import
      1. Run the following command in the source cluster to export the metadata and service data of the export_test table to the directory created in 8:

        export table export_test to 'hdfs://haclusterX/tmp/export';

      2. Run the following command in the destination cluster to import the table data exported in 10.a to the import_test table:

        import table import_test from '/tmp/export';

    • Mode 3: Partition export and import
      1. Run the following commands in the source cluster to export the pt1 and pt2 partitions of the export_test table to the directory created in 8:

        export table export_test partition (pt1="in", pt2="ka") to 'hdfs://haclusterX/tmp/export';

      2. Run the following command in the destination cluster to import the table data exported in 10.a to the export_test table:

        import from '/tmp/export';

    • Mode 4: Exporting table data to a partition
      1. Run the following command in the source cluster to export the metadata and service data of the export_test table to the directory created in 8:

        export table export_test to 'hdfs://haclusterX/tmp/export';

      2. Run the following command in the destination cluster to import the table data exported in 10.a to the pt1 and pt2 partitions of the import_test table:

        import table import_test partition (pt1="us", pt2="tn") from '/tmp/export';

    • Mode 5: Specifying the table location during the import
      1. Run the following command in the source cluster to export the metadata and service data of the export_test table to the directory created in 8:

        export table export_test to 'hdfs://haclusterX/tmp/export';

      2. Run the following command in the destination cluster to import the table data exported in 10.a to the import_test table and specify its location as /tmp/export:

        import table import_test from '/tmp/export' location '/tmp/export';

    • Mode 6: Exporting data to an external table
      1. Run the following command in the source cluster to export the metadata and service data of the export_test table to the directory created in 8:

        export table export_test to 'hdfs://haclusterX/tmp/export';

      2. Run the following command in the destination cluster to import the table data exported in 10.a to external table import_test:

        import external table import_test from '/tmp/export';

    Before exporting table or partition data, ensure that the HDFS path for storage has been created and is empty. Otherwise, the export fails.
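
    For example, before the export you can check that the target directory exists and is empty (a simple check using the /tmp/export path from this section; an empty directory returns no entries):

    hdfs dfs -ls /tmp/export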

    When partitions are exported or imported, the exported or imported table must be a partitioned table, and data of multiple partition values of the same partition field cannot be exported.
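
    For the partition-based modes (modes 3 and 4), export_test therefore needs to be created as a partitioned table with the pt1 and pt2 partition fields and contain data in the corresponding partitions. The following is a sketch of what the DDL and test inserts could look like; the string type of the partition columns and the inserted values are illustrative assumptions:

    create table export_test(id int) partitioned by (pt1 string, pt2 string);
    insert into export_test partition (pt1="in", pt2="ka") values(123);
    insert into export_test partition (pt1="us", pt2="tn") values(456);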

    During the data import:
    • If the import from '/tmp/export'; statement is used, no table name is specified, and the imported data is saved to a table with the same name as the source table. Pay attention to the following points:
      • If the destination cluster does not contain a table with the same name as the source table, the table is created during the import.
      • Otherwise, the HDFS directory of the table must be empty, or the import fails.
    • If the import external table import_test from '/tmp/export'; statement is used, the exported data is imported into the specified table. Pay attention to the following points:
      • If the destination cluster does not contain a table with the same name as the specified table, the table is created during the import.
      • Otherwise, the HDFS directory of the table must be empty, or the import fails.

    haclusterX is the name defined in the custom parameter dfs.namenode.rpc-address.haclusterX that was added in Prerequisites.
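
    After the import completes, you can check the result in the destination cluster, for example by querying the imported table (using the import_test table name from the examples above):

    select * from import_test;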