
Migrating Index Data

Scenario

The indexes used in MRS 1.7 or later are incompatible with secondary indexes used by HBase in earlier MRS versions. Therefore, you need to perform the following operations to migrate index data from an earlier version (MRS 1.5 or earlier) to MRS 1.7 or later.

Prerequisites

  1. During data migration, the cluster of the old version must be MRS 1.5 or earlier, and the cluster of the new version must be MRS 1.7 or later.
  2. Before data migration, index data must already exist in the cluster of the earlier version.
  3. A cross-cluster mutual trust relationship must be configured and the inter-cluster replication function must be enabled for a security cluster. For a common cluster, only the inter-cluster replication function needs to be enabled. For details, see Configuring Cross-Cluster Mutual Trust Relationships and Enabling Cross-Cluster Copy.

Procedure

Migrate user data from the old cluster to the new cluster. Data is migrated table by table: manually synchronize each table between the old and new clusters using the export, distcp, and import operations.

For example, the old cluster has a user table t1 with an index named idx_t1 and the corresponding index table t1_idx. Perform the following operations to migrate the data.
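For reference, a source table like this could be created and populated from the HBase shell roughly as follows. The column family info and the name column are illustrative and match the index definition (idx_t1 on info:name) used later in this section; creating the index itself depends on the index API of the old cluster version and is not shown.

    create 't1', 'info'                        # user table with an 'info' column family
    put 't1', 'row1', 'info:name', 'Alice'     # sample row; the 'name' column is what idx_t1 indexes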

  1. Export table data from the old cluster.
    hbase org.apache.hadoop.hbase.mapreduce.Export -Dhbase.mapreduce.include.deleted.rows=true <tableName> <path/for/data>
    • <tableName>: Indicates a table name, for example, t1.
    • <path/for/data>: Indicates the path for storing source data, for example, /user/hbase/t1.

    Example: hbase org.apache.hadoop.hbase.mapreduce.Export -Dhbase.mapreduce.include.deleted.rows=true t1 /user/hbase/t1

  2. Copy the exported data to the new cluster as follows:
    hadoop distcp <path/for/data> hdfs://ActiveNameNodeIP:9820/<path/for/newData>
    • <path/for/data>: Indicates the path for storing source data in the old cluster, for example, /user/hbase/t1.
    • <path/for/newData>: Indicates the path for storing source data in the new cluster, for example, /user/hbase/t1.

    ActiveNameNodeIP indicates the IP address of the active NameNode in the new cluster. One way to check which NameNode is active is sketched at the end of this step.

    Example: hadoop distcp /user/hbase/t1 hdfs://192.168.40.2:9820/user/hbase/t1

    • Alternatively, you can manually copy the exported data to the HDFS of the new cluster, for example, to /user/hbase/t1.
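    If you are unsure which NameNode is currently active in the new cluster, the standard HDFS HA commands can be used to check. The NameNode ID nn1 below is only an example and depends on your cluster configuration.

    hdfs getconf -namenodes
    hdfs haadmin -getServiceState nn1

    The first command lists the NameNode hosts of the cluster; the second prints active or standby for the specified NameNode ID.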
  3. Generate HFiles in the new cluster by running the Import tool as the HBase table user of the new cluster.
    hbase org.apache.hadoop.hbase.mapreduce.Import -Dimport.bulk.output=<path/for/hfiles> <tableName> <path/for/newData>
    • <path/for/hfiles>: Indicates the path of the HFiles generated in the new cluster, for example, /user/hbase/output_t1.
    • <tableName>: Indicates a table name, for example, t1.
    • <path/for/newData>: Indicates the path for storing source data in the new cluster, for example, /user/hbase/t1.

    Example:

    hbase org.apache.hadoop.hbase.mapreduce.Import -Dimport.bulk.output=/user/hbase/output_t1 t1 /user/hbase/t1

  4. Import the generated HFiles to the table in the new cluster.

    The command is as follows:

     hbase org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles <path/for/hfiles> <tableName>
    • <path/for/hfiles>: Indicates the path of the HFiles generated in the new cluster, for example, /user/hbase/output_t1.
    • <tableName>: Indicates a table name, for example, t1.

    Example:

    hbase org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles /user/hbase/output_t1 t1
    1. The preceding steps show the process of migrating user data. To migrate the index data of the old cluster, perform only the first three steps, replacing the table name with the index table name (for example, t1_idx) and adjusting the paths accordingly; see the sketch below.
    2. Skip step 4 when migrating index data. The index data is loaded into the new cluster in step 5.
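    For example, applying the first three steps to the index table t1_idx could look as follows. The paths /user/hbase/t1_idx and /user/hbase/output_t1_idx are illustrative; the HFile output path must match the index data path used with LoadIncrementalHFiles in 5.2.

    hbase org.apache.hadoop.hbase.mapreduce.Export -Dhbase.mapreduce.include.deleted.rows=true t1_idx /user/hbase/t1_idx
    hadoop distcp /user/hbase/t1_idx hdfs://192.168.40.2:9820/user/hbase/t1_idx
    hbase org.apache.hadoop.hbase.mapreduce.Import -Dimport.bulk.output=/user/hbase/output_t1_idx t1_idx /user/hbase/t1_idx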
  5. Import index data to a table in the new cluster.
    1. Add an index to the user table of the new cluster that is identical to the index of the user table of the earlier version (the user table cannot contain a column family named d).

      The command is as follows:

      hbase org.apache.hadoop.hbase.hindex.mapreduce.TableIndexer -Dtablename.to.index=<tableName> -Dindexspecs.to.add=<indexspecs> 
      • -Dtablename.to.index=<tableName>: Indicates a table name, for example, -Dtablename.to.index=t1.
      • -Dindexspecs.to.add=<indexspecs>: Indicates the mapping between an index name and a column, for example, -Dindexspecs.to.add='idx_t1=>info:[name->String]'.

      Example:

      hbase org.apache.hadoop.hbase.hindex.mapreduce.TableIndexer -Dtablename.to.index=t1 -Dindexspecs.to.add='idx_t1=>info:[name->String]'

      If a column family named d exists in the user table, you must use the TableIndexer tool to build index data.

    2. Run the LoadIncrementalHFiles tool to load the index data of the old cluster to a table in the new cluster.

      The command is as follows:

      hbase org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles </path/for/hfiles> <tableName>
      • </path/for/hfiles>: Indicates the path of index data on HDFS. The path is the index generation path specified in -Dimport.bulk.output, for example, /user/hbase/output_t1_idx.
      • <tableName>: Indicates a table name of the new cluster, for example, t1.

      Example:

      hbase org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles /user/hbase/output_t1_idx t1
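After the data and index data are loaded, you may want to confirm that the migration succeeded. A minimal spot check from the HBase shell could look like the following; the expected row count depends on your data.

    count 't1'                      # row count of the migrated user table
    scan 't1', {LIMIT => 5}         # spot-check a few migrated rows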