
Migrating Index Data

Updated at: Nov 06, 2019 GMT+08:00

Scenarios

The indexes used in MRS 1.7 or later are incompatible with secondary indexes used by HBase in earlier MRS versions. Therefore, you need to perform the following operations to migrate index data from an earlier version (MRS 1.5 or earlier) to MRS 1.7 or later.

Prerequisites

1. The old cluster must run MRS 1.5 or earlier, and the new cluster must run MRS 1.7 or later.

2. The old cluster must already contain index data to migrate.

3. For a secure cluster, a cross-cluster mutual trust relationship must be configured and the inter-cluster replication function must be enabled. For a non-secure cluster, only the inter-cluster replication function needs to be enabled. For details, see Configuring Cross-Cluster Mutual Trust Relationships and Enabling the Cross-Cluster Copy Function.
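The version requirement in prerequisite 1 can be checked before any data is moved. The following is a minimal sketch, not part of the MRS tooling: the function names version_to_int and can_migrate are hypothetical, and the version strings are assumed to be supplied by the operator in MAJOR.MINOR form (for example, 1.5 or 1.7.1) without leading zeros.

```shell
#!/bin/sh
# Hypothetical helper: convert "MAJOR.MINOR[.PATCH]" into a comparable integer.
# Assumption: components are plain decimal numbers without leading zeros.
version_to_int() {
  v=$1
  case $v in
    *.*) ;;            # already has a minor component
    *) v="$v.0" ;;     # treat "2" as "2.0"
  esac
  major=${v%%.*}
  rest=${v#*.}
  minor=${rest%%.*}
  echo $(( major * 1000 + minor ))
}

# Hypothetical check: succeeds only if the old cluster is MRS 1.5 or earlier
# and the new cluster is MRS 1.7 or later, as stated in prerequisite 1.
can_migrate() {
  old=$(version_to_int "$1")
  new=$(version_to_int "$2")
  [ "$old" -le 1005 ] && [ "$new" -ge 1007 ]
}

if can_migrate 1.5 1.7; then
  echo "Version prerequisite met"
else
  echo "Version prerequisite not met"
fi
```

The check covers only the version prerequisite; the mutual trust and replication settings still have to be verified separately.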

Procedure

Migrate user data from the old cluster to the new cluster. Data is synchronized manually, one table at a time, by using the Export, distcp, and Import tools.

For example, the old cluster has a user table t1 (index name: idx_t1) and its corresponding index table t1_idx. Perform the following operations to migrate the data.

  1. Export table data from the old cluster.
    hbase org.apache.hadoop.hbase.mapreduce.Export -Dhbase.mapreduce.include.deleted.rows=true <tableName> <path/for/data>
    • <tableName>: Indicates a table name, for example, t1.
    • <path/for/data>: Indicates the path for storing source data, for example, /user/hbase/t1.

    Example: hbase org.apache.hadoop.hbase.mapreduce.Export -Dhbase.mapreduce.include.deleted.rows=true t1 /user/hbase/t1

  2. Copy the exported data to the new cluster as follows:
    hadoop distcp <path/for/data> hdfs://ActiveNameNodeIP:9820/<path/for/newData>
    • <path/for/data>: Indicates the path for storing source data in the old cluster, for example, /user/hbase/t1.
    • <path/for/newData>: Indicates the path for storing source data in the new cluster, for example, /user/hbase/t1.

    ActiveNameNodeIP indicates the IP address of the active NameNode in the new cluster.

    Example: hadoop distcp /user/hbase/t1 hdfs://192.168.40.2:9820/user/hbase/t1

    Alternatively, manually copy the exported data to HDFS of the new cluster, for example, to /user/hbase/t1.

  3. As the HBase table user of the new cluster, generate HFiles in the new cluster.
    hbase org.apache.hadoop.hbase.mapreduce.Import -Dimport.bulk.output=<path/for/hfiles> <tableName> <path/for/newData>
    • <path/for/hfiles>: Indicates the path of the HFiles generated in the new cluster, for example, /user/hbase/output_t1.
    • <tableName>: Indicates a table name, for example, t1.
    • <path/for/newData>: Indicates the path for storing source data in the new cluster, for example, /user/hbase/t1.

    Example:

    hbase org.apache.hadoop.hbase.mapreduce.Import -Dimport.bulk.output=/user/hbase/output_t1 t1 /user/hbase/t1

  4. Import the generated HFiles to the table in the new cluster.

    The command is as follows:

     hbase org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles <path/for/hfiles> <tableName> 
    • <path/for/hfiles>: Indicates the path of the HFiles generated in the new cluster, for example, /user/hbase/output_t1.
    • <tableName>: Indicates a table name, for example, t1.

    Example:

    hbase org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles /user/hbase/output_t1 t1

    1. The preceding steps show how to migrate user data. To migrate the index data of the old cluster, perform only steps 1 to 3 and replace the table name with the index table name (for example, t1_idx).

    2. Skip step 4 when migrating index data.

  5. Import index data to a table in the new cluster.
    1. Add to the user table of the new cluster the same index that the user table had in the old version. (The user table must not contain a column family named 'd'.)

      The command is as follows:

      hbase org.apache.hadoop.hbase.hindex.mapreduce.TableIndexer -Dtablename.to.index=<tableName> -Dindexspecs.to.add=<indexspecs> 
      • -Dtablename.to.index=<tableName>: Indicates a table name, for example, -Dtablename.to.index=t1.
      • -Dindexspecs.to.add=<indexspecs>: Indicates the mapping between an index name and a column, for example, -Dindexspecs.to.add='idx_t1=>info:[name->String]'.

      Example:

      hbase org.apache.hadoop.hbase.hindex.mapreduce.TableIndexer -Dtablename.to.index=t1 -Dindexspecs.to.add='idx_t1=>info:[name->String]'

      If a column family named d exists in the user table, you must use the TableIndexer tool to build index data.

    2. Run the LoadIncrementalHFiles tool to load the index data of the old cluster to a table in the new cluster.

      The command is as follows:

      hbase org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles <path/for/hfiles> <tableName>
      • <path/for/hfiles>: Indicates the path of the index data on HDFS. This is the output path specified in -Dimport.bulk.output when the index HFiles were generated, for example, /user/hbase/output_t1_idx.
      • <tableName>: Indicates the name of the user table in the new cluster, for example, t1.

      Example:

      hbase org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles /user/hbase/output_t1_idx t1
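
The whole procedure can be summarized as a dry-run script that prints the commands in order without running them. This is only a sketch under stated assumptions: the function names (export_cmd, distcp_cmd, import_cmd, load_cmd, indexer_cmd, plan_migration) are hypothetical, and the paths follow the /user/hbase/<table> and /user/hbase/output_<table> pattern used in the examples above. Review the printed commands, then run each one on the appropriate cluster.

```shell
#!/bin/sh
# Dry-run sketch: prints the migration commands instead of executing them.
# All function names are hypothetical; the command lines mirror the steps above.

export_cmd() {   # $1 = table name, $2 = export path on the old cluster
  printf '%s\n' "hbase org.apache.hadoop.hbase.mapreduce.Export -Dhbase.mapreduce.include.deleted.rows=true $1 $2"
}

distcp_cmd() {   # $1 = source path, $2 = active NameNode IP (new cluster), $3 = target path
  printf '%s\n' "hadoop distcp $1 hdfs://$2:9820$3"
}

import_cmd() {   # $1 = HFile output path, $2 = table name, $3 = copied data path
  printf '%s\n' "hbase org.apache.hadoop.hbase.mapreduce.Import -Dimport.bulk.output=$1 $2 $3"
}

load_cmd() {     # $1 = HFile path, $2 = target table name
  printf '%s\n' "hbase org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles $1 $2"
}

indexer_cmd() {  # $1 = user table name, $2 = index specification
  printf '%s\n' "hbase org.apache.hadoop.hbase.hindex.mapreduce.TableIndexer -Dtablename.to.index=$1 -Dindexspecs.to.add='$2'"
}

# $1 = user table, $2 = index table, $3 = index specification, $4 = active NameNode IP
plan_migration() {
  # Steps 1 to 4: migrate the user table.
  export_cmd "$1" "/user/hbase/$1"
  distcp_cmd "/user/hbase/$1" "$4" "/user/hbase/$1"
  import_cmd "/user/hbase/output_$1" "$1" "/user/hbase/$1"
  load_cmd "/user/hbase/output_$1" "$1"
  # Steps 1 to 3 again for the index table (step 4 is skipped for index data).
  export_cmd "$2" "/user/hbase/$2"
  distcp_cmd "/user/hbase/$2" "$4" "/user/hbase/$2"
  import_cmd "/user/hbase/output_$2" "$2" "/user/hbase/$2"
  # Step 5: add the index to the user table, then load the old index data into it.
  indexer_cmd "$1" "$3"
  load_cmd "/user/hbase/output_$2" "$1"
}

plan_migration t1 t1_idx 'idx_t1=>info:[name->String]' 192.168.40.2
```

Printing rather than executing keeps the sketch safe to run anywhere, because the Export commands belong on the old cluster while the Import, TableIndexer, and LoadIncrementalHFiles commands belong on the new one.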
