迁移HBase索引数据

操作场景

MRS 1.7及其以后版本中使用的索引与以前MRS版本中HBase使用的二级索引都不兼容。因此，为了将索引数据从以前的版本（MRS 1.5及其以前版本）迁移到MRS 1.7及其以后版本，需要遵循以下步骤。

前提条件

迁移数据时旧版本集群应为MRS1.5及其以前的版本，新版本集群应为MRS1.7及其以后的版本。
迁移数据前用户应该有旧的索引数据。
安全集群需配置跨集群互信和启用集群间拷贝功能，普通集群仅需启用集群间拷贝功能。详情请参见配置跨集群互信。

操作步骤

把旧集群中的用户数据迁移至新集群中。迁移数据需单表手动同步新旧集群的数据，通过Export、distcp、Import来完成。

例如，当前旧集群有用户表（t1，索引名为idx_t1）及其对应的索引表（t1_idx）。迁移数据的操作步骤如下：

从旧集群导出表中数据。
```
hbase org.apache.hadoop.hbase.mapreduce.Export -Dhbase.mapreduce.include.deleted.rows=true <tableName> <path/for/data>
```
- <tableName>：指的是表名。例如，t1。
- <path/for/data>：指的是保存源数据的路径，例如“/user/hbase/t1”。
例如，hbase org.apache.hadoop.hbase.mapreduce.Export -Dhbase.mapreduce.include.deleted.rows=true t1 /user/hbase/t1
把导出的数据按如下步骤复制到新集群中。
```
hadoop distcp <path/for/data> hdfs://ActiveNameNodeIP:9820/<path/for/newData>
```
- <path/for/data>：指的是旧集群保存源数据的路径。例如，/user/hbase/t1。
- <path/for/newData>：指的是新集群保存源数据的路径。例如，/user/hbase/t1。
其中，ActiveNameNodeIP是新集群中主NameNode节点的IP地址。

例如，hadoop distcp /user/hbase/t1 hdfs://192.168.40.2:9820/user/hbase/t1
- 可手动把导出的数据复制到新集群HDFS中，如上路径：“/user/hbase/t1”。
使用新集群HBase表用户，在新集群中生成HFiles。
```
hbase org.apache.hadoop.hbase.mapreduce.Import -Dimport.bulk.output=<path/for/hfiles> <tableName><path/for/newData>
```
- <path/for/hfiles>：指的是新集群生成HFiles的路径。例如，/user/hbase/output_t1。
- <tableName>：指的是表名。例如，t1。
- <path/for/newData>：指的是新集群保存源数据的路径。例如，/user/hbase/t1。
例如，

hbase org.apache.hadoop.hbase.mapreduce.Import -Dimport.bulk.output=/user/hbase/output_t1 t1 /user/hbase/t1
把生成的HFiles导入新集群相应表中。
命令如下：
```
 hbase org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles <path/for/hfiles> <tableName>
```
- <path/for/hfiles>：指的是新集群生成HFiles的路径。例如，/user/hbase/output_t1。
- <tableName>：指的是表名。例如，t1。
例如，
hbase org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles /user/hbase/output_t1 t1
1. 以上为迁移用户数据的过程，旧集群的索引数据迁移只需按照前三步操作，并更改相应表名为索引表名（如，t1_idx）。
2. 迁移索引数据时无需执行4。
向新集群表中导入索引数据。
1. 在新集群的用户表中添加与之前版本用户表相同的索引（名称为'd'的列族不应该已经存在于用户表中）。
  命令如下所示：
```
hbase org.apache.hadoop.hbase.hindex.mapreduce.TableIndexer -Dtablename.to.index=<tableName> -Dindexspecs.to.add=<indexspecs> 
```
  - -Dtablename.to.index=<tableName>：指的是表名。例如，-Dtablename.to.index=t1。
  - -Dindexspecs.to.add=<indexspecs>：指的是索引名与列的映射，例如-Dindexspecs.to.add='idx_t1=>info:[name->String]'。
  例如，
  
  hbase org.apache.hadoop.hbase.hindex.mapreduce.TableIndexer -Dtablename.to.index=t1 -Dindexspecs.to.add='idx_t1=>info:[name->String]'
  
  如果用户表中已经存在名称为'd'的ColumnFamily，则用户必须使用TableIndexer工具构建索引数据。
2. 运行LoadIncrementalHFiles工具加载索引数据，将旧集群索引数据加载到新集群表中。
  命令如下：
```
hbase org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles </path/for/hfiles> <tableName>
```
  - </path/for/hfiles>：指的是索引数据在HDFS上的路径（其为-Dimport.bulk.output中指定的索引生成路径）。例如，/user/hbase/output_t1_idx。
  - <tableName>：指的是新集群中表名，例如，t1。
  例如，
  
  hbase org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles /user/hbase/output_t1_idx t1