Updated on 2024-04-29 GMT+08:00

Using CopyTable to Import Data

CopyTable is a utility provided by HBase. It can copy part or all of a table, either within the same cluster or to another cluster. The target table must already exist. The CloudTable client tool includes CopyTable, so after deploying the client tool you can use CopyTable to import data into a CloudTable cluster.


  1. Prepare a Linux ECS as the client host and deploy the CloudTable client tool on it.

    For details, see Using HBase Shell to Access a Cluster.

    When deploying the client tool, set the ZK link to the access address (Intranet) of the CloudTable cluster where the source table resides.

  2. (Optional) If you want to copy a table to another cluster, obtain the access address (Intranet) of the target CloudTable cluster.

    Log in to the CloudTable console and choose Cluster Management. In the cluster list, locate the required cluster and obtain the address in the Access Address (Intranet) column.

  3. Before using CopyTable to copy table data, ensure that the target table exists in the target CloudTable cluster. If the target table does not exist, create it first.

    For details about how to create a table, see Creating an HBase Cluster.

  4. On the client host, open a terminal, go to the hbase directory under the installation directory of the client tool, and run the CopyTable command to import data into the CloudTable cluster.

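    The following is an example command. It copies the cells of TestTable whose timestamps fall within the specified one-hour range to the table of the same name in the target cluster. Column family myOldCf is copied to myNewCf, and cf2 and cf3 keep their names.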

    cd ${Installation directory of the client tool}/hbase
    ./bin/hbase org.apache.hadoop.hbase.mapreduce.CopyTable --starttime=1265875194289 --endtime=1265878794289 --peer.adr=${ZK link of the target CloudTable cluster}:/hbase --families=myOldCf:myNewCf,cf2,cf3 TestTable
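The starttime and endtime values in this example are Unix timestamps in milliseconds, and they are exactly 3,600,000 ms (one hour) apart. The following bash sketch uses a hypothetical helper named copytable_time_range to compute both options for a window of a given number of hours. It only does arithmetic and does not contact any cluster.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of HBase): print the --starttime/--endtime options
# for a window of HOURS hours starting at START_MS (Unix time in milliseconds).
copytable_time_range() {
  local start_ms=$1 hours=$2
  # 1 hour = 3,600,000 ms; bash arithmetic is 64-bit, so epoch milliseconds fit.
  local end_ms=$(( start_ms + hours * 3600 * 1000 ))
  # "--" stops printf from treating the format string as an option.
  printf -- '--starttime=%s --endtime=%s\n' "$start_ms" "$end_ms"
}

# Reproduces the one-hour window used in the example command.
copytable_time_range 1265875194289 1

# For the last hour ending now (date +%s is supported by GNU and BSD date):
# copytable_time_range "$(( ($(date +%s) - 3600) * 1000 ))" 1
```

The first call prints the same two options as the example command, which confirms that the example window is one hour long.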

Overview of the CopyTable Command

The CopyTable command format is as follows:

CopyTable [general options] [--starttime=X] [--endtime=Y] [--new.name=NEW] [--peer.adr=ADR] <tablename>

For details about the CopyTable command, see CopyTable.
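In this format, the --key=value options come first and <tablename> is the final argument. A small wrapper can therefore catch a misplaced table name before a copy job is submitted. The bash function below is a sketch: copytable and the DRY_RUN variable are hypothetical names, not part of HBase. It prints the assembled command by default, runs it only when DRY_RUN=0, and assumes it is called from the hbase directory of the client tool.

```shell
#!/usr/bin/env bash
# Hypothetical dry-run wrapper around the CopyTable command (not part of HBase).
# Arguments follow the documented order: --option=value pairs first, <tablename> last.
# Assumption: the current directory is the hbase directory of the client tool.
copytable() {
  local last=${!#}   # last positional argument
  # Reject calls without arguments, or whose last argument is an option.
  if [ "$#" -lt 1 ] || [ "${last#--}" != "$last" ]; then
    echo "error: <tablename> must be the last argument" >&2
    return 2
  fi
  local -a cmd=(./bin/hbase org.apache.hadoop.hbase.mapreduce.CopyTable "$@")
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "${cmd[*]}"   # default: only print the command for review
  else
    "${cmd[@]}"        # DRY_RUN=0: actually run CopyTable
  fi
}

copytable --peer.adr=zk1,zk2,zk3:2181:/hbase --families=cf1 TestTable
```

After reviewing the printed line, you could prefix the same call with DRY_RUN=0 to run the copy. The host names zk1, zk2, and zk3 are placeholders.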

Common options are described as follows:

  • startrow: the start row
  • stoprow: the stop row
  • starttime: beginning of the time range, as a Unix timestamp in milliseconds. If endtime is not specified, all data from starttime onward is copied.
  • endtime: end of the time range, as a Unix timestamp in milliseconds. This option is ignored if starttime is not specified.
  • versions: number of cell versions to be copied
  • new.name: name of the target table. If it is not specified, the target table has the same name as the source table.
  • peer.adr: address of the target cluster, in the format hbase.zookeeper.quorum:hbase.zookeeper.client.port:zookeeper.znode.parent. If it is not specified, data is copied within the same cluster. For CloudTable clusters, set it to ${ZK link of the target CloudTable cluster}:/hbase.
  • families: comma-separated list of column families to copy.

    To copy data from column family sourceCfName to column family destCfName, specify sourceCfName:destCfName.

    To keep a column family's name unchanged, specify only cfName.

  • all.cells: also copies delete markers and deleted cells.
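Because --families can rename column families, the target table must contain the destination names, not the source names. The following bash sketch uses a hypothetical helper named target_create_stmt to derive those names from a --families value and print a matching HBase shell create statement. It assumes default column family settings, which may not match your source table (for example, the number of versions or compression).

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of HBase): print an HBase shell "create" statement
# for the target table, using the destination column families of a --families value.
#   "src:dest" entries map to dest; plain "cf" entries keep their name.
target_create_stmt() {
  local table=$1 families=$2
  local -a entries
  local entry stmt="create '$table'"
  # Split the --families value on commas.
  IFS=',' read -r -a entries <<< "$families"
  for entry in "${entries[@]}"; do
    # ${entry##*:} keeps the text after the last ':' (the destination name).
    stmt+=", '${entry##*:}'"
  done
  printf '%s\n' "$stmt"
}

# Target column families for the example command: myNewCf, cf2, cf3.
target_create_stmt TestTable 'myOldCf:myNewCf,cf2,cf3'
```

The printed statement is meant to be run in an HBase shell connected to the target cluster, for example by piping it into ./bin/hbase shell on a client configured for that cluster.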

The required argument is described as follows:

tablename: name of the source table to copy