Using CopyTable to Import Data

CopyTable is an HBase utility that copies all or part of a table, either within the same cluster or to another cluster. The target table must already exist. CopyTable is included in the CloudTable client tool, so after deploying the client tool, you can use CopyTable to import data into a CloudTable cluster.

Procedure

  1. Prepare a Linux ECS as the client host and deploy the CloudTable client tool on it.

    For details, see Using HBase Shell to Access a Cluster.

    When deploying the client tool, set its ZK link to the ZooKeeper (ZK) link of the CloudTable cluster that contains the source table.

  2. (Optional) If you want to copy a table to another cluster, obtain the ZK link of the target CloudTable cluster.

    Log in to the CloudTable management console and choose Cluster Mode. In the cluster list, locate the required cluster and obtain its ZK link in the ZK Link column.

  3. Before using CopyTable to copy table data, ensure that the target table exists in the target CloudTable cluster. If the target table does not exist, create it first.

    For details about how to create a table, see Getting Started with HBase.

  4. On the client host, open the CLI, go to the hbase directory under the installation directory of the client tool, and run the CopyTable command to import data into the CloudTable cluster.

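    The following is an example of the command. In this example, cells in TestTable whose timestamps fall within a one-hour window (1265875194289 to 1265878794289, in epoch milliseconds) are copied to the target cluster. Data in column family myOldCf is copied into column family myNewCf, and column families cf2 and cf3 are copied under their original names.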

    cd ${Installation directory of the client tool}/hbase
    ./bin/hbase org.apache.hadoop.hbase.mapreduce.CopyTable --starttime=1265875194289 --endtime=1265878794289 --peer.adr=${ZK link of the target CloudTable cluster}:/hbase --families=myOldCf:myNewCf,cf2,cf3 TestTable
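The --starttime and --endtime values are epoch timestamps in milliseconds. The following POSIX shell sketch checks that the window in the example above is exactly one hour (1265878794289 - 1265875194289 = 3,600,000 ms) and shows one way to build a one-hour window that ends at the current time. It assumes a `date` command that supports `+%s` (GNU and BSD `date` do); the variable names are illustrative only.

```shell
#!/bin/sh
# Sketch: verify the example window and build a one-hour window ending now.
# CopyTable expects --starttime/--endtime as epoch milliseconds.

example_start=1265875194289
example_end=1265878794289
# 1265878794289 - 1265875194289 = 3600000 ms, i.e. 60 minutes
echo "Example window: $(( (example_end - example_start) / 60000 )) minutes"

one_hour_ms=$(( 60 * 60 * 1000 ))
# date +%s prints seconds since the epoch; multiply by 1000 for milliseconds.
endtime=$(( $(date +%s) * 1000 ))
starttime=$(( endtime - one_hour_ms ))
echo "--starttime=${starttime} --endtime=${endtime}"
```

The second echo prints option values that can be pasted into the CopyTable command in place of the fixed timestamps.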

Overview of the CopyTable Command

The CopyTable command format is as follows:

CopyTable [general options] [--starttime=X] [--endtime=Y] [--new.name=NEW] [--peer.adr=ADR] <tablename>
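As a minimal sketch of the format above, the following POSIX shell function assembles a CopyTable command with the options first and <tablename> last. The function name copy_table and the variables HBASE_DIR (assumed to be the hbase directory of the client tool) and DRY_RUN (print the command instead of running it) are hypothetical and not part of the client tool. The example copies TestTable to a table named TestTableCopy in the same cluster; as with any CopyTable target, TestTableCopy must already exist.

```shell
#!/bin/sh
# Hypothetical wrapper: builds "hbase ... CopyTable [options] <tablename>".
copy_table() {
  cp_table="$1"; shift             # first argument: source table name
  set -- "$@" "$cp_table"          # <tablename> goes after all options
  if [ "${DRY_RUN:-0}" = "1" ]; then
    # Print only (arguments are not quoted; for review, not for copy-paste).
    printf '%s\n' "${HBASE_DIR:-.}/bin/hbase org.apache.hadoop.hbase.mapreduce.CopyTable $*"
  else
    "${HBASE_DIR:-.}/bin/hbase" org.apache.hadoop.hbase.mapreduce.CopyTable "$@"
  fi
}

# Same-cluster copy to a differently named table (printed, not executed).
DRY_RUN=1 copy_table TestTable --new.name=TestTableCopy
```

Unset DRY_RUN to run the command for real from the hbase directory of the client tool.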

For details about the CopyTable command, see CopyTable.

The common options are described as follows:

  • startrow: the first row to copy (inclusive)
  • stoprow: the row at which copying stops (exclusive)
  • starttime: beginning of the time range, as a Unix timestamp in milliseconds. If endtime is not specified, data with timestamps from starttime onward is copied.
  • endtime: end of the time range. This option is ignored if starttime is not specified.
  • versions: number of cell versions to copy
  • new.name: name of the target table, if it differs from the source table name
  • peer.adr: address of the target cluster, in the format hbase.zookeeper.quorum:hbase.zookeeper.client.port:zookeeper.znode.parent. For CloudTable clusters, set it to ${ZK link of the target CloudTable cluster}:/hbase.
  • families: comma-separated list of column families to copy.

    To copy data from column family sourceCfName into column family destCfName, specify sourceCfName:destCfName.

    To keep a column family's name unchanged, specify only cfName.

  • all.cells: also copies delete markers and deleted cells.
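The following POSIX shell sketch shows how the peer.adr and families values are put together. The helper names build_peer_adr, cloudtable_peer_adr, and build_families are hypothetical, and the ZooKeeper host names are placeholders. build_families also rejects entries that are empty, have an empty name on either side of the colon, contain more than one colon, or contain a comma; this is a local sanity check, not documented CopyTable behavior.

```shell
#!/bin/sh
# Hypothetical helpers for building CopyTable option values.

# Generic HBase form: <quorum>:<client port>:<znode parent>
build_peer_adr() {
  printf '%s:%s:%s\n' "$1" "$2" "$3"
}

# CloudTable form: ${ZK link of the target CloudTable cluster}:/hbase
cloudtable_peer_adr() {
  printf '%s:/hbase\n' "$1"
}

# Join column family specs (cfName or sourceCfName:destCfName) with commas.
build_families() {
  for spec in "$@"; do
    case "$spec" in
      ""|:*|*:|*:*:*|*,*)
        echo "invalid column family spec: '$spec'" >&2
        return 1 ;;
    esac
  done
  (IFS=,; printf '%s\n' "$*")   # "$*" joins arguments with the first IFS char
}

echo "--peer.adr=$(build_peer_adr zk1.example.com,zk2.example.com,zk3.example.com 2181 /hbase)"
echo "--families=$(build_families myOldCf:myNewCf cf2 cf3)"
```

The second echo produces the same --families value used in the example command in the procedure.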

The parameter is described as follows:

tablename: name of the source table to copy