Updated on 2024-04-29 GMT+08:00

Creating a Table

Function Description

In HBase, you create a table by calling the createTable method of the org.apache.hadoop.hbase.client.Admin object. You must specify a table name and a column family name. You can create a table in either of the following ways; the second is recommended:

  • Quickly create a table. A newly created table contains only one region, which is automatically split into multiple regions as the data volume grows.
  • Create a table with pre-assigned regions. You define multiple regions when creating the table, so writes are spread across regions from the start. This speeds up the initial phase of a massive data write.

A table name and a column family name can contain only letters, digits, and underscores (_). Other special characters are not allowed.
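The naming rule above can be checked before calling createTable. The following is a minimal sketch that uses only the Java standard library; the class NameRuleCheck and the method isAllowedName are hypothetical names introduced for illustration and are not part of the HBase API. It assumes "letters" means ASCII letters.

```java
import java.util.regex.Pattern;

// Hypothetical helper for illustration; not part of the HBase API.
final class NameRuleCheck {
    // One or more ASCII letters, digits, or underscores (assumption: ASCII only).
    private static final Pattern ALLOWED = Pattern.compile("[A-Za-z0-9_]+");

    // Returns true if the name satisfies the rule stated in this section.
    static boolean isAllowedName(String name) {
        return name != null && ALLOWED.matcher(name).matches();
    }

    public static void main(String[] args) {
        System.out.println(isAllowedName("user_info_2024")); // prints true
        System.out.println(isAllowedName("user info"));      // prints false (contains a space)
    }
}
```

Rejecting an invalid name up front lets the application report a clear error instead of relying on the exception raised during table creation.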

Sample Code

public void testCreateTable() {
  LOG.info("Entering testCreateTable.");

  // Specify the table descriptor.
  HTableDescriptor htd = new HTableDescriptor(tableName); // (1)

  // Set the column family name to info.
  HColumnDescriptor hcd = new HColumnDescriptor("info"); // (2)

  // Set the data block encoding. HBase provides DIFF, FAST_DIFF, PREFIX,
  // and PREFIX_TREE.
  hcd.setDataBlockEncoding(DataBlockEncoding.FAST_DIFF);  // Note [1]

  // Set the compression algorithm. By default, HBase provides two:
  // GZ and SNAPPY.
  // GZ has a higher compression ratio but lower compression and
  // decompression performance. It suits cold data.
  // SNAPPY has a lower compression ratio but higher compression and
  // decompression performance. It suits hot data.
  // SNAPPY is recommended.
  hcd.setCompressionType(Compression.Algorithm.SNAPPY); 
  htd.addFamily(hcd); // (3)

  Admin admin = null;
  try {
    // Instantiate an Admin object.
    admin = conn.getAdmin(); // (4)
    if (!admin.tableExists(tableName)) {
      LOG.info("Creating table...");
      admin.createTable(htd); // Note [2] (5)
      LOG.info(admin.getClusterStatus());
      LOG.info(admin.listNamespaceDescriptors());
      LOG.info("Table created successfully.");
    } else {
      LOG.warn("table already exists");
    }
  } catch (IOException e) {
    LOG.error("Create table failed.", e);
  } finally {
    if (admin != null) {
      try {
        // Close the Admin object.
        admin.close();
      } catch (IOException e) {
        LOG.error("Failed to close admin ", e);
      }
    }
  }
  LOG.info("Exiting testCreateTable.");
}

Explanation

(1) Create a table descriptor.

(2) Create a column family descriptor.

(3) Add the column family descriptor to the table descriptor.

(4) Obtain the Admin object. You use the Admin object to create a table and a column family, check whether the table exists, modify the table structure and column family structure, and delete the table.

(5) Call the createTable method of the Admin object to create the table.

Precautions

  • Note [1] Use the following code to set the data block encoding and the compression mode for a column family:
    // Set an encoding algorithm. HBase provides four encoding algorithms: DIFF, FAST_DIFF, PREFIX, and PREFIX_TREE.
    hcd.setDataBlockEncoding(DataBlockEncoding.FAST_DIFF);

    // Set a file compression mode. By default, HBase provides two compression algorithms: GZ and SNAPPY.
    // GZ has a high compression ratio but low compression and decompression performance. It is suitable for cold data.
    // SNAPPY has a low compression ratio but high compression and decompression performance. It is suitable for hot data.
    // You are advised to enable SNAPPY compression by default.
    hcd.setCompressionType(Compression.Algorithm.SNAPPY);
  • Note [2] You can create a table with pre-assigned regions either by specifying the start and end RowKeys or by passing an array of split RowKeys. The following snippet uses a RowKey array:
    // Create a table with pre-split regions.
    byte[][] splits = new byte[4][];
    splits[0] = Bytes.toBytes("A");
    splits[1] = Bytes.toBytes("H");
    splits[2] = Bytes.toBytes("O");
    splits[3] = Bytes.toBytes("U");
    admin.createTable(htd, splits);
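
To make the effect of the split keys in Note [2] concrete: four split keys produce five regions, and each RowKey falls into the region whose range contains it (start key inclusive, end key exclusive). The following is a rough sketch in plain Java that mimics this mapping; HBase performs the routing internally, and the class SplitKeyDemo and its methods are hypothetical names used only for illustration.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration; HBase routes rows to regions internally.
final class SplitKeyDemo {
    // Compare two keys lexicographically, treating each byte as unsigned,
    // which matches how HBase orders RowKeys.
    static int compareKeys(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int diff = (a[i] & 0xff) - (b[i] & 0xff);
            if (diff != 0) {
                return diff;
            }
        }
        return a.length - b.length;
    }

    // Return the zero-based index of the region that would hold rowKey.
    // Region 0 has no start key; the last region has no end key.
    static int regionIndex(byte[][] splits, byte[] rowKey) {
        int index = 0;
        while (index < splits.length && compareKeys(rowKey, splits[index]) >= 0) {
            index++;
        }
        return index;
    }

    static byte[] bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Same split keys as in Note [2].
        byte[][] splits = { bytes("A"), bytes("H"), bytes("O"), bytes("U") };
        for (String key : new String[] {"0001", "Apple", "Hotel", "Zebra"}) {
            System.out.println(key + " -> region " + regionIndex(splits, bytes(key)));
        }
    }
}
```

With these split keys, "0001" maps to region 0, "Apple" to region 1, "Hotel" to region 2, and "Zebra" to region 4. Choosing split keys that match the actual RowKey distribution keeps the initial write load balanced across regions.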