Updated on 2024-12-02 GMT+08:00

How Do I Set Pre-partition Keys When Creating a Table on a GeminiDB HBase Instance?

This section describes how to set pre-partition keys when creating a table on a GeminiDB HBase instance.

What Is Pre-partitioning

On a GeminiDB HBase instance, data is stored in different data partitions. Row key prefixes uniquely identify entities within each partition. Distributing data evenly across partitions also distributes the workload evenly, so that cluster resources are used efficiently.

For example, if two pre-partition keys, [1111, 2222], are set during table creation, data is divided into three ranges. The partition that a row belongs to is determined by comparing its row key with the partition keys in lexicographic order. Rows with rowkey < '1111' are stored in the first partition, rows with '1111' <= rowkey < '2222' are stored in the second partition, and rows with rowkey >= '2222' are stored in the third partition. Ideally, the three partitions belong to different nodes. If partition keys are not set properly, the partitions may all belong to the same cluster node.
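The range rule above can be sketched in plain Java. This is only an illustration of the lexicographic comparison, not GeminiDB code: the class name PartitionLookup and the method partitionIndex are hypothetical, and row keys are assumed to be compared byte by byte as unsigned values.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class PartitionLookup {
    // Returns the zero-based index of the range that contains rowKey.
    // Assumes splitKeys is sorted in ascending byte order.
    static int partitionIndex(String rowKey, String[] splitKeys) {
        byte[] key = rowKey.getBytes(StandardCharsets.UTF_8);
        int index = 0;
        for (String split : splitKeys) {
            byte[] boundary = split.getBytes(StandardCharsets.UTF_8);
            // A row key equal to a split key belongs to the range that starts at that key.
            if (Arrays.compareUnsigned(key, boundary) >= 0) {
                index++;
            } else {
                break;
            }
        }
        return index;
    }

    public static void main(String[] args) {
        String[] splitKeys = {"1111", "2222"};
        for (String rowKey : new String[] {"0999", "1111", "15", "2222", "9"}) {
            System.out.println(rowKey + " -> partition " + (partitionIndex(rowKey, splitKeys) + 1));
        }
    }
}
```

Note that the row key "15" falls into the second partition, because the comparison is lexicographic rather than numeric.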

Designing Pre-partition Keys

Ideally, partition keys are chosen so that application data is evenly distributed across partitions by row key prefix. On a GeminiDB HBase instance, the ideal data volume in a partition is about 100 GB. There is no upper limit on the data volume in a single partition, but a partition that holds more than 100 GB of data is split automatically. To disable automatic splitting, choose Service Tickets > Create Service Ticket in the upper right corner of the console.
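As a rough sizing aid, the 100 GB guideline can be turned into a partition count. The sketch below is only arithmetic under stated assumptions: the class name PartitionEstimate and the expected table size are hypothetical, and it relies on the fact that N partitions are separated by N - 1 partition keys (two keys give three ranges).

```java
class PartitionEstimate {
    // Target data volume per partition in GB, taken from the guideline above.
    static final long TARGET_GB = 100;

    // Number of partitions needed so that each one holds at most about TARGET_GB.
    static long partitionCount(long totalGb) {
        return Math.max(1, (totalGb + TARGET_GB - 1) / TARGET_GB);
    }

    // N partitions are separated by N - 1 partition keys.
    static long splitKeyCount(long totalGb) {
        return partitionCount(totalGb) - 1;
    }

    public static void main(String[] args) {
        long totalGb = 1000; // hypothetical expected table size in GB
        System.out.println(partitionCount(totalGb) + " partitions, "
                + splitKeyCount(totalGb) + " partition keys");
    }
}
```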

  • Example 1:

If the first digits of row keys are evenly distributed from 0 to 9, 10 partition keys can be set: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]. Row keys starting with each digit are stored in their own partition.

  • Example 2:

If the first two hexadecimal digits of row keys are evenly distributed from 00 to FF and the estimated data volume in each partition is about 100 GB, 256 partition keys are recommended: [00, 01, 02, ..., FD, FE, FF].
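Writing 256 keys by hand is error-prone, so a list like the one in Example 2 can be generated instead. The following is a minimal sketch, assuming uppercase two-character hexadecimal prefixes; the class name SplitKeyGenerator and its methods are hypothetical helpers.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

class SplitKeyGenerator {
    // Builds the keys "00", "01", ..., "FE", "FF" in ascending order.
    static List<String> hexPrefixKeys() {
        List<String> keys = new ArrayList<>();
        for (int i = 0; i < 256; i++) {
            keys.add(String.format("%02X", i));
        }
        return keys;
    }

    // Converts the keys to the byte[][] form that table creation code takes as split keys.
    static byte[][] toSplitKeys(List<String> keys) {
        byte[][] splitKeys = new byte[keys.size()][];
        for (int i = 0; i < keys.size(); i++) {
            splitKeys[i] = keys.get(i).getBytes(StandardCharsets.UTF_8);
        }
        return splitKeys;
    }

    public static void main(String[] args) {
        List<String> keys = hexPrefixKeys();
        System.out.println(keys.size() + " keys, from " + keys.get(0) + " to " + keys.get(keys.size() - 1));
    }
}
```

Because the digits 0 to 9 sort before the letters A to F in byte order, the generated keys are already in lexicographic order. If the row keys use lowercase hexadecimal, the format string would need to be "%02x" instead.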

Specifying Pre-partitions During Table Creation

On a GeminiDB HBase instance, HBase Shell or Java code can be used to specify pre-partitions during table creation.

  • Specify pre-partitions using HBase Shell when creating a table.
create 'tb', 'cf1', 'cf2', 'cf3', SPLITS => ['1111', '2222', '3333']

You can replace '1111', '2222', and '3333' with other custom partition key values. Use commas (,) to separate multiple values.

  • Specify pre-partitions using Java code when creating a table.
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.ColumnFamilyDescriptor;
import org.apache.hadoop.hbase.client.ColumnFamilyDescriptorBuilder;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.TableDescriptor;
import org.apache.hadoop.hbase.client.TableDescriptorBuilder;
import org.apache.hadoop.hbase.util.Bytes;

public class ExampleCreateTable
{
    public static void main(String[] args) throws Throwable
    {
        // Create the HBase configuration.
        // Replace the two placeholder values with the address and port of your instance.
        Configuration hbaseConfig = HBaseConfiguration.create();
        hbaseConfig.set("hbase.zookeeper.quorum", "<instance-address>");
        hbaseConfig.set("hbase.zookeeper.property.clientPort", "<instance-port>");

        TableName tableName = TableName.valueOf("default", "tb1");

        // The connection and the admin handle are closed automatically.
        try (Connection connection = ConnectionFactory.createConnection(hbaseConfig);
             Admin admin = connection.getAdmin())
        {
            // Provide your pre-partition (split) keys here, in ascending order.
            byte[][] splitKeys = new byte[][]{ Bytes.toBytes("rowkey1"), Bytes.toBytes("rowkey2") };

            // 5 column families
            List<ColumnFamilyDescriptor> cfs = new ArrayList<>();
            cfs.add(ColumnFamilyDescriptorBuilder.newBuilder(Bytes.toBytes("cf1")).build());
            cfs.add(ColumnFamilyDescriptorBuilder.newBuilder(Bytes.toBytes("cf2")).build());
            cfs.add(ColumnFamilyDescriptorBuilder.newBuilder(Bytes.toBytes("cf3")).build());
            cfs.add(ColumnFamilyDescriptorBuilder.newBuilder(Bytes.toBytes("cf4")).build());
            cfs.add(ColumnFamilyDescriptorBuilder.newBuilder(Bytes.toBytes("cf5")).build());
            TableDescriptor tableDescriptor = TableDescriptorBuilder.newBuilder(tableName).setColumnFamilies(cfs).build();

            // Create the table with the pre-partition keys.
            admin.createTable(tableDescriptor, splitKeys);
        }
    }
}