Updated on 2022-09-14 GMT+08:00

Examples

Set Configuration Parameters

To set up a connection between an HBase client and an HBase server, you need to set the following parameters.

  • hbase.zookeeper.quorum: IP addresses of the ZooKeeper nodes. If there are multiple ZooKeeper nodes, separate their IP addresses with commas (,).
  • hbase.zookeeper.property.clientPort: Client port of ZooKeeper (2181 by default).
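As a minimal sketch, an application that keeps its ZooKeeper node list elsewhere could assemble the two values as follows. ZkSettings is a hypothetical helper, not part of the HBase API; it only produces the strings that are later passed to Configuration#set.

```java
import java.util.Arrays;
import java.util.stream.Collectors;

// Hypothetical helper, not part of the HBase API: it only builds the two
// string values that are passed to Configuration#set.
final class ZkSettings {
    private ZkSettings() {
    }

    // Joins ZooKeeper node addresses with commas, trimming surrounding whitespace.
    static String quorum(String... hosts) {
        if (hosts.length == 0) {
            throw new IllegalArgumentException("At least one ZooKeeper node is required");
        }
        return Arrays.stream(hosts)
                .map(String::trim)
                .collect(Collectors.joining(","));
    }

    // Checks that the port is a valid TCP port and returns it as a string.
    static String clientPort(int port) {
        if (port < 1 || port > 65535) {
            throw new IllegalArgumentException("Invalid port: " + port);
        }
        return Integer.toString(port);
    }
}
```

For example, hbaseConfig.set("hbase.zookeeper.quorum", ZkSettings.quorum("10.5.100.1", "10.5.100.2", "10.5.100.3")) produces the same value as setting the comma-separated string directly.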

The Configuration instance created by HBaseConfiguration.create() automatically loads configuration items from the following files:

  • core-default.xml
  • core-site.xml
  • hbase-default.xml
  • hbase-site.xml

Save these configuration files in a source folder so that they are on the project classpath. In Eclipse, you can create a source folder by creating a folder (for example, resource) in the project, right-clicking it, and choosing Build Path > Use as Source Folder.
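The files are read in the order listed, and a value defined in a later file, such as hbase-site.xml, overrides the default from an earlier one, such as hbase-default.xml. The following simplified model illustrates that layering with plain maps. LayeredConfig is hypothetical and is not Hadoop's Configuration class; it ignores features such as final properties and variable expansion.

```java
import java.util.HashMap;
import java.util.Map;

// Simplified model of layered configuration loading, NOT Hadoop's Configuration
// class: resources are applied in load order, and a later value for the same key
// replaces an earlier one. <final> properties and ${variable} expansion are ignored.
final class LayeredConfig {
    private final Map<String, String> values = new HashMap<>();

    // Applies one resource, e.g. the parsed properties of hbase-site.xml.
    LayeredConfig addResource(Map<String, String> resource) {
        values.putAll(resource);
        return this;
    }

    // Returns the effective value, or null if no resource defines the key.
    String get(String key) {
        return values.get(key);
    }
}
```

Applying a "default" map first and a "site" map second leaves the site value in effect for every key that both define, while keys defined only in the defaults keep their default values.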

The following parameters can also be configured on the client. In most cases, you are advised to keep their default values.

  • hbase.client.pause: Base waiting time before a retry after an exception or other failure (unit: ms). The actual waiting time is calculated from this value and the number of retries already performed.
  • hbase.client.retries.number: Maximum number of retries after an exception or other failure.
  • hbase.client.retries.longer.multiplier: Multiplier related to the number of retries.
  • hbase.client.rpc.maxattempts: Maximum number of attempts to send an RPC request when the server cannot be reached.
  • hbase.regionserver.lease.period: Scanner lease period, which determines the scanner timeout (unit: ms).
  • hbase.client.write.buffer: Size of the client write buffer (unit: bytes). This parameter takes effect only when AutoFlush is disabled. In that case, the HBase client caches written data and sends a write operation to the HBase cluster only when the cached data volume reaches this value. If AutoFlush is enabled, this parameter has no effect.
  • hbase.client.scanner.caching: Number of rows fetched by each next request during a scan.
  • hbase.client.keyvalue.maxsize: Maximum size of a single key-value (unit: bytes).
  • hbase.htable.threads.max: Maximum number of threads used for data operations in an HTable instance.
  • hbase.client.prefetch.limit: Number of region addresses the client pre-caches. Before reading or writing data, the client must obtain the address of the target region, and it can pre-cache some region addresses in advance.
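The hbase.client.pause parameter states that the actual waiting time depends on both the pause value and the retry count. A common way to combine them is to multiply the pause by a per-retry factor from a backoff table. The sketch below assumes the table {1, 1, 1, 2, 2, 4, 4, 8, 16, 32}, the same values as RETRIES_WAITTIME in the sample code on this page; the table used internally by your HBase version may differ.

```java
// Sketch of a pause-times-multiplier backoff. The multiplier table is an
// assumption for illustration, not necessarily the one HBase uses internally.
final class Backoff {
    private static final int[] MULTIPLIERS = {1, 1, 1, 2, 2, 4, 4, 8, 16, 32};

    private Backoff() {
    }

    // Returns the wait time in ms before retry number `retry` (0-based).
    static long waitTimeMs(long pauseMs, int retry) {
        if (retry < 0) {
            throw new IllegalArgumentException("retry must be >= 0");
        }
        // Retries beyond the end of the table reuse the largest multiplier.
        int index = Math.min(retry, MULTIPLIERS.length - 1);
        return pauseMs * MULTIPLIERS[index];
    }
}
```

With a pause of 100 ms, retry index 3 waits 200 ms, and every retry from index 9 onward waits 3200 ms.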

Parameter setting method:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;

Configuration hbaseConfig = HBaseConfiguration.create();
// You do not need to set the following parameters if they are already specified in the configuration files.
hbaseConfig.set("hbase.zookeeper.quorum", "10.5.100.1,10.5.100.2,10.5.100.3");
hbaseConfig.set("hbase.zookeeper.property.clientPort", "2181");

Use HTablePool in Multi-thread Write Operations

If there are multiple data write threads, you can use HTablePool. Observe the following rules when using it:

  1. All data write threads must share the same HTablePool instance.

    When instantiating HTablePool, specify maxSize, the maximum number of HTableInterface instances kept in the pool, by using the following constructor:

    public HTablePool(final Configuration config, final int maxSize)

    Determine maxSize based on Threads (the number of data write threads) and Tables (the number of user tables involved). In theory, maxSize does not need to exceed Threads x Tables.

  2. A client thread obtains an HTableInterface instance for the table tableName by calling HTablePool#getTable(tableName).
  3. An HTableInterface instance can be used by only one thread at a time.
  4. When a thread no longer needs an HTableInterface instance, it calls HTablePool#putTable(HTableInterface table) to return the instance to the pool.
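The rules above amount to a keyed object pool. The following thread-safe model shows the get/return cycle with plain Java types. KeyedPool is hypothetical and is not HTablePool: T stands in for HTableInterface, the factory stands in for opening a table, and maxIdlePerKey caps the idle instances kept per table name, which is an assumption about how maxSize is applied (check the HTablePool Javadoc of your HBase version).

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Simplified model of a per-table pool; NOT HTablePool. One instance is shared by
// all threads, and every method is synchronized so the shared state stays consistent.
final class KeyedPool<T> {
    private final int maxIdlePerKey;
    private final Function<String, T> factory;
    private final Map<String, Deque<T>> idle = new HashMap<>();

    KeyedPool(int maxIdlePerKey, Function<String, T> factory) {
        this.maxIdlePerKey = maxIdlePerKey;
        this.factory = factory;
    }

    // Hands out an idle instance for the table, or creates a new one if none is idle.
    synchronized T get(String tableName) {
        Deque<T> queue = idle.get(tableName);
        T item = (queue == null) ? null : queue.pollFirst();
        return (item != null) ? item : factory.apply(tableName);
    }

    // Returns an instance to the pool; it is discarded if the pool for that table is full.
    synchronized void put(String tableName, T item) {
        Deque<T> queue = idle.computeIfAbsent(tableName, k -> new ArrayDeque<>());
        if (queue.size() < maxIdlePerKey) {
            queue.addFirst(item);
        }
    }

    // Number of idle instances currently kept for the table.
    synchronized int idleCount(String tableName) {
        Deque<T> queue = idle.get(tableName);
        return (queue == null) ? 0 : queue.size();
    }
}
```

A write thread would call get(tableName), use the instance exclusively, and then call put(tableName, instance) in a finally block, mirroring getTable and putTable.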
Sample code:

import java.io.IOException;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTableInterface;
import org.apache.hadoop.hbase.client.HTablePool;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.RetriesExhaustedWithDetailsException;
import org.apache.hadoop.hbase.util.Bytes;

// The members below belong to the write thread class.
// DemoConfig and DataStorage are application-specific classes.

/**
 * Waiting time multipliers for successive retries after a write failure.
 * The waiting time for a retry is PAUSE_UNIT x RETRIES_WAITTIME[retry].
 */
private static final int[] RETRIES_WAITTIME = {1, 1, 1, 2, 2, 4, 4, 8, 16, 32};
/**
 * Maximum number of write attempts.
 */
private static final int RETRIES = 10;
/**
 * Basic waiting time unit (ms).
 */
private static final int PAUSE_UNIT = 1000;
private static Configuration hadoopConfig;
private static HTablePool tablePool;
private static String[] tables;

/**
 * Initializes the shared HTablePool. The method is synchronized so that
 * concurrent write threads create only one instance.
 */
public static synchronized void initTablePool()
{
    DemoConfig config = DemoConfig.getInstance();
    if (hadoopConfig == null)
    {
        hadoopConfig = HBaseConfiguration.create();
        hadoopConfig.set("hbase.zookeeper.quorum", config.getZookeepers());
        hadoopConfig.set("hbase.zookeeper.property.clientPort", config.getZookeeperPort());
    }
    if (tablePool == null)
    {
        tablePool = new HTablePool(hadoopConfig, config.getTablePoolMaxSize());
        tables = config.getTables().split(",");
    }
}

/**
 * Returns the waiting time (ms) before the given retry. Illustrative implementation:
 * PAUSE_UNIT x RETRIES_WAITTIME[retry], reusing the last multiplier for later retries.
 */
private static long getWaitTime(int retry)
{
    int index = Math.min(retry, RETRIES_WAITTIME.length - 1);
    return (long) PAUSE_UNIT * RETRIES_WAITTIME[index];
}

public void run()
{
    // Initialize the HTablePool. This instance is shared by multiple threads and is instantiated only once.
    initTablePool();
    for (;;)
    {
        Map<String, Object> data = DataStorage.takeList();
        String tableName = tables[(Integer) data.get("table")];
        @SuppressWarnings("unchecked")
        List<Put> list = (List<Put>) data.get("list");
        // Maps each row key to its Put. It is used only to find the failed records when
        // a write fails, because the exception identifies failed records only by row.
        // byte[] keys need Bytes.BYTES_COMPARATOR: a HashMap would compare arrays by identity.
        Map<byte[], Put> rowPutMap = null;
        // If some or all of the data fails to be written, retry. Each retry submits only the failed records.
        // If all RETRIES attempts fail, the remaining records are not written.
        INNER_LOOP:
        for (int retry = 0; retry < RETRIES; retry++)
        {
            // Obtain an HTableInterface instance from HTablePool. It is returned in the finally block.
            HTableInterface table = tablePool.getTable(tableName);
            try
            {
                table.put(list);
                // Reaching this line means that all data was written successfully.
                break INNER_LOOP;
            }
            catch (IOException e)
            {
                // A RetriesExhaustedWithDetailsException means that only some of the data
                // failed to be written, often because of process errors in the HBase cluster
                // or the migration of a large number of regions.
                // For any other IOException, all data in the list is written again.
                if (e instanceof RetriesExhaustedWithDetailsException)
                {
                    RetriesExhaustedWithDetailsException ree =
                        (RetriesExhaustedWithDetailsException) e;
                    int failures = ree.getNumExceptions();
                    System.out.println("Failed to insert [" + failures + "] pieces of data.");
                    // Build the map when the first partial failure occurs.
                    if (rowPutMap == null)
                    {
                        rowPutMap = new TreeMap<byte[], Put>(Bytes.BYTES_COMPARATOR);
                        for (Put put : list)
                        {
                            rowPutMap.put(put.getRow(), put);
                        }
                    }
                    // Replace the list contents with the failed records only.
                    list.clear();
                    for (int m = 0; m < failures; m++)
                    {
                        // getRow(m) returns the failed Row; its getRow() is the row key.
                        list.add(rowPutMap.get(ree.getRow(m).getRow()));
                    }
                }
            }
            finally
            {
                // Return the instance to the pool after using it.
                tablePool.putTable(table);
            }
            // The write failed: wait some time after releasing the HTableInterface instance.
            try
            {
                Thread.sleep(getWaitTime(retry));
            }
            catch (InterruptedException e1)
            {
                System.out.println("Interrupted");
                // Restore the interrupt flag and stop this write thread.
                Thread.currentThread().interrupt();
                return;
            }
        }
    }
}
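The core idea of the sample code, resubmitting only the records that failed, can be separated from HBase and tested on its own. In this sketch, PartialRetry is hypothetical, and the writer function is a stand-in for HTableInterface#put that returns the items it failed to write; waiting between attempts is omitted.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Stand-alone sketch of "retry only the failed items"; not an HBase API.
final class PartialRetry {
    private PartialRetry() {
    }

    // Calls writer up to maxAttempts times. Each call receives only the items that
    // failed in the previous call. Returns the items that still failed at the end
    // (an empty list means that everything was written).
    static <T> List<T> writeWithRetry(List<T> batch, Function<List<T>, List<T>> writer, int maxAttempts) {
        List<T> pending = new ArrayList<>(batch);
        for (int attempt = 0; attempt < maxAttempts && !pending.isEmpty(); attempt++) {
            // A real client would wait before every attempt after the first one.
            pending = new ArrayList<>(writer.apply(pending));
        }
        return pending;
    }
}
```

If the writer fails a record once and then succeeds, the second attempt receives only that record, which is the behavior the sample code achieves with rowPutMap.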

Create a Put Instance

HBase is a column-based database. A row of data may have multiple column families, and a column family may contain multiple columns. When writing data, you must specify the columns (including the column family names and column names) to which data is written.

To write a row of data into an HBase table, first create a Put instance. A Put instance contains the row key and the values to be written, and the values can span multiple columns.
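Conceptually, a Put carries one row key and a set of values addressed by column family and column qualifier. The model below shows that shape with nested sorted maps and String values. RowMutation is hypothetical and is not the HBase Put class, which stores byte arrays.

```java
import java.util.Map;
import java.util.TreeMap;

// Conceptual model of a row mutation: row key -> column family -> qualifier -> value.
// Not the HBase Put class; Strings replace byte arrays for readability.
final class RowMutation {
    private final String rowKey;
    private final Map<String, Map<String, String>> families = new TreeMap<>();

    RowMutation(String rowKey) {
        this.rowKey = rowKey;
    }

    // Adds one value under family:qualifier, creating the family entry on first use.
    RowMutation add(String family, String qualifier, String value) {
        families.computeIfAbsent(family, f -> new TreeMap<>()).put(qualifier, value);
        return this;
    }

    String rowKey() {
        return rowKey;
    }

    int familyCount() {
        return families.size();
    }

    int columnCount(String family) {
        Map<String, String> columns = families.get(family);
        return (columns == null) ? 0 : columns.size();
    }

    // Returns the value stored under family:qualifier, or null if there is none.
    String get(String family, String qualifier) {
        Map<String, String> columns = families.get(family);
        return (columns == null) ? null : columns.get(qualifier);
    }
}
```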

Note that when a key-value is added to a Put instance, the column family, qualifier, and value must all be byte arrays. Use the Bytes.toBytes method, which encodes strings as UTF-8, to convert character strings to byte arrays. Do not use the String.getBytes() method: it encodes with the platform default character set, so it cannot guarantee correct data encoding, and errors can occur when the key or value contains Chinese characters.
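The effect of an encoding mismatch can be reproduced with the JDK alone. The sketch below uses UTF-8, the encoding applied by Bytes.toBytes(String), and ISO-8859-1 as an example of a charset that cannot represent Chinese characters. EncodingDemo is a hypothetical class written only for this illustration. (On JDK 18 and later the platform default charset is UTF-8, but older JDKs may use other defaults depending on the operating system.)

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Illustration only: shows how the choice of charset changes the bytes stored for a string.
final class EncodingDemo {
    private EncodingDemo() {
    }

    // Encodes with one charset and decodes with another, as happens when the
    // writer and the reader of a cell disagree on the encoding.
    static String transcode(String text, Charset encodeWith, Charset decodeWith) {
        return new String(text.getBytes(encodeWith), decodeWith);
    }

    // UTF-8 encoding, the same encoding that Bytes.toBytes(String) uses.
    static byte[] utf8(String text) {
        return text.getBytes(StandardCharsets.UTF_8);
    }
}
```

Encoding two Chinese characters with ISO-8859-1 replaces each of them with "?", so the original text cannot be recovered, while a UTF-8 round trip returns the original string.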

Sample code:

import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

// The column family name is privateInfo.
private static final byte[] FAMILY_PRIVATE = Bytes.toBytes("privateInfo");
// The privateInfo column family has two columns: name and address.
private static final byte[] COLUMN_NAME = Bytes.toBytes("name");
private static final byte[] COLUMN_ADDR = Bytes.toBytes("address");

/**
 * Creates a Put instance with one column family and two columns of data.
 * @param rowKey Row key
 * @param name Person name
 * @param address Address
 * @return Put instance containing the name and address of the row
 */
public Put createPut(String rowKey, String name, String address)
{
    Put put = new Put(Bytes.toBytes(rowKey));
    // Put#add(family, qualifier, value) is deprecated in HBase 1.x in favor of addColumn().
    put.add(FAMILY_PRIVATE, COLUMN_NAME, Bytes.toBytes(name));
    put.add(FAMILY_PRIVATE, COLUMN_ADDR, Bytes.toBytes(address));
    return put;
}

Create an HBaseAdmin Instance

Sample code:

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.MasterNotRunningException;
import org.apache.hadoop.hbase.ZooKeeperConnectionException;
import org.apache.hadoop.hbase.client.HBaseAdmin;
import org.apache.hadoop.hbase.util.Bytes;

private Configuration demoConf = null;
private HBaseAdmin hbaseAdmin = null;

/**
 * Constructor. Takes an already instantiated Configuration instance.
 */
public HBaseAdminDemo(Configuration conf)
{
    this.demoConf = conf;
    try
    {
        // Instantiate HBaseAdmin.
        hbaseAdmin = new HBaseAdmin(this.demoConf);
    }
    catch (MasterNotRunningException e)
    {
        e.printStackTrace();
    }
    catch (ZooKeeperConnectionException e)
    {
        e.printStackTrace();
    }
}

/**
 * Examples of some HBaseAdmin methods.
 * For more information about the methods, see the HBase API documentation.
 */
public void demo() throws MasterNotRunningException, ZooKeeperConnectionException, IOException
{
    byte[] regionName = Bytes.toBytes("mrtest,jjj,1315449869513.fc41d70b84e9f6e91f9f01affdb06703.");
    byte[] encodeName = Bytes.toBytes("fc41d70b84e9f6e91f9f01affdb06703");
    // Unassign a region (force = false). The master then assigns the region again.
    hbaseAdmin.unassign(regionName, false);
    // Manually trigger the balancer.
    hbaseAdmin.balancer();
    // Move a region. The second parameter is the destination RegionServer name in the format
    // hostname,port,startcode, for example, host187.example.com,60020,1289493121758.
    // If this parameter is set to null, the region is moved to a randomly selected RegionServer.
    hbaseAdmin.move(encodeName, null);
    // Check whether a table exists.
    hbaseAdmin.tableExists("tableName");
    // Check whether a table is enabled.
    hbaseAdmin.isTableEnabled("tableName");
}

/**
 * Quickly creates a table.
 * Creates an HTableDescriptor instance that describes the table to be created, and a column
 * family described by an HColumnDescriptor instance. In this example, the column family name
 * is "columnName".
 * @param tableName Table name
 * @return true if the table is created; false if it already exists or creation fails
 */
public boolean createTable(String tableName)
{
    try {
        if (hbaseAdmin.tableExists(tableName)) {
            return false;
        }
        HTableDescriptor tableDesc = new HTableDescriptor(tableName);
        HColumnDescriptor fieldADesc = new HColumnDescriptor(Bytes.toBytes("columnName"));
        // Set the block size of the column family to 640 KB.
        fieldADesc.setBlocksize(640 * 1024);
        tableDesc.addFamily(fieldADesc);
        hbaseAdmin.createTable(tableDesc);
    } catch (Exception e) {
        e.printStackTrace();
        return false;
    }
    return true;
}
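The destination passed to HBaseAdmin#move is a RegionServer name in the format hostname,port,startcode, given as a byte array. A small helper such as the hypothetical ServerNames class below can build that string; it is not part of the HBase API and only concatenates and validates the three parts.

```java
// Hypothetical helper, not an HBase API: builds "hostname,port,startcode".
final class ServerNames {
    private ServerNames() {
    }

    static String format(String hostname, int port, long startCode) {
        // A comma inside the hostname would break the comma-separated format.
        if (hostname.isEmpty() || hostname.contains(",")) {
            throw new IllegalArgumentException("Invalid hostname: " + hostname);
        }
        if (port < 1 || port > 65535) {
            throw new IllegalArgumentException("Invalid port: " + port);
        }
        return hostname + "," + port + "," + startCode;
    }
}
```

For example, hbaseAdmin.move(encodeName, Bytes.toBytes(ServerNames.format("host187.example.com", 60020, 1289493121758L))) would move the region to that specific RegionServer instead of a random one.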