Updated on 2022-09-14 GMT+08:00

Rules

Create a Configuration Instance

Call the create() method of HBaseConfiguration to instantiate the Configuration class. Otherwise, the HBase configuration files (such as hbase-site.xml) are not loaded.

Correct example:

// This part is declared in the class member variable declaration.
private Configuration hbaseConfig = null;
// Instantiate this class using its constructor or initialization method.
hbaseConfig = HBaseConfiguration.create();

Incorrect example:

hbaseConfig = new Configuration();

Share a Configuration Instance

The HBase client interacts with an HBase cluster through an HConnection that the client code creates between the HBase client and ZooKeeper. Each HConnection is associated with a Configuration instance, and created HConnection instances are cached. When the HBase client needs to interact with an HBase cluster, it passes a Configuration instance to the cache and looks up the HConnection instance that corresponds to it. If a match is found, that HConnection instance is returned. If no match is found, a new HConnection instance is created.

If Configuration instances are created frequently, many unnecessary HConnection instances are created as well, and the number of connections to ZooKeeper may reach the upper limit.

It is recommended that all client code share one Configuration instance.
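
The following is a minimal sketch of one way to share a single instance, using only the Java standard library. The SharedInstance class is a hypothetical helper and not part of HBase. It creates the object on first use and returns the same object to every later caller.

```java
import java.util.function.Supplier;

// Hypothetical helper (not an HBase class): lazily creates one instance
// and hands the same instance to every caller.
final class SharedInstance<T> {
    private final Supplier<T> factory;
    private T instance; // guarded by "this"

    SharedInstance(Supplier<T> factory) {
        this.factory = factory;
    }

    // Creates the instance on the first call; later calls return the cached one.
    synchronized T get() {
        if (instance == null) {
            instance = factory.get();
        }
        return instance;
    }
}
```

In an HBase client, the factory would presumably be HBaseConfiguration::create, so that every component obtains the Configuration through the same holder instead of calling create() itself.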

Create an HTable Instance

The HTable class has the following constructors:

  1. public HTable(final String tableName)
  2. public HTable(final byte [] tableName)
  3. public HTable(Configuration conf, final byte [] tableName)
  4. public HTable(Configuration conf, final String tableName)
  5. public HTable(final byte[] tableName, final HConnection connection, final ExecutorService pool)

The fifth constructor is recommended for creating HTable instances, because it allows all instances to share one connection and one thread pool. The first two constructors are not recommended, because a Configuration instance is automatically created for each HTable instance during instantiation. If a large number of HTable instances are instantiated, many unnecessary HConnections are created. The third and fourth constructors are not recommended, because a thread pool or connection is created for each instance, which eventually degrades performance.

Correct example:

private HTable table = null;
public void initTable(byte[] tableName) throws IOException
{
    // sharedConn and pool have been instantiated in advance. You are advised to share
    // the same connection and pool among all HTable instances.
    // The HConnection can be initialized as follows:
    // HConnection sharedConn = HConnectionManager.createConnection(this.config);
    // This call uses the fifth constructor: (tableName, connection, pool).
    table = new HTable(tableName, sharedConn, pool);
}

Incorrect example:

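private HTable table = null;
public void initTable(String tableName) throws IOException
{
    // Incorrect: a Configuration instance is created for every HTable instance.
    table = new HTable(tableName);
}
public void initTable(byte[] tableName) throws IOException
{
    // Incorrect: a Configuration instance is created for every HTable instance.
    table = new HTable(tableName);
}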

Multiple Threads Are Not Allowed to Use the Same HTable Instance at the Same Time

HTable is a non-thread-safe class. If an HTable instance is used by multiple threads at the same time, exceptions will occur.
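
One way to guarantee that an instance is never shared is to give each thread its own copy. The following is a minimal sketch using only the Java standard library. The PerThread class is a hypothetical helper and not part of HBase, and the factory passed to it stands in for whatever code creates an HTable instance.

```java
import java.util.function.Supplier;

// Hypothetical helper (not an HBase class): gives each thread its own
// instance of an object that is not thread-safe.
final class PerThread<T> {
    private final ThreadLocal<T> local;

    PerThread(Supplier<T> factory) {
        // The factory runs once per thread, on that thread's first call to get().
        this.local = ThreadLocal.withInitial(factory);
    }

    // Returns the calling thread's own instance.
    T get() {
        return local.get();
    }
}
```

With this approach, each thread remains responsible for closing its own instance before the thread ends.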

Cache an HTable Instance

Cache an HTable instance that a thread uses frequently over a long period of time. A cached instance, however, is not necessarily used by the thread permanently. In special circumstances, the HTable instance needs to be rebuilt. See the next rule for details.

Correct example:

In this example, the HTable instance is cached by Map. This method applies when multiple threads and HTable instances are required. If an HTable instance is used by only one thread and the thread has only one HTable instance, Map does not need to be used. The method provided here is for reference only.

// The Map uses the table name as the key to cache all instantiated HTable instances.
private Map<String, HTable> demoTables = new HashMap<String, HTable>();
// All HTable instances share the Configuration instance.
private Configuration demoConf = null;
/**
 * Initialize an HTable instance.
 * @param tableName name of the table
 * @return a new HTable instance that uses the shared Configuration
 * @throws IOException if the instance cannot be created
 */
private HTable initNewTable(String tableName) throws IOException
{
    return new HTable(demoConf, tableName);
}
/**
 * Obtain an HTable instance from the cache, creating it on first use.
 * The method is synchronized because HashMap is not thread-safe.
 * @param tableName name of the table
 * @return the cached HTable instance, or null if it could not be created
 */
private synchronized HTable getTable(String tableName)
{
    if (demoTables.containsKey(tableName))
    {
        return demoTables.get(tableName);
    } else {
        HTable table = null;
        try
        {
            table = initNewTable(tableName);
            demoTables.put(tableName, table);
        }
        catch (IOException e)
        {
            // TODO Auto-generated catch block
            e.printStackTrace();
        }
        return table;
    }
}
/**
 * Write data.
 * Multi-thread, multi-HTable design optimization is not covered here.
 * The write is synchronized because an HTable instance is not thread-safe
 * and can be used by only one writing thread at a time.
 * @param dataList data to write
 * @param tableName name of the target table
 */
public void putData(List<Put> dataList, String tableName)
{
    HTable table = getTable(tableName);
    // Synchronization is not required if an HTable instance is not shared by multiple threads.
    // Note that HTable is not thread-safe.
    synchronized (table)
    {
        try
        {
            table.put(dataList);
        }
        catch (IOException e)
        {
            // When an IOException is caught, the cached instance needs to be recreated.
            try
            {
                // Close the failed instance.
                table.close();
                // Create the instance again and replace the cached one.
                synchronized (this)
                {
                    demoTables.put(tableName, new HTable(demoConf, tableName));
                }
            }
            catch (IOException e1)
            {
                // TODO
            }
        }
    }
}

Incorrect example:

public void putDataIncorrect(List<Put> dataList, String tableName) throws IOException
{
    HTable table = null;
    try
    {
        // Incorrect: an HTable instance is created each time data is written.
        table = new HTable(demoConf, tableName);
        table.put(dataList);
    }
    catch (IOException e1)
    {
        // TODO Auto-generated catch block
        e1.printStackTrace();
    }
    finally
    {
        if (table != null)
        {
            table.close();
        }
    }
}

Handle an HTable Instance Data Write Exception

Although the previous rule advocates caching an HTable instance, it does not mean that a thread always uses the same HTable instance. When an IOException is caught, the HTable instance needs to be recreated. The sample code is similar to the previous one.

Do not call the following methods unless necessary:

  • Configuration#clear:

    Do not call this method if a Configuration is used by an object or a thread. The Configuration#clear method clears all loaded attributes. If this method is called for a Configuration used by an HTable, all the parameters of this Configuration are removed, and an exception occurs the next time the HTable uses the Configuration. Therefore, do not call this method each time you recreate an HTable instance. Call it only when all threads are about to exit.

  • HConnectionManager#deleteAllConnections:

    This method deletes all connections from the Connection set. Because HTable instances hold references to the connections, the connections in use cannot be stopped after the HConnectionManager#deleteAllConnections method is called, which eventually causes connection leaks. Therefore, this method is not recommended.

Process Data Failed to Be Written

Some data write operations may fail due to transient exceptions or process failures. Therefore, the data must be recorded so that it can be written to HBase after the cluster recovers.

The HBase client returns the data that fails to be written and does not automatically retry. It only tells the interface caller which data failed to be written. To prevent data loss, temporarily save the data in a file or in memory.

Correct example:

// Rows that failed to be written. They are kept here so that they can be written again later.
private List<Row> errorList = new ArrayList<Row>();
// dataList (a List<Put>), demoTable, and PUT_LIST_SIZE are assumed to be declared elsewhere in this class.
/**
 * Insert data in batches using put(List<Put>).
 * Synchronization is not required if the method is not called by multiple threads.
 * @param put a data record
 */
public synchronized void putData(Put put)
{
    // Temporarily cache data in the list.
    dataList.add(put);
    // Perform a Put operation when the size of dataList reaches PUT_LIST_SIZE.
    if (dataList.size() >= PUT_LIST_SIZE)
    {
        try
        {
            demoTable.put(dataList);
        }
        catch (IOException e)
        {
            // A RetriesExhaustedWithDetailsException usually means that some of the data
            // failed to be written because of process errors in the HBase cluster
            // or migration of a large number of regions.
            if (e instanceof RetriesExhaustedWithDetailsException)
            {
                RetriesExhaustedWithDetailsException ree =
                    (RetriesExhaustedWithDetailsException) e;
                int failures = ree.getNumExceptions();
                for (int i = 0; i < failures; i++)
                {
                    errorList.add(ree.getRow(i));
                }
            }
            else
            {
                // For any other IOException, keep the whole batch so that no data is lost.
                errorList.addAll(dataList);
            }
        }
        dataList.clear();
    }
}
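
The example above only collects the failed rows. The following is a minimal sketch, using only the Java standard library, of how saved records could be written again later. The FailedWriteBuffer class and its writer callback are hypothetical and not part of HBase. The callback stands in for the code that performs the actual write and reports whether it succeeded.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical helper (not an HBase class): keeps records whose write failed
// and replays them later.
final class FailedWriteBuffer<R> {
    private final List<R> pending = new ArrayList<>();

    // Remember a record whose write failed.
    synchronized void add(R record) {
        pending.add(record);
    }

    synchronized int size() {
        return pending.size();
    }

    // Tries to write every pending record again. Records for which the writer
    // returns false stay in the buffer. Returns the number of records written.
    synchronized int replay(Predicate<? super R> writer) {
        int written = 0;
        List<R> stillFailing = new ArrayList<>();
        for (R record : pending) {
            if (writer.test(record)) {
                written++;
            } else {
                stillFailing.add(record);
            }
        }
        pending.clear();
        pending.addAll(stillFailing);
        return written;
    }
}
```

An in-memory buffer such as this one is lost if the client process exits, so saving the records to a file may be preferable when the data must survive a restart.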

Release Resources

Call the close() method to release resources when ResultScanner and HTable instances are no longer required. To ensure that close() is always called, place the call in a finally block.

Correct example:

ResultScanner scanner = null;
try
{
    scanner = demoTable.getScanner(s);
    // Do something here.
}
finally
{
    // scanner is still null if getScanner() threw an exception.
    if (scanner != null)
    {
        scanner.close();
    }
}

Incorrect example:

  1. The code does not call the scanner.close() method to release resources.
  2. The scanner.close() method is not placed in the finally block.
    ResultScanner scanner = null;
    scanner = demoTable.getScanner(s);
    //Do Something here.
    scanner.close();
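
On Java 7 or later, try-with-resources is an alternative to an explicit finally block. The following is a minimal sketch using only the Java standard library. TrackedResource is a hypothetical stand-in for a ResultScanner or HTable, on the assumption that these classes implement java.io.Closeable in the HBase version in use.

```java
import java.io.Closeable;
import java.io.IOException;

// Hypothetical stand-in for a ResultScanner or HTable: records whether close() was called.
final class TrackedResource implements Closeable {
    private boolean closed;

    // Simulates work on the resource; fails on request.
    void use(boolean fail) throws IOException {
        if (fail) {
            throw new IOException("simulated failure");
        }
    }

    @Override
    public void close() {
        closed = true;
    }

    boolean isClosed() {
        return closed;
    }
}

final class ResourceDemo {
    private ResourceDemo() {}

    // The resource is closed automatically when the try block exits,
    // whether it exits normally or with an exception.
    static boolean process(TrackedResource resource, boolean fail) {
        try (TrackedResource r = resource) {
            r.use(fail);
            return true;
        } catch (IOException e) {
            return false;
        }
    }
}
```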

Add a Fault-Tolerance Mechanism for Scan

Exceptions, such as lease expiration, may occur during a Scan. Retry the operation when such exceptions occur.

Retries can also be applied to other HBase-related interface methods to improve fault tolerance.
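
The following is a minimal sketch of a generic retry helper, using only the Java standard library. The Retry class, the attempt limit, and the linear backoff are illustrative assumptions and not part of HBase. The task passed in stands in for the code that opens a scanner and reads from it. When a scan is retried, the task would typically need to open a new scanner, possibly starting from the last row that was processed.

```java
import java.util.concurrent.Callable;

// Hypothetical helper (not an HBase class): retries a task a limited number of times.
final class Retry {
    private Retry() {}

    // Runs the task up to maxAttempts times. After each failure except the last,
    // sleeps backoffMillis * attempt before trying again. Rethrows the last
    // exception if every attempt fails.
    static <T> T call(Callable<T> task, int maxAttempts, long backoffMillis) throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return task.call();
            } catch (InterruptedException e) {
                // Do not retry after an interrupt; restore the flag and give up.
                Thread.currentThread().interrupt();
                throw e;
            } catch (Exception e) {
                last = e;
                if (attempt < maxAttempts) {
                    Thread.sleep(backoffMillis * attempt);
                }
            }
        }
        throw last;
    }
}
```

Limiting the number of attempts keeps a persistent cluster fault from blocking the caller indefinitely.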

Stop HBaseAdmin as soon as It Is Not Required

Close each HBaseAdmin instance as soon as the operation that needs it is complete. Do not cache the same HBaseAdmin instance for a long period of time.

Do Not Use HTablePool to Obtain HTable Instances

The HTablePool implementation has risks of data leakage. Do not use HTablePool to obtain HTable instances. For details about how to create and use an HTable instance, see Create an HTable Instance and Multiple Threads Are Not Allowed to Use the Same HTable Instance at the Same Time.

Multithread Security Login Mode

If multiple threads perform login operations, every login after the application's first successful login must use the relogin mode, regardless of which thread performs it.

Login sample code:

private Boolean login(Configuration conf) {
    boolean flag = false;
    UserGroupInformation.setConfiguration(conf);

    try {
        // PRINCIPAL and KEYTAB are assumed to be constants defined elsewhere in this class.
        UserGroupInformation.loginUserFromKeytab(conf.get(PRINCIPAL), conf.get(KEYTAB));
        System.out.println("UserGroupInformation.isLoginKeytabBased(): " + UserGroupInformation.isLoginKeytabBased());
        flag = true;
    } catch (IOException e) {
        e.printStackTrace();
    }
    return flag;
}

Relogin sample code:

public Boolean relogin() {
    boolean flag = false;
    try {
        UserGroupInformation.getLoginUser().reloginFromKeytab();
        System.out.println("UserGroupInformation.isLoginKeytabBased(): " + UserGroupInformation.isLoginKeytabBased());
        flag = true;
    } catch (IOException e) {
        e.printStackTrace();
    }
    return flag;
}
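
The following is a minimal sketch of how the two methods could be coordinated across threads, using only the Java standard library. The LoginCoordinator class is a hypothetical helper and not part of Hadoop or HBase. Its two callbacks stand in for the login and relogin methods shown above.

```java
import java.util.function.BooleanSupplier;

// Hypothetical coordinator (not a Hadoop class): the first successful call runs
// the full login, and every later call runs the relogin action instead.
final class LoginCoordinator {
    private final BooleanSupplier login;
    private final BooleanSupplier relogin;
    private boolean loggedIn; // guarded by "this"

    LoginCoordinator(BooleanSupplier login, BooleanSupplier relogin) {
        this.login = login;
        this.relogin = relogin;
    }

    // Synchronized so that concurrent threads cannot both run the first login.
    // If the first login fails, the next call tries the full login again.
    synchronized boolean ensureLoggedIn() {
        if (!loggedIn) {
            loggedIn = login.getAsBoolean();
            return loggedIn;
        }
        return relogin.getAsBoolean();
    }
}
```

A coordinator could be created once per application, for example as new LoginCoordinator(() -> login(conf), () -> relogin()), and every thread would then call ensureLoggedIn() instead of calling login or relogin directly.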