Updated on 2022-11-18 GMT+08:00

Suggestions

Do not call the closeRegion method of Admin to close a Region

The Admin interface provides the following API for closing a Region:

public void closeRegion(final String regionname, final String serverName)

When this method is used, the HBase client sends an RPC request directly to the RegionServer that hosts the Region, and the Master is not involved at any point. As a result, the Master does not know that the Region has been closed. If closeRegion is called while the Master is preparing to migrate the Region as the result of a balance operation, the Region can be neither closed nor migrated. (This issue has not been resolved in the current HBase version.)

Therefore, do not call the closeRegion method of Admin to close a Region.

Write data in PutList mode

The Table interface provides two methods for writing data:

  • public void put(final Put put) throws IOException
  • public void put(final List<Put> puts) throws IOException

The second method is recommended: writing a list of Put objects in one call performs better than writing the same Put objects one at a time.
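When the number of rows is very large, the list is usually split into fixed-size chunks before being written. The following is a minimal sketch of that chunking using only the JDK. `PutBatcher`, `partition`, and the batch size of 3 are illustrative names and values, not part of the HBase API; in a real client each chunk would hold Put objects and be passed to put(final List<Put> puts).

```java
import java.util.ArrayList;
import java.util.List;

final class PutBatcher {
    /** Splits items into consecutive chunks of at most batchSize elements. */
    static <T> List<List<T>> partition(List<T> items, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<T>> batches = new ArrayList<>();
        for (int from = 0; from < items.size(); from += batchSize) {
            int to = Math.min(from + batchSize, items.size());
            // Copy the slice so each batch is independent of the source list.
            batches.add(new ArrayList<>(items.subList(from, to)));
        }
        return batches;
    }

    public static void main(String[] args) {
        List<String> rows = new ArrayList<>();
        for (int i = 0; i < 7; i++) {
            rows.add("row" + i);
        }
        // Stand-in for: for each batch of Put objects, call table.put(batch).
        for (List<String> batch : partition(rows, 3)) {
            System.out.println(batch);
        }
    }
}
```

A suitable batch size depends on row size and client memory, so it is best chosen by measurement rather than fixed in advance.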

Specify StartKey and EndKey for a Scan

A Scan that is restricted to a Rowkey range performs better than a Scan without a range, because a Scan without a range reads the whole table.

Example:

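// Assumes demoTable is an open org.apache.hadoop.hbase.client.Table.
Scan scan = new Scan();
scan.addColumn(Bytes.toBytes("familyname"), Bytes.toBytes("columnname"));
scan.setStartRow(Bytes.toBytes("rowA")); // StartKey is rowA (inclusive).
scan.setStopRow(Bytes.toBytes("rowB"));  // EndKey is rowB (exclusive).
// Close the scanner when done so that server-side resources are released.
try (ResultScanner scanner = demoTable.getScanner(scan)) {
    for (Result result : scanner) {
        // Process the Result instance.
    }
}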
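The range of a Scan is half-open: the StartKey is included and the EndKey is excluded. The following sketch mimics that behaviour with a sorted in-memory map, using only the JDK. `RangeScanSketch` and `scan` are hypothetical names, and the keys are assumed to be plain ASCII so that Java string order matches byte order.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

final class RangeScanSketch {
    /** Returns the keys k with start <= k < stop, in sorted order. */
    static List<String> scan(TreeMap<String, String> table, String start, String stop) {
        // subMap(start, true, stop, false): inclusive start, exclusive stop.
        return new ArrayList<>(table.subMap(start, true, stop, false).keySet());
    }

    public static void main(String[] args) {
        TreeMap<String, String> table = new TreeMap<>();
        for (String key : new String[] {"row9", "rowA", "rowA1", "rowAZ", "rowB", "rowC"}) {
            table.put(key, "value");
        }
        // prints [rowA, rowA1, rowAZ]
        System.out.println(scan(table, "rowA", "rowB"));
    }
}
```

Because rowB itself is excluded, a caller that also needs the row rowB has to choose an EndKey that sorts just after it.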

Do not disable WAL

The write-ahead log (WAL) records each change in a log file before the change is stored in the database, so that the change can be recovered after a failure.

WAL is enabled by default. The Put class provides an interface to disable WAL:

public void setWriteToWAL(boolean write)

If WAL is disabled (writeToWAL is set to false), data written in the last 1s will be lost if the RegionServer fails. This interval is specified by the hbase.regionserver.optionallogflushinterval parameter on the RegionServer and is 1s by default. Disable WAL only when a high write speed is required and the loss of the last 1s of data is acceptable.

Set blockcache to true when creating a table or when Scan is performed

Set blockcache to true when a table is created or when a Scan is performed on the HBase client. If the same records are read repeatedly, setting this parameter to true can improve efficiency.

By default, blockcache is true. Avoid explicitly setting it to false, as in the following example:

HColumnDescriptor fieldADesc = new HColumnDescriptor("value".getBytes());
fieldADesc.setBlockCacheEnabled(false);

HBase does not support ORDER BY queries or queries by arbitrary search criteria. Data is stored in lexicographic order of the Rowkey and can be read efficiently only by Rowkey.

Do not use HBase in scenarios that require ad hoc queries on arbitrary conditions or sorting of results.
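Lexicographic order compares Rowkeys byte by byte, which differs from numeric order when numbers are written without padding. The following sketch illustrates the effect using only the JDK (Java 9 or later for Arrays.compareUnsigned). `RowKeyOrder`, `sortLikeHBase`, and `paddedKey` are hypothetical helper names, and the "order" prefix is an invented example.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

final class RowKeyOrder {
    /** Sorts keys byte by byte, treating each byte as unsigned. */
    static List<String> sortLikeHBase(List<String> keys) {
        List<byte[]> raw = new ArrayList<>();
        for (String key : keys) {
            raw.add(key.getBytes(StandardCharsets.UTF_8));
        }
        raw.sort((a, b) -> Arrays.compareUnsigned(a, b));
        List<String> sorted = new ArrayList<>();
        for (byte[] bytes : raw) {
            sorted.add(new String(bytes, StandardCharsets.UTF_8));
        }
        return sorted;
    }

    /** Zero-pads a non-negative id so that byte order matches numeric order. */
    static String paddedKey(String prefix, long id, int width) {
        return prefix + String.format(Locale.ROOT, "%0" + width + "d", id);
    }

    public static void main(String[] args) {
        // prints [order1, order10, order2]
        System.out.println(sortLikeHBase(Arrays.asList("order2", "order10", "order1")));
        // prints [order0001, order0002, order0010]
        System.out.println(sortLikeHBase(Arrays.asList(
                paddedKey("order", 2, 4), paddedKey("order", 10, 4), paddedKey("order", 1, 4))));
    }
}
```

If results must come back in a particular order, that order therefore has to be built into the Rowkey itself.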

Suggestions on Service Table Design

  1. Pre-split regions evenly to improve concurrency.
  2. Avoid hotspot regions. Include a time component in the Rowkey if necessary.
  3. Store data that is accessed together contiguously. Data that is read together should be stored close together, in the same row and in the same cell.
  4. Place frequently queried attributes at the front of the Rowkey. Order the Rowkey fields to match the order of the main query criteria.
  5. Include attributes with high dispersion (many distinct values) in the Rowkey. Design the service table based on data dispersion and query scenarios.
  6. Store redundant information to improve index performance. Use secondary indexes to support more query scenarios.
  7. Enable automatic deletion of expired data by setting the expiration time and the number of versions.
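One common way to combine items 1, 2, and 4 is to prefix the Rowkey with a small bucket number and to pre-split the table on those prefixes. The following is a sketch of that idea using only the JDK, not a prescribed design. `SaltedRowKey`, the "|" separator, the two-digit prefix, and the field order are all assumptions made for illustration. Byte arrays like those returned by `splitKeys` are the kind of value that a table creation call with explicit split points accepts.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

final class SaltedRowKey {
    /** Maps a business key to a stable bucket in [0, buckets). */
    static int bucketOf(String businessKey, int buckets) {
        if (buckets < 1 || buckets > 100) {
            throw new IllegalArgumentException("buckets must be between 1 and 100");
        }
        return Math.floorMod(businessKey.hashCode(), buckets);
    }

    /** Builds "<bucket>|<businessKey>|<timestamp>" with a two-digit bucket prefix. */
    static String rowKey(String businessKey, long timestampMillis, int buckets) {
        return String.format(Locale.ROOT, "%02d|%s|%013d",
                bucketOf(businessKey, buckets), businessKey, timestampMillis);
    }

    /** Split points "01", "02", ... so that each bucket starts its own region. */
    static List<byte[]> splitKeys(int buckets) {
        if (buckets < 1 || buckets > 100) {
            throw new IllegalArgumentException("buckets must be between 1 and 100");
        }
        List<byte[]> splits = new ArrayList<>();
        for (int b = 1; b < buckets; b++) {
            splits.add(String.format(Locale.ROOT, "%02d", b).getBytes(StandardCharsets.UTF_8));
        }
        return splits;
    }

    public static void main(String[] args) {
        System.out.println(rowKey("device-42", 1668700800000L, 8));
        System.out.println(splitKeys(8).size() + " split points");
    }
}
```

The trade-off is that a query for one business key must recompute its bucket, and a query across all business keys must scan every bucket, so this layout suits workloads whose main query criterion is the business key.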

In HBase, Regions that are busy with data writes are called hotspot Regions.