HBase Enhanced Features

Global Indexes

Index data stored in an independent table: Index data is synchronized to an independent index table. The index table is distributed in its own regions, which decouples it from the user data table and enhances region stability.
Index query path optimization: Index data is sorted by index field, so a query only needs to scan one or more consecutive regions in the index table. If the index table redundantly stores the user data columns that a query needs, the query is completed on the index table alone, and the user table does not need to be queried.
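
The two points above can be illustrated with a minimal in-memory sketch. It is not HBase code: the table contents and the names `build_index`, `query_index`, and `covered_cols` are hypothetical, and a sorted Python list stands in for the index table. It shows that a lookup touches one contiguous range of the sorted index, and that a query needing only redundantly stored columns never reads the user table.

```python
from bisect import bisect_left, bisect_right

# Hypothetical user table: row key -> columns.
user_table = {
    "u001": {"city": "Berlin", "name": "Ana", "age": 31},
    "u002": {"city": "Austin", "name": "Bo", "age": 45},
    "u003": {"city": "Berlin", "name": "Cy", "age": 27},
    "u004": {"city": "Cairo", "name": "Di", "age": 38},
}

def build_index(table, index_col, covered_cols):
    """Build a separate index sorted by (index value, user row key).

    Each entry also carries the covered (redundant) columns.
    """
    entries = []
    for row_key, cols in table.items():
        covered = {c: cols[c] for c in covered_cols}
        entries.append((cols[index_col], row_key, covered))
    entries.sort(key=lambda e: (e[0], e[1]))
    return entries

def query_index(index, value, wanted_cols, table):
    """Return (rows, user_table_lookups) for rows whose index value matches."""
    keys = [e[0] for e in index]
    lo, hi = bisect_left(keys, value), bisect_right(keys, value)
    rows, lookups = [], 0
    for _, row_key, covered in index[lo:hi]:  # one contiguous range
        if set(wanted_cols) <= set(covered):
            # All wanted columns are stored in the index entry.
            rows.append({c: covered[c] for c in wanted_cols})
        else:
            # Fall back to the user table for the missing columns.
            lookups += 1
            rows.append({c: table[row_key][c] for c in wanted_cols})
    return rows, lookups

city_index = build_index(user_table, "city", covered_cols=["name"])
covered_rows, covered_lookups = query_index(city_index, "Berlin", ["name"], user_table)
full_rows, full_lookups = query_index(city_index, "Berlin", ["name", "age"], user_table)
```

In this sketch the first query is answered entirely from the index, while the second needs one user-table lookup per matching row because `age` is not stored redundantly.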

Off-Heap Cache

Data blocks are cached off-heap, so data does not need to be copied to heap memory during reads. This reduces performance glitches and improves query stability.

Tries Index

DataBlockIndex is optimized. Indexes can be converted into prefix trees (tries) to reduce the memory occupied by duplicate prefixes. In addition, LOUDS-Sparse is used to encode the tree into a linear structure. The memory saved can be used to cache more data, which reduces data read I/Os and latency and improves overall throughput. After optimization, memory consumption was reduced by 80%.
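
The prefix-sharing idea can be sketched in a few lines. This is an illustration only, not the DataBlockIndex implementation: the keys are made up, and `louds_bits` shows a plain LOUDS-style unary degree encoding, which is simpler than the LOUDS-Sparse layout the feature uses. All function names are hypothetical.

```python
from collections import deque

END = "$"  # end-of-key marker; assumed not to occur inside the keys

def build_trie(keys):
    """Build a character-level prefix tree as nested dicts."""
    root = {}
    for key in keys:
        node = root
        for ch in key:
            node = node.setdefault(ch, {})
        node[END] = {}
    return root

def children(node):
    """Child nodes in label order, excluding the end-of-key marker."""
    return [node[ch] for ch in sorted(node) if ch != END]

def count_labels(node):
    """Number of characters the trie actually stores (one per edge)."""
    return sum(1 + count_labels(child) for child in children(node))

def louds_bits(root):
    """Flatten the tree shape: per node in level order, one '1' per child, then '0'."""
    bits, queue = [], deque([root])
    while queue:
        kids = children(queue.popleft())
        bits.append("1" * len(kids) + "0")
        queue.extend(kids)
    return "".join(bits)

index_keys = ["row-0001", "row-0002", "row-0100", "row-0101"]
trie = build_trie(index_keys)
raw_chars = sum(len(k) for k in index_keys)  # characters stored without sharing
trie_chars = count_labels(trie)              # characters stored with shared prefixes
shape = louds_bits(trie)                     # tree shape as a flat bit string
```

For these four keys the trie stores 13 characters instead of 32, and the tree shape fits in a 27-bit string with no per-node pointers. The savings on a real index depend on how much the keys overlap.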

HBase Dual-Read

It is difficult to ensure 99.9% query stability in HBase storage because of factors such as GC, network jitter, and bad disk sectors. The HBase dual-read feature is introduced to meet the requirements for random reads of massive data. It is based on the DR capability of the active and standby clusters: the probability that both clusters glitch at the same time is far lower than the probability that a single cluster glitches, so querying the two clusters concurrently ensures query stability. When a user initiates a query request, the HBase services of both clusters are queried at the same time. If the active cluster does not return a result within a period of time (the maximum tolerable glitch time), the result from whichever cluster responds first is used.
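
The read path described above can be sketched with standard-library threads. This is a minimal illustration under stated assumptions, not the HBase client API: `dual_read`, `max_glitch_s`, and the two simulated cluster functions are hypothetical names, and error handling is omitted.

```python
import time
from concurrent.futures import ThreadPoolExecutor, wait, FIRST_COMPLETED

def dual_read(read_active, read_standby, row_key, max_glitch_s):
    """Query both clusters at once; prefer the active cluster's answer
    if it arrives within max_glitch_s, otherwise take the first answer."""
    pool = ThreadPoolExecutor(max_workers=2)
    try:
        active = pool.submit(read_active, row_key)
        standby = pool.submit(read_standby, row_key)
        # Give the active cluster the maximum tolerable glitch time.
        done, _ = wait([active], timeout=max_glitch_s)
        if active in done:
            return "active", active.result()
        # Active is late: use whichever cluster has responded first.
        done, _ = wait([active, standby], return_when=FIRST_COMPLETED)
        if standby in done:
            return "standby", standby.result()
        return "active", active.result()
    finally:
        pool.shutdown(wait=False)  # do not block on the slower read

def healthy_cluster(row_key):
    return f"value-of-{row_key}"

def glitching_cluster(row_key):
    time.sleep(0.5)  # simulated glitch (GC, network jitter, ...)
    return f"value-of-{row_key}"

source, value = dual_read(glitching_cluster, healthy_cluster, "row1", max_glitch_s=0.05)
normal_source, _ = dual_read(healthy_cluster, glitching_cluster, "row1", max_glitch_s=0.05)
```

When the active cluster glitches, the caller receives the standby's result after roughly `max_glitch_s` instead of waiting out the full glitch. When the active cluster is healthy, its result is returned immediately.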

HBase Cold and Hot Data Separation

Data is classified as hot or cold based on write timestamps. Hot data is accessed frequently and is stored on cloud disks for shorter query latency. Cold data is stored on OBS for lower storage costs. Overall, this greatly reduces the cost of storing cold data and improves hot data access performance.
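
A timestamp-based split can be sketched as follows. The boundary parameter `hot_window_s` and the function names are assumptions made for illustration. How the boundary is actually configured is not described here.

```python
def storage_tier(write_ts, now_ts, hot_window_s):
    """Return the storage target for a cell written at write_ts (seconds)."""
    # Data newer than the boundary is treated as hot, older data as cold.
    return "cloud-disk" if now_ts - write_ts < hot_window_s else "OBS"

def split_by_tier(cells, now_ts, hot_window_s):
    """Group (row_key, write_ts) pairs by storage target."""
    tiers = {"cloud-disk": [], "OBS": []}
    for row_key, write_ts in cells:
        tiers[storage_tier(write_ts, now_ts, hot_window_s)].append(row_key)
    return tiers

DAY = 86_400
now = 100 * DAY
cells = [("r1", now - 1 * DAY), ("r2", now - 40 * DAY), ("r3", now - 6 * DAY)]
tiers = split_by_tier(cells, now, hot_window_s=7 * DAY)
```

With a seven-day boundary, `r1` and `r3` stay on cloud disks and `r2` moves to OBS.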

Self-Healing from Hotspotting

Self-healing from HBase hotspotting is an automatic adjustment mechanism that alleviates the performance deterioration caused when uneven data access overloads some nodes. HBase is a distributed key-value database in which regions are the smallest units of data management. Poorly designed tables or improperly planned row keys cause requests to concentrate on a few fixed regions. As a result, a single node comes under high service pressure, which causes performance deterioration or even request failures. Self-healing from HBase hotspotting therefore requires the following core capabilities:

Monitoring the request traffic of each node and performing aggregation analysis to quickly identify heavily loaded nodes and hotspot regions for troubleshooting.
Automatically diagnosing hotspots based on the collected information and splitting hotspot regions to balance the request pressure.
Intelligently identifying hot row keys and applying the corresponding traffic limiting mechanisms to form a complete closed-loop solution for hotspot issues.
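
The three capabilities above can be sketched as a single decision function. This is a simplified illustration, not the service's actual algorithm: the thresholds `region_factor` and `key_share`, the function name, and the sample traffic are all hypothetical, and node-level aggregation (which would follow the same counting pattern) is left out for brevity.

```python
from collections import Counter, defaultdict

def plan_self_healing(requests, region_factor=2.0, key_share=0.5):
    """requests: iterable of (region, row_key) pairs from a monitoring window.

    Returns {region: (action, row_key_or_None)} for hotspot regions only.
    """
    per_region = Counter()
    keys_in_region = defaultdict(Counter)
    for region, row_key in requests:
        per_region[region] += 1
        keys_in_region[region][row_key] += 1
    if not per_region:
        return {}
    mean = sum(per_region.values()) / len(per_region)
    plan = {}
    for region, count in per_region.items():
        if count < region_factor * mean:
            continue  # not a hotspot region
        top_key, top_count = keys_in_region[region].most_common(1)[0]
        if top_count / count >= key_share:
            # One row key dominates, so splitting would not spread its load.
            plan[region] = ("limit", top_key)
        else:
            # Load is spread over many keys, so splitting can balance it.
            plan[region] = ("split", None)
    return plan

requests = (
    [("A", f"a{i}") for i in range(60)]       # busy region, many distinct keys
    + [("B", "k-hot")] * 50                   # busy region, one hot row key
    + [("B", f"b{i}") for i in range(10)]
    + [(region, f"{region}{i}") for region in "CDEF" for i in range(5)]
)
plan = plan_self_healing(requests)
```

Here region `A` is marked for splitting because its load is spread across many row keys, while region `B` is marked for traffic limiting on `k-hot` because a single row key accounts for most of its requests.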
