Updated on 2025-11-07 GMT+08:00

HBase Basic Principles

HBase Overview

HBase is a column-oriented distributed cloud storage system that features enhanced reliability, excellent performance, and elastic scalability. It is suited to massive data storage and distributed computing. You can use HBase to build a storage system that holds terabytes to petabytes of data. With HBase, you can filter and analyze data with ease and get responses in milliseconds, so you can quickly extract value from your data.

HBase applies to the following scenarios:

  • Mass data storage

    HBase applies to TB- or even PB-level data storage and provides dynamic scaling capabilities so that you can adjust cluster resources to meet specific performance or capacity requirements.

  • Real-time query

    The columnar and key-value storage models are well suited to ad hoc queries on enterprise user details. Low-latency point queries based on the primary key reduce response latency to seconds or even milliseconds, facilitating real-time data analysis.
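
The primary key-based point query described above can be pictured with a small toy model. The sketch below is not HBase code and does not use the HBase client API. It only assumes that rows are kept sorted by row key, which is what makes both single-row lookups and key-range scans cheap. The class name `SortedRowStore`, the row keys, and the `info:name` column are hypothetical names chosen for illustration.

```python
import bisect


class SortedRowStore:
    """Toy model of a table whose rows are kept sorted by row key."""

    def __init__(self):
        self._keys = []   # row keys, always in sorted order
        self._rows = {}   # row key -> {column: value}

    def put(self, row_key, column, value):
        """Insert or update one cell, keeping the key list sorted."""
        if row_key not in self._rows:
            bisect.insort(self._keys, row_key)
            self._rows[row_key] = {}
        self._rows[row_key][column] = value

    def get(self, row_key):
        """Point query: return the row stored under one primary key, or None."""
        return self._rows.get(row_key)

    def scan(self, start_key, stop_key):
        """Range scan over [start_key, stop_key), returned in key order."""
        lo = bisect.bisect_left(self._keys, start_key)
        hi = bisect.bisect_left(self._keys, stop_key)
        return [(k, self._rows[k]) for k in self._keys[lo:hi]]


# Hypothetical data: row keys share a prefix so related rows sort together.
store = SortedRowStore()
store.put("user#0002", "info:name", "Bo")
store.put("user#0001", "info:name", "Ann")
store.put("order#0001", "info:total", "42")

print(store.get("user#0002"))                          # one row by primary key
print([k for k, _ in store.scan("user#", "user$")])    # all rows with the user# prefix
```

In this model a well-chosen row key prefix turns "all rows for one entity type" into a contiguous key range, which is why row key design matters for query latency.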

For details about HBase architecture and principles, visit https://hbase.apache.org/book.html.

HBase Principles

HBase is suitable for storing PB-level structured and semi-structured data, which may experience irregular explosive growth in a short period of time. HBase is a data storage engine designed for massive datasets and provides powerful key-value query capabilities. It supports throughput of tens of millions of concurrent requests with millisecond-level access latency, meeting the service requirements of Internet enterprises for BI reports, online monitoring, and interactive analysis.

Figure 1 HBase architecture
Table 1 Modules in the HBase architecture

| Item | Description |
| --- | --- |
| HBase real-time query engine | HBase is a distributed, column-oriented database with high reliability and linear scalability. It is built on top of the Hadoop Distributed File System (HDFS) to store massive amounts of data. It provides millisecond-level point query and range scan performance through an LSM tree structure and a sharding mechanism. |
| WAL + LSM Tree | The Log-Structured Merge (LSM) Tree, together with the Write-Ahead Log (WAL), is the fundamental storage architecture that enables high-performance writes in HBase. The design optimizes throughput for massive data. When data is written, it is first buffered in an in-memory structure called MemStore. Once the MemStore reaches a size threshold, the data is written sequentially to HFiles in batches. This avoids frequent small random I/O operations. |
| LruBlockCache | HBase uses LruBlockCache as its default in-memory caching mechanism. It manages HFile blocks (such as index blocks and data blocks) in memory. LruBlockCache categorizes blocks into three priority levels: SINGLE, MULTI, and MEMORY. This mechanism optimizes read performance and minimizes disk I/O. |
| HDFS | HDFS is the distributed file system that HBase uses for storage. It provides highly reliable and scalable storage for HBase data files. |
| OBS | Object Storage Service (OBS) provides secure, reliable, high-performance, and low-cost object storage for you to store massive amounts of data. |
| WAL | The Write-Ahead Log (WAL) records each change before it is applied, so that user data can be recovered if a RegionServer crashes. |
| HFile | HFile defines the storage format of StoreFiles in a file system. HFile is the underlying implementation of StoreFile. |
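
The write path that the table describes (log the edit to the WAL, buffer it in the MemStore, then flush sorted batches to HFiles) can be sketched as a toy model. This is a simplified illustration, not HBase's implementation. The class `ToyRegionStore`, its method names, and the entry-count `flush_threshold` are assumptions made for the example (real thresholds are size-based), and the in-memory lists only stand in for files on HDFS or OBS.

```python
class ToyRegionStore:
    """Simplified model of a WAL + MemStore + HFile write path."""

    def __init__(self, flush_threshold=3):
        self.flush_threshold = flush_threshold  # hypothetical: counted in entries, not bytes
        self.wal = []        # append-only log of edits that are not yet flushed
        self.memstore = {}   # in-memory buffer of recent writes
        self.hfiles = []     # flushed files, oldest first; each is a sorted list of (key, value)

    def put(self, key, value):
        self.wal.append((key, value))   # 1. log the edit first so it survives a crash
        self.memstore[key] = value      # 2. then apply it in memory
        if len(self.memstore) >= self.flush_threshold:
            self.flush()

    def flush(self):
        """Write the whole MemStore as one sorted, immutable file in a single batch."""
        if not self.memstore:
            return
        self.hfiles.append(sorted(self.memstore.items()))
        self.memstore = {}
        self.wal = []                   # flushed edits no longer need to be replayed

    def get(self, key):
        """Read the MemStore first, then files from newest to oldest."""
        if key in self.memstore:
            return self.memstore[key]
        for hfile in reversed(self.hfiles):
            for k, v in hfile:          # linear scan keeps the toy short
                if k == key:
                    return v
        return None

    def crash_and_recover(self):
        """Simulate a crash: memory is lost, then the surviving WAL is replayed in order."""
        self.memstore = {}
        for key, value in self.wal:
            self.memstore[key] = value


region = ToyRegionStore(flush_threshold=2)
region.put("row1", "a")
region.put("row2", "b")      # reaches the threshold and triggers a flush
region.put("row1", "c")      # newer value lives only in the WAL and the MemStore
region.crash_and_recover()
print(region.get("row1"), region.get("row2"), len(region.hfiles))
```

The point of the model is the ordering: because every edit reaches the log before the in-memory buffer, unflushed writes can be rebuilt after a crash, while flushed data is only ever written in large sequential batches.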

Advantages

  • Native HBase APIs: CloudTable HBase is compatible with native HBase APIs. Its architecture separates compute from storage to provide high availability and enhanced reliability, and its kernel is optimized in depth.
  • Ease of use: Secondary indexes are supported to meet non-primary key query requirements.
  • Low costs: Cold and hot data can be segregated to fulfill the needs of data archiving and the storage of historical data with infrequent access, thereby minimizing storage expenses.
  • Stability and reliability: CloudTable HBase provides stable and reliable performance through hotspot diagnosis and self-healing mechanisms.
  • Visualized monitoring and O&M: CloudTable HBase offers visualized monitoring and user-defined alarm rules, simplifying system operation and maintenance.
  • High compatibility: CloudTable HBase is compatible with native HBase APIs. The NoSQL engine supports typical interface protocols in the industry.
  • SLA assurance: Stable TPS and latency are achieved.
  • High availability: With HMaster HA and region transfer within seconds, read and write operations are not affected even if disks fail. The dual-read mechanism helps deliver more stable service level agreement (SLA) performance.