HBase Basic Principles
HBase Overview
HBase is a column-oriented, distributed cloud storage system that offers high reliability, strong performance, and elastic scalability. It is suited to storing massive amounts of data and to distributed computing. You can use HBase to build a storage system that holds terabytes to petabytes of data. With HBase, you can filter and analyze data and get responses in milliseconds, so you can extract value from your data quickly.
HBase applies to the following scenarios:
- Mass data storage
HBase suits TB- to PB-scale data storage and scales dynamically, so you can adjust cluster resources to meet specific performance or capacity requirements.
- Real-time query
The columnar and key-value storage models suit ad hoc queries on detailed enterprise user records. Low-latency point queries by primary key bring response times down to seconds or even milliseconds, which supports real-time data analysis.
For details about HBase architecture and principles, visit https://hbase.apache.org/book.html.
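The low-latency point query mentioned in the real-time query scenario depends on rows being kept in sorted order by primary key (the row key). The following is a minimal Python sketch of that idea only, not HBase code: the class `SortedRowStore`, its methods, and the sample row keys are hypothetical names chosen for illustration.

```python
import bisect


class SortedRowStore:
    """Toy in-memory table that keeps rows ordered by row key (illustrative only)."""

    def __init__(self):
        self._keys = []  # row keys, always kept sorted
        self._rows = {}  # row key -> {column name: value}

    def put(self, row_key, column, value):
        """Insert or update one cell, keeping the key list sorted."""
        if row_key not in self._rows:
            bisect.insort(self._keys, row_key)
            self._rows[row_key] = {}
        self._rows[row_key][column] = value

    def get(self, row_key):
        """Point query: fetch a single row by its key, or None if absent."""
        return self._rows.get(row_key)

    def scan(self, start_key, stop_key):
        """Range scan over [start_key, stop_key), returned in key order."""
        lo = bisect.bisect_left(self._keys, start_key)
        hi = bisect.bisect_left(self._keys, stop_key)
        return [(key, self._rows[key]) for key in self._keys[lo:hi]]


store = SortedRowStore()
store.put("user#0002", "info:name", "Bo")
store.put("user#0001", "info:name", "Ann")
store.put("user#0003", "info:name", "Cy")

row = store.get("user#0002")                   # single-row lookup
rows = store.scan("user#0001", "user#0003")    # stop key is exclusive
```

Because keys are sorted, a range scan only has to locate the start position and then read forward, which is why row key design has a large effect on query latency.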
HBase Principles
HBase is suitable for storing PB-level structured and semi-structured data, including data that grows explosively and irregularly over short periods. It is a storage engine designed for massive datasets and provides powerful key-value query capabilities. It supports throughput on the order of tens of millions of concurrent requests with millisecond-level access latency, which meets the needs of Internet enterprises for BI reports, online monitoring, and interactive analysis.
| Item | Description |
|---|---|
| HBase real-time query engine | HBase is a distributed, column-oriented database with high reliability and linear scalability. HBase is built on top of the Hadoop Distributed File System (HDFS) to store massive amounts of data. It provides millisecond-level point query and range scan performance through a tree structure and a sharding mechanism. |
| WAL + LSM Tree | The Log-Structured Merge (LSM) Tree, together with the Write-Ahead Log (WAL), forms the storage architecture that enables high-performance writes in HBase. The design is optimized for the throughput of massive data. Written data is first buffered in an in-memory structure called the MemStore. Once the MemStore reaches a size threshold, its data is written sequentially to HFiles in batches. This avoids frequent small random I/O operations. |
| LruBlockCache | HBase uses LruBlockCache as its default in-memory caching mechanism. It caches HFile blocks (such as index blocks and data blocks) in memory and assigns each block one of three priority levels: SINGLE, MULTI, or MEMORY. This mechanism improves read performance and reduces disk I/O. |
| HDFS | HDFS is the distributed file system underlying HBase. It gives HBase reliable, high-performance, and scalable file storage. |
| OBS | Object Storage Service (OBS) provides secure, reliable, high-performance, and low-cost object storage for massive amounts of data. |
| WAL | Write-Ahead Logging (WAL) protects user data from loss if a RegionServer crashes. |
| HFile | HFile defines the storage format of StoreFiles in a file system. HFile is the underlying implementation of StoreFile. |
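The write path described in the WAL + LSM Tree row (log first, buffer in the MemStore, then flush sequentially to a file) can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not HBase's implementation: the class `TinyLsmStore`, the JSON file format, the file names, and a threshold counted in entries rather than bytes are all hypothetical, and truncating the log after a flush stands in for real WAL rolling and archiving.

```python
import json
import os
import tempfile


class TinyLsmStore:
    """Toy write path: WAL append -> in-memory buffer -> sorted flush file."""

    def __init__(self, data_dir, flush_threshold=3):
        self.data_dir = data_dir
        self.flush_threshold = flush_threshold  # counted in entries (assumption)
        self.memstore = {}                      # in-memory write buffer
        self.flushed_files = []                 # flushed files, oldest first
        self.wal_path = os.path.join(data_dir, "wal.log")

    def put(self, key, value):
        # 1. Append the edit to the log first so it can be replayed after a crash.
        with open(self.wal_path, "a", encoding="utf-8") as wal:
            wal.write(json.dumps([key, value]) + "\n")
        # 2. Buffer the edit in memory.
        self.memstore[key] = value
        # 3. Flush once the buffer reaches the threshold.
        if len(self.memstore) >= self.flush_threshold:
            self.flush()

    def flush(self):
        if not self.memstore:
            return
        name = f"hfile-{len(self.flushed_files):04d}.json"
        path = os.path.join(self.data_dir, name)
        with open(path, "w", encoding="utf-8") as out:
            # One sequential write of the whole batch, sorted by key.
            json.dump(sorted(self.memstore.items()), out)
        self.flushed_files.append(path)
        self.memstore = {}
        # Flushed edits are now on disk, so this sketch simply empties the log.
        open(self.wal_path, "w", encoding="utf-8").close()

    def get(self, key):
        # Newest data wins: check the buffer, then files from newest to oldest.
        if key in self.memstore:
            return self.memstore[key]
        for path in reversed(self.flushed_files):
            with open(path, encoding="utf-8") as f:
                for stored_key, stored_value in json.load(f):
                    if stored_key == key:
                        return stored_value
        return None


with tempfile.TemporaryDirectory() as demo_dir:
    demo = TinyLsmStore(demo_dir, flush_threshold=2)
    demo.put("row1", "a")
    demo.put("row2", "b")      # reaches the threshold and triggers a flush
    demo.put("row1", "c")      # newer value stays in the buffer
    latest = demo.get("row1")  # the buffered value shadows the flushed one
```

Reads in this sketch scan every flushed file, which grows slower as files accumulate. Real LSM stores limit that cost with per-file indexes and by merging files in the background (compaction), which the sketch leaves out.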
Advantages
- Native HBase APIs: CloudTable HBase is compatible with native HBase APIs. Its architecture separates compute from storage for high availability and reliability, and its kernel is deeply optimized.
- Ease of use: Secondary indexes are supported to meet non-primary key query requirements.
- Low costs: Hot and cold data can be separated, so archived data and infrequently accessed historical data can be stored at lower cost.
- Stability and reliability: CloudTable HBase provides stable and reliable performance through hotspot diagnosis and self-healing mechanisms.
- Visualized monitoring and O&M: CloudTable HBase offers visualized monitoring and user-defined alarm rules, simplifying system operation and maintenance.
- High compatibility: CloudTable HBase is compatible with native HBase APIs. The NoSQL engine supports typical interface protocols in the industry.
- SLA assurance: Stable TPS and latency are achieved.
- High availability: With HMaster HA and region transfer within seconds, read and write operations are not affected even when disks fail. The dual-read mechanism helps deliver a more stable service level agreement (SLA).