Updated on 2024-08-16 GMT+08:00

Introduction to HBase Application Development

HBase

HBase is a column-oriented distributed storage system that features high reliability, high performance, and scalability. It is designed to overcome the limitations of relational databases when processing massive amounts of data.

HBase suits application scenarios with the following characteristics:

  • Massive data processing (data volumes at the TB or PB level or higher)
  • High throughput
  • Highly efficient random read of massive data
  • Excellent scalability
  • Concurrent processing of structured and unstructured data
  • Full support for the Atomicity, Consistency, Isolation, Durability (ACID) properties of traditional relational databases is not required.

HBase tables have the following features:

  • Large: A single table can contain hundreds of millions of rows and millions of columns.
  • Column-based: Storage and permission control are implemented per column (column family), and columns (column families) are retrieved independently.
  • Sparse: Null columns do not occupy storage space, so a table can be sparse.
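
The table features above can be illustrated with a small in-memory model. The following Java sketch is a toy, not the HBase API: the class name SparseTableSketch and its methods are hypothetical, values are plain strings, and cell versions (timestamps) are left out. It only shows the idea that rows are kept sorted by row key, cells are grouped by column family, and a cell that was never written takes up no space.

```java
import java.util.Collections;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

/** Toy in-memory model of a sparse, column-family-based table (hypothetical, not the HBase API). */
public class SparseTableSketch {
    // row key -> column family -> column qualifier -> value.
    // Only cells that were actually written have an entry.
    private final NavigableMap<String, Map<String, Map<String, String>>> rows = new TreeMap<>();

    /** Writes one cell, creating the row and column family entries on demand. */
    public void put(String rowKey, String family, String qualifier, String value) {
        rows.computeIfAbsent(rowKey, k -> new TreeMap<>())
            .computeIfAbsent(family, k -> new TreeMap<>())
            .put(qualifier, value);
    }

    /** Returns the cell value, or null if the cell was never written. */
    public String get(String rowKey, String family, String qualifier) {
        return getFamily(rowKey, family).get(qualifier);
    }

    /** Returns a read-only view of one column family of a row, without reading other families. */
    public Map<String, String> getFamily(String rowKey, String family) {
        Map<String, Map<String, String>> row = rows.get(rowKey);
        if (row == null) {
            return Collections.emptyMap();
        }
        Map<String, String> cells = row.get(family);
        return cells == null ? Collections.emptyMap() : Collections.unmodifiableMap(cells);
    }

    /** Removes a whole row. */
    public void deleteRow(String rowKey) {
        rows.remove(rowKey);
    }

    /** Counts the cells that are actually stored. */
    public int storedCells() {
        int count = 0;
        for (Map<String, Map<String, String>> row : rows.values()) {
            for (Map<String, String> cells : row.values()) {
                count += cells.size();
            }
        }
        return count;
    }

    public static void main(String[] args) {
        SparseTableSketch table = new SparseTableSketch();
        table.put("user001", "info", "name", "Alice");
        table.put("user001", "info", "email", "alice@example.com");
        table.put("user002", "info", "name", "Bob");
        table.put("user002", "stats", "logins", "7");

        System.out.println(table.get("user001", "info", "name"));    // Alice
        System.out.println(table.get("user001", "stats", "logins")); // null (cell never written)
        System.out.println(table.getFamily("user002", "stats"));     // {logins=7}
        System.out.println(table.storedCells());                     // 4
    }
}
```

In this sketch the two rows hold different sets of columns, and the missing cells simply have no entry, which is what "sparse" means here. A real HBase table additionally keeps multiple timestamped versions per cell and stores values as byte arrays.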

API Types

Java is recommended for developing HBase applications, because HBase itself is developed in Java and Java is concise, widely used, and easy to understand.

HBase uses the same application programming interfaces (APIs) as Apache HBase. For details about the APIs, visit http://hbase.apache.org/apidocs/index.html.

Table 1 describes the functions that HBase provides through its APIs.

Table 1 Functions provided by HBase APIs

Function               Description
---------------------  ----------------------------------------------
Data CRUD function     Data creation, retrieval, update, and deletion
Advanced feature       Filters and coprocessors
Management function    Table and cluster management