HBase Basic Principles

Updated on 2023-11-07 GMT+08:00

HBase handles data storage. It is an open source, column-oriented, distributed storage system suited to storing massive amounts of unstructured or semi-structured data. It features high reliability, high performance, and flexible scalability, and supports real-time data reads and writes. For more information about HBase, see https://hbase.apache.org/.

Typical features of a table stored in HBase are as follows:

  • Big table (BigTable): A single table can contain hundreds of millions of rows and millions of columns.
  • Column-oriented: Storage, retrieval, and permission control are performed by column.
  • Sparse: Null columns in the table do not occupy any storage space.
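
The sparse property can be illustrated with a small toy model. This is only a sketch in plain Python, not the HBase API: the class name `SparseTable` and the `family:qualifier` column strings are assumptions made for illustration. The point is that only cells that were actually written occupy an entry.

```python
from collections import defaultdict


class SparseTable:
    """Toy model of a sparse, column-oriented table (not the HBase API)."""

    def __init__(self):
        # row key -> {"family:qualifier": value}; only written cells are stored
        self.rows = defaultdict(dict)

    def put(self, row_key, column, value):
        self.rows[row_key][column] = value

    def get(self, row_key, column):
        # dict.get does not trigger the defaultdict factory, so reads never create rows
        return self.rows.get(row_key, {}).get(column)

    def stored_cells(self):
        return sum(len(columns) for columns in self.rows.values())


table = SparseTable()
table.put("user001", "info:name", "Alice")
table.put("user001", "info:email", "alice@example.com")
table.put("user002", "info:name", "Bob")  # this row has no email column

print(table.stored_cells())                 # counts written cells only
print(table.get("user002", "info:email"))   # a missing column reads as None
```

The row `user002` has no `info:email` entry at all, so the missing column costs nothing to store.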

MRS HBase supports secondary indexes. Indexes can be created on column values so that data can be filtered by column through native HBase APIs.

HBase Architecture

An HBase cluster consists of active and standby HMaster processes and multiple RegionServer processes.

Figure 1 HBase architecture
Table 1 Module description

| Module | Description |
| --- | --- |
| Master | Master is also called HMaster. In HA mode, HMaster consists of an active HMaster and a standby HMaster. Active Master: manages the RegionServers in HBase, handles the creation, deletion, modification, and query of tables, balances the load across RegionServers, adjusts the distribution of Regions, splits Regions and assigns the resulting Regions, and migrates Regions after a RegionServer becomes unavailable. Standby Master: takes over services when the active HMaster is faulty. After the fault is rectified, the original active HMaster becomes the standby HMaster. |
| Client | Client communicates with Master for management operations and with RegionServer for data operations by using the Remote Procedure Call (RPC) mechanism of HBase. |
| RegionServer | RegionServer provides read and write services for table data and is the data processing and computing unit in HBase. RegionServers are deployed together with the DataNodes of the HDFS cluster, which store the data. |
| ZooKeeper cluster | ZooKeeper provides distributed coordination services for the processes in an HBase cluster. Each RegionServer registers with ZooKeeper so that the active Master can obtain the health status of each RegionServer. |
| HDFS cluster | HDFS provides highly reliable file storage services for HBase. All HBase data is stored in HDFS. |

HBase Principles

  • HBase Data Model

    HBase stores data in tables, as shown in Figure 2. Data in a table is divided into multiple Regions, which are allocated by Master to RegionServers for management.

    Each Region contains data within a RowKey range. An HBase table initially contains only one Region. As the amount of data grows and reaches the upper limit of the Region capacity, the Region is split into two Regions. You can define the RowKey ranges of Regions when creating a table, or define the Region size in the configuration file.
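
The range-based layout and the split behavior described above can be sketched as follows. This is a toy model in plain Python, not HBase code: `RegionMap` and `max_rows_per_region` are hypothetical names, and the capacity limit is counted in rows here only to keep the example small, whereas the text above describes a Region capacity limit.

```python
import bisect


class RegionMap:
    """Toy model: a table as sorted RowKey ranges (not the HBase API)."""

    def __init__(self, max_rows_per_region=4):
        self.max_rows = max_rows_per_region  # hypothetical capacity limit
        self.start_keys = [""]               # the first Region starts at the empty key
        self.regions = [[]]                  # sorted RowKeys held by each Region

    def locate(self, row_key):
        # The owning Region is the last one whose start key is <= row_key
        return bisect.bisect_right(self.start_keys, row_key) - 1

    def put(self, row_key):
        idx = self.locate(row_key)
        rows = self.regions[idx]
        pos = bisect.bisect_left(rows, row_key)
        if pos == len(rows) or rows[pos] != row_key:
            rows.insert(pos, row_key)
        if len(rows) > self.max_rows:
            self._split(idx)

    def _split(self, idx):
        rows = self.regions[idx]
        mid = len(rows) // 2
        # The middle RowKey becomes the start key of the new right-hand Region
        self.start_keys.insert(idx + 1, rows[mid])
        self.regions[idx:idx + 1] = [rows[:mid], rows[mid:]]


table = RegionMap(max_rows_per_region=4)
for i in range(10):
    table.put(f"row{i:02d}")

print(table.start_keys)       # start key of each Region after the splits
print(table.locate("row05"))  # index of the Region that owns this RowKey
```

Because the start keys stay sorted, finding the Region for a RowKey is a binary search over the start keys.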

    Figure 2 HBase data model
    Table 2 Concepts

    | Concept | Description |
    | --- | --- |
    | RowKey | Similar to the primary key in a relational table, the RowKey is the unique ID of the data in each row. A RowKey can be a string, an integer, or a binary string. All records are stored sorted by RowKey. |
    | Timestamp | The timestamp of a data operation. Timestamps distinguish different versions of the same data. The versions in each cell are stored in descending order of time. |
    | Cell | The minimum storage unit of HBase, consisting of a key and a value. A key consists of six fields: row, column family, column qualifier, timestamp, type, and MVCC version. A value is a binary data object. |
    | Column Family | A table is formed horizontally by one or more column families. A column family can consist of any number of arbitrary columns. A column is a label under a column family and can be added as required when data is written. Column families support dynamic expansion, so the number and type of columns do not need to be predefined. Columns of an HBase table are sparsely distributed, and the number and type of columns can differ from row to row. Each column family has its own independent time to live (TTL). Locks can be applied only at the row level. Operations on a row in a column family are the same as those on other rows. |
    | Column | As in traditional databases, HBase tables use columns to store data of the same type. |
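
The ordering rules in Table 2 can be sketched in a few lines. This is a simplified illustration, not HBase code: `CellKey`, `sort_key`, and `get_latest` are hypothetical names, and the key keeps only four of the six fields (the type and MVCC version are left out). It assumes that keys sort in ascending order by row, column family, and column qualifier, and in descending order by timestamp, so the newest version of a cell comes first.

```python
from typing import NamedTuple


class CellKey(NamedTuple):
    row: str
    family: str
    qualifier: str
    timestamp: int


def sort_key(key: CellKey):
    # Ascending by row, family, qualifier; descending by timestamp (newest first)
    return (key.row, key.family, key.qualifier, -key.timestamp)


cells = {
    CellKey("row1", "info", "name", 100): b"Alice",
    CellKey("row1", "info", "name", 200): b"Alicia",
    CellKey("row0", "info", "name", 150): b"Bob",
}

ordered = sorted(cells, key=sort_key)


def get_latest(row, family, qualifier):
    for key in ordered:
        if (key.row, key.family, key.qualifier) == (row, family, qualifier):
            return cells[key]  # the first match is the newest version
    return None


print([(key.row, key.timestamp) for key in ordered])
print(get_latest("row1", "info", "name"))
```

With this ordering, reading the latest version of a cell means taking the first matching key, with no need to scan all versions.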

  • RegionServer Data Storage

    RegionServer manages the regions allocated by HMaster. Figure 3 shows the data storage structure of RegionServer.

    Figure 3 RegionServer data storage structure

    Table 3 describes the components of a Region shown in Figure 3.

    Table 3 Region structure description

    | Module | Description |
    | --- | --- |
    | Store | A Region consists of one or more Stores. Each Store corresponds to one column family in Figure 2. |
    | MemStore | A Store contains one MemStore. The MemStore caches the data that the client inserts into a Region. When the MemStore reaches its capacity limit, RegionServer flushes the data in the MemStore to HDFS. |
    | StoreFile | The data flushed to HDFS is stored as a StoreFile. As more data is inserted, multiple StoreFiles are generated in a Store. When the number of StoreFiles reaches the upper limit, RegionServer merges multiple StoreFiles into one larger StoreFile. |
    | HFile | HFile defines the storage format of StoreFiles in a file system. HFile is the underlying implementation of StoreFile. |
    | HLog | HLogs prevent data loss when a RegionServer is faulty. Multiple Regions in a RegionServer share the same HLog. |
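
The write path in Table 3 can be sketched as a toy model. This is plain Python, not HBase code: the `Store` class and the limits `memstore_limit` and `max_store_files` are hypothetical and are counted in entries and files only to keep the example small. The log list is passed in from outside so that several Stores can share it, mirroring the shared HLog.

```python
class Store:
    """Toy model of one Store: a MemStore plus immutable StoreFiles."""

    def __init__(self, hlog, memstore_limit=3, max_store_files=3):
        self.hlog = hlog                        # shared log list supplied by the caller
        self.memstore_limit = memstore_limit    # hypothetical limit, in entries
        self.max_store_files = max_store_files  # hypothetical limit, in files
        self.memstore = {}
        self.store_files = []  # oldest first; each is a sorted list of (row, value)

    def put(self, row_key, value):
        self.hlog.append((row_key, value))  # log first so the write can be replayed
        self.memstore[row_key] = value
        if len(self.memstore) >= self.memstore_limit:
            self.flush()

    def flush(self):
        self.store_files.append(sorted(self.memstore.items()))
        self.memstore = {}
        if len(self.store_files) >= self.max_store_files:
            self.compact()

    def compact(self):
        merged = {}
        for store_file in self.store_files:  # newer files overwrite older values
            merged.update(store_file)
        self.store_files = [sorted(merged.items())]

    def get(self, row_key):
        if row_key in self.memstore:
            return self.memstore[row_key]
        for store_file in reversed(self.store_files):  # newest file first
            for key, value in store_file:
                if key == row_key:
                    return value
        return None


shared_hlog = []
store = Store(shared_hlog, memstore_limit=3, max_store_files=3)
for i in range(9):
    store.put(f"row{i}", f"v{i}")

print(len(store.store_files))  # number of StoreFiles left after merging
print(store.get("row4"))
```

Reads check the MemStore first and then the StoreFiles from newest to oldest, so a newer write always wins over data that was flushed earlier.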

  • Metadata Table

    The metadata table is a special HBase table that the client uses to locate a region. The metadata table, hbase:meta, records the region information of user tables, such as the region location and the start and end RowKeys.

    Figure 4 shows the mapping relationship between metadata tables and user tables.

    Figure 4 Mapping relationships between metadata tables and user tables
  • Data Operation Process
    Figure 5 shows the HBase data operation process.
    Figure 5 Data processing
    1. When you add, delete, modify, or query HBase data, the HBase client first connects to ZooKeeper to obtain information about the RegionServer where the hbase:meta table is located. If the operation changes the namespace, such as creating or deleting a table, the client needs to access HMaster to update the metadata.
    2. The HBase client connects to the RegionServer where the region of the hbase:meta table is located and obtains the location of the RegionServer where the region of the user table resides.
    3. The HBase client then connects to the RegionServer where the region of the user table is located and issues a data operation command to that RegionServer, which executes the command.

    To improve data processing efficiency, the HBase client caches the region information of the hbase:meta table and user tables. On subsequent data operations, the HBase client first looks up the region information in its in-memory cache. If no match is found, the client repeats the preceding steps to obtain the region information.
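
The three-step lookup and the client-side cache can be sketched as a toy model. This is plain Python, not the HBase client API: `Cluster`, `Client`, the server names `rs1` to `rs3`, and the table `orders` are all hypothetical. As a simplification, the client caches the whole region list of a table at once, and cache invalidation is not modeled.

```python
import bisect


class Cluster:
    """Toy cluster state (hypothetical names; not the HBase client API)."""

    def __init__(self):
        self.meta_server = "rs1"  # ZooKeeper knows which RegionServer hosts hbase:meta
        # hbase:meta content: sorted (start RowKey, RegionServer) pairs per user table
        self.meta = {"orders": [("", "rs2"), ("m", "rs3")]}
        self.zookeeper_lookups = 0
        self.meta_lookups = 0

    def ask_zookeeper(self):
        self.zookeeper_lookups += 1
        return self.meta_server

    def ask_meta(self, table):
        self.meta_lookups += 1
        return list(self.meta[table])


class Client:
    def __init__(self, cluster):
        self.cluster = cluster
        self.meta_location = None  # cached location of hbase:meta
        self.region_cache = {}     # table -> cached region list

    def locate(self, table, row_key):
        if table not in self.region_cache:
            if self.meta_location is None:
                self.meta_location = self.cluster.ask_zookeeper()    # step 1
            self.region_cache[table] = self.cluster.ask_meta(table)  # step 2
        regions = self.region_cache[table]
        start_keys = [start for start, _ in regions]
        idx = bisect.bisect_right(start_keys, row_key) - 1
        return regions[idx][1]  # step 3 would send the operation to this server


cluster = Cluster()
client = Client(cluster)
first = client.locate("orders", "apple")
second = client.locate("orders", "pear")  # answered from the client cache

print(first, second)
print(cluster.zookeeper_lookups, cluster.meta_lookups)
```

The second call never contacts ZooKeeper or the hbase:meta RegionServer, which is the saving that the cache provides.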
