Updated on 2025-04-14 GMT+08:00

How Do I Configure the HBase Dual Read Function?

Scenario

The HBase client application implements dual-read by loading custom configuration items for the active and standby clusters. HBase dual-read is a key feature for improving the high availability of an HBase cluster system. It applies to four read scenarios: reading data with Get, reading data in batches with Get, reading data with Scan, and querying data through a secondary index. With dual-read, the client can read data from the active and standby clusters at the same time, which shortens query glitches. The advantages are as follows:
  • High success rate: The concurrent dual-read mechanism ensures a high success rate of read requests.
  • High availability: When a single cluster is faulty, the query service is not interrupted. A short network jitter does not prolong the query time.
  • High generality: The dual-read feature does not support dual-write, and it does not affect existing real-time write scenarios.
  • Ease of use: Dual-read is encapsulated in the client, so services are unaware of it.
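
To make the idea of concurrent reads concrete, the following is a minimal, self-contained Java sketch of a hedged read: the active cluster is queried first and, if it does not answer within a glitch timeout (or fails), the standby cluster is queried as well and the first successful answer wins. This is a generic illustration under stated assumptions, not the MRS dual-read client. The class DualReadSketch, its Tagged type, the slow helper, and the Supplier-based cluster stand-ins are all hypothetical names.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.function.Supplier;

// Illustrative only: a generic hedged read, not the MRS dual-read client.
final class DualReadSketch {

    /** A value together with the ID of the cluster that returned it. */
    static final class Tagged {
        final String clusterId;
        final String value;

        Tagged(String clusterId, String value) {
            this.clusterId = clusterId;
            this.value = value;
        }
    }

    /** Hypothetical stand-in for a cluster that answers after delayMs. */
    static Supplier<String> slow(String value, long delayMs) {
        return () -> {
            try {
                Thread.sleep(delayMs);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            return value;
        };
    }

    static Tagged read(Supplier<String> active, Supplier<String> standby,
                       long glitchTimeoutMs, ExecutorService pool) {
        CompletableFuture<Tagged> fromActive =
                CompletableFuture.supplyAsync(() -> new Tagged("ACTIVE", active.get()), pool);
        try {
            // Fast path: the active cluster answers within the glitch timeout.
            return fromActive.get(glitchTimeoutMs, TimeUnit.MILLISECONDS);
        } catch (TimeoutException | ExecutionException e) {
            // Active cluster is slow or failed: fall through and ask the standby too.
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while waiting for the active cluster", e);
        }
        CompletableFuture<Tagged> fromStandby =
                CompletableFuture.supplyAsync(() -> new Tagged("STANDBY", standby.get()), pool);

        // Complete with the first successful answer; fail only if both reads fail.
        CompletableFuture<Tagged> winner = new CompletableFuture<>();
        fromActive.thenAccept(winner::complete);
        fromStandby.whenComplete((result, error) -> {
            if (error == null) {
                winner.complete(result);
            } else {
                fromActive.whenComplete((r, e) -> {
                    if (e == null) winner.complete(r); else winner.completeExceptionally(e);
                });
            }
        });
        return winner.join();
    }

    public static void main(String[] args) {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        try {
            // The active stand-in needs 500 ms, far longer than the 50 ms glitch timeout.
            Tagged t = read(slow("row-from-active", 500), () -> "row-from-standby", 50, pool);
            System.out.println(t.clusterId + " -> " + t.value);
        } finally {
            pool.shutdownNow();
        }
    }
}
```

In this sketch the 50 ms value mirrors the default of hbase.dualclient.glitchtimeout.ms; how the real client schedules its requests may differ.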

Restrictions on HBase dual-read:

  • The HBase dual-read feature is implemented based on replication. Data read from the standby cluster may be different from that from the active cluster. Therefore, only eventual consistency can be achieved.
  • Currently, the HBase dual-read feature is used only for queries. If the active cluster breaks down, the most recent data may not have been synchronized to the standby cluster and therefore cannot be queried there.
  • A Scan operation of HBase may be split into multiple RPC operations. Because session information is not synchronized between clusters, results from different clusters may not be identical. Therefore, the dual-read feature takes effect only for the first RPC operation of a Scan. All later requests, until the ResultScanner is closed, access the cluster used for the first RPC operation.
  • The HBase Admin API and real-time write API access only the active cluster. Therefore, after the active cluster breaks down, the Admin API and real-time write API are unavailable, and only the Get and Scan query services are available.
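
The Scan restriction can be pictured as a scanner that chooses a cluster once and then stays with it until it is closed. The following sketch models that behaviour with plain iterators standing in for the per-cluster scanners. PinnedScanner and all of its members are hypothetical names for illustration and do not reflect the client's internal classes.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.function.BooleanSupplier;

// Illustrative model only; the iterators stand in for per-cluster scanners.
final class PinnedScanner implements Iterator<String> {
    private final Iterator<String> active;
    private final Iterator<String> standby;
    private final BooleanSupplier activeHealthy;
    private Iterator<String> pinned;   // set by the first call, never changed afterwards
    private String clusterId;

    PinnedScanner(Iterator<String> active, Iterator<String> standby,
                  BooleanSupplier activeHealthy) {
        this.active = active;
        this.standby = standby;
        this.activeHealthy = activeHealthy;
    }

    // The cluster choice is made exactly once, on the first request.
    private void pinOnFirstCall() {
        if (pinned == null) {
            boolean useActive = activeHealthy.getAsBoolean();
            pinned = useActive ? active : standby;
            clusterId = useActive ? "ACTIVE" : "STANDBY";
        }
    }

    @Override
    public boolean hasNext() {
        pinOnFirstCall();
        return pinned.hasNext();
    }

    @Override
    public String next() {
        pinOnFirstCall();
        return pinned.next();
    }

    /** Returns null until the first call has chosen a cluster. */
    String getClusterId() {
        return clusterId;
    }

    public static void main(String[] args) {
        boolean[] healthy = {false};
        PinnedScanner scanner = new PinnedScanner(
                Arrays.asList("a1", "a2").iterator(),
                Arrays.asList("s1", "s2").iterator(),
                () -> healthy[0]);
        System.out.println(scanner.next() + " from " + scanner.getClusterId());
        healthy[0] = true; // the active cluster recovers mid-scan
        System.out.println(scanner.next() + " from " + scanner.getClusterId());
    }
}
```

In the example, both rows come from the standby stand-in even though the active one becomes healthy after the first call, which is the pinning behaviour the restriction describes.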

Add the Active/Standby Cluster Configuration to the hbase-dual.xml File

  1. Save the authentication files user.keytab and krb5.conf of the active cluster, obtained in Preparing MRS Application Development User, to the src/main/resources/conf directory of the sample code.
  2. Obtain the client configuration files core-site.xml, hbase-site.xml, and hdfs-site.xml of the HBase active cluster and save them to the src/main/resources/conf/active directory. Create this directory if it does not exist. For details, see Preparing for HBase Development and Operating Environment.
  3. Obtain the client configuration files core-site.xml, hbase-site.xml, and hdfs-site.xml of the standby cluster and save them to the src/main/resources/conf/standby directory. For details, see Preparing for HBase Development and Operating Environment.
  4. Create the hbase-dual.xml configuration file and save it to the src/main/resources/conf/ directory. Create this directory if it does not exist. For details about the configuration items in the configuration file, see HBase Dual-Read Operations.

    <?xml version="1.0" encoding="UTF-8"?>
    <configuration>
        <!--Configuration file directory of the active cluster-->
        <property>
            <name>hbase.dualclient.active.cluster.configuration.path</name>
            <value>{Sample code directory}\\src\\main\\resources\\conf\\active</value>
        </property>
        <!--Configuration file directory of the standby cluster-->
        <property>
            <name>hbase.dualclient.standby.cluster.configuration.path</name>
            <value>{Sample code directory}\\src\\main\\resources\\conf\\standby</value>
        </property>
        <!--Connection implementation of the dual-read mode-->
        <property>
            <name>hbase.client.connection.impl</name>
            <value>org.apache.hadoop.hbase.client.HBaseMultiClusterConnectionImpl</value>
        </property>
        <!--HBase security authentication mode-->
        <property>
            <name>hbase.security.authentication</name>
            <value>kerberos</value>
        </property>
        <!--Hadoop security authentication mode-->
        <property>
            <name>hadoop.security.authentication</name>
            <value>kerberos</value>
        </property>
    </configuration>

  5. Create a dual-read configuration.

    • The following code snippet belongs to the init method in the TestMain class of the com.huawei.bigdata.hbase.examples package.
      private static void init() throws IOException {
          // Default load from conf directory
          conf = HBaseConfiguration.create();
          // In a Windows environment
          String userdir = TestMain.class.getClassLoader().getResource("conf").getPath() + File.separator;
          // In a Linux environment
          // String userdir = System.getProperty("user.dir") + File.separator + "conf" + File.separator;
          conf.addResource(new Path(userdir + "hbase-dual.xml"), false);
      }

  6. Determine the data source cluster.

    • Get request. The following code snippet belongs to the testGet method in the HBaseSample class of the com.huawei.bigdata.hbase.examples package.
      Result result = table.get(get);
      if (result instanceof DualResult) {
          // Log the ID of the cluster that returned this result
          LOG.info(((DualResult) result).getClusterId());
      }
    • Scan request. The following code snippet belongs to the testScanData method in the HBaseSample class of the com.huawei.bigdata.hbase.examples package.
      ResultScanner rScanner = table.getScanner(scan);
      if (rScanner instanceof HBaseMultiScanner) {
          // Log the ID of the cluster that serves this scan
          LOG.info(((HBaseMultiScanner) rScanner).getClusterId());
      }

  7. Enable the client to print metric information.

    Add the following content to the log4j.properties file so that the client exports metric information to the specified file. For details about the metrics, see Printing Metric Information.

    log4j.logger.DUAL=debug,DUAL
    log4j.appender.DUAL=org.apache.log4j.RollingFileAppender
    # Local dual-read log path on the client. Change the value to the actual directory and ensure that the directory is writable.
    log4j.appender.DUAL.File=/var/log/dual.log
    log4j.additivity.DUAL=false
    log4j.appender.DUAL.MaxFileSize=${hbase.log.maxfilesize}
    log4j.appender.DUAL.MaxBackupIndex=${hbase.log.maxbackupindex}
    log4j.appender.DUAL.layout=org.apache.log4j.PatternLayout
    log4j.appender.DUAL.layout.ConversionPattern=%d{ISO8601} %-5p [%t] %c{2}: %m%n
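
Before starting the application, it can help to check that the hbase-dual.xml file created in step 4 is well-formed XML and defines both mandatory path items. The following sketch does this with the JDK's built-in XML parser only. The class DualConfigCheck and its methods are hypothetical helpers written for this illustration and are not part of the MRS client.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper: checks the text of an hbase-dual.xml file.
final class DualConfigCheck {
    static final String[] MANDATORY = {
        "hbase.dualclient.active.cluster.configuration.path",
        "hbase.dualclient.standby.cluster.configuration.path"
    };

    /** Parses Hadoop-style property entries (name/value pairs) into a map. */
    static Map<String, String> parse(String xmlText) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xmlText.getBytes(StandardCharsets.UTF_8)));
            Map<String, String> props = new HashMap<>();
            NodeList nodes = doc.getElementsByTagName("property");
            for (int i = 0; i < nodes.getLength(); i++) {
                Element property = (Element) nodes.item(i);
                String name = childText(property, "name");
                if (name != null) {
                    props.put(name, childText(property, "value"));
                }
            }
            return props;
        } catch (Exception e) {
            // Raised, for example, when a closing tag is missing.
            throw new IllegalArgumentException("Configuration is not well-formed XML", e);
        }
    }

    private static String childText(Element parent, String tag) {
        NodeList children = parent.getElementsByTagName(tag);
        return children.getLength() == 0 ? null : children.item(0).getTextContent().trim();
    }

    /** Returns the mandatory items that are absent or empty. */
    static List<String> missingMandatory(Map<String, String> props) {
        List<String> missing = new ArrayList<>();
        for (String key : MANDATORY) {
            String value = props.get(key);
            if (value == null || value.isEmpty()) {
                missing.add(key);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        // Sample text with only the active path set; /opt/conf/active is a placeholder.
        String sample = "<configuration><property>"
                + "<name>hbase.dualclient.active.cluster.configuration.path</name>"
                + "<value>/opt/conf/active</value>"
                + "</property></configuration>";
        System.out.println("Missing mandatory items: " + missingMandatory(parse(sample)));
    }
}
```

In a real project the XML text would be read from the hbase-dual.xml file rather than from an inline string.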

HBase Dual-Read Operations

Table 1 Configuration items in hbase-dual.xml

| Configuration Item | Description | Default Value | Level |
| --- | --- | --- | --- |
| hbase.dualclient.active.cluster.configuration.path | HBase client configuration directory of the active cluster | None | Mandatory |
| hbase.dualclient.standby.cluster.configuration.path | HBase client configuration directory of the standby cluster | None | Mandatory |
| dual.client.schedule.update.table.delay.second | DR table update interval (seconds) | 5 | Optional |
| hbase.dualclient.glitchtimeout.ms | Maximum glitch time that can be tolerated in the active cluster (ms) | 50 | Optional |
| hbase.dualclient.slow.query.timeout.ms | Timeout after which a slow query alarm is logged (ms) | 180000 | Optional |
| hbase.dualclient.active.cluster.id | Active cluster ID | ACTIVE | Optional |
| hbase.dualclient.standby.cluster.id | Standby cluster ID | STANDBY | Optional |
| hbase.dualclient.active.executor.thread.max | Maximum size of the thread pool for processing requests to the active cluster | 100 | Optional |
| hbase.dualclient.active.executor.thread.core | Core size of the thread pool for processing requests to the active cluster | 100 | Optional |
| hbase.dualclient.active.executor.queue | Queue size of the thread pool for processing requests to the active cluster | 256 | Optional |
| hbase.dualclient.standby.executor.thread.max | Maximum size of the thread pool for processing requests to the standby cluster | 100 | Optional |
| hbase.dualclient.standby.executor.thread.core | Core size of the thread pool for processing requests to the standby cluster | 100 | Optional |
| hbase.dualclient.standby.executor.queue | Queue size of the thread pool for processing requests to the standby cluster | 256 | Optional |
| hbase.dualclient.clear.executor.thread.max | Maximum size of the thread pool for clearing resources | 30 | Optional |
| hbase.dualclient.clear.executor.thread.core | Core size of the thread pool for clearing resources | 30 | Optional |
| hbase.dualclient.clear.executor.queue | Queue size of the thread pool for clearing resources | Integer.MAX_VALUE | Optional |
| dual.client.metrics.enable | Whether to print client metric information | true | Optional |
| dual.client.schedule.metrics.second | Interval for printing client metric information (seconds) | 300 | Optional |
| dual.client.asynchronous.enable | Whether to asynchronously request the active and standby clusters | false | Optional |
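
The core size, maximum size, and queue size of each pool correspond to the usual parameters of a java.util.concurrent.ThreadPoolExecutor. The sketch below builds such a pool from a key prefix and the default values listed above, to show how the three settings fit together. Whether the dual-read client constructs its pools exactly this way is an assumption. PoolFromConfig, build, and the 60-second keep-alive are hypothetical choices made for this illustration.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Hypothetical helper: maps the three executor settings onto a standard JDK thread pool.
final class PoolFromConfig {

    private static int intOf(Map<String, String> conf, String key, int fallback) {
        String raw = conf.get(key);
        return raw == null ? fallback : Integer.parseInt(raw.trim());
    }

    static ThreadPoolExecutor build(Map<String, String> conf, String prefix,
                                    int defaultCore, int defaultMax, int defaultQueue) {
        int core = intOf(conf, prefix + ".thread.core", defaultCore);
        int max = intOf(conf, prefix + ".thread.max", defaultMax);
        int queue = intOf(conf, prefix + ".queue", defaultQueue);
        // Assumption: 60 s keep-alive for idle non-core threads; not taken from the MRS documentation.
        return new ThreadPoolExecutor(core, max, 60L, TimeUnit.SECONDS,
                new LinkedBlockingQueue<>(queue));
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        // Override one setting; the others fall back to the defaults passed to build().
        conf.put("hbase.dualclient.active.executor.queue", "512");
        ThreadPoolExecutor pool =
                build(conf, "hbase.dualclient.active.executor", 100, 100, 256);
        System.out.println("core=" + pool.getCorePoolSize()
                + " max=" + pool.getMaximumPoolSize()
                + " queueCapacity=" + pool.getQueue().remainingCapacity());
        pool.shutdown();
    }
}
```

With a standard ThreadPoolExecutor, threads beyond the core size are created only when the queue is full, so equal core and maximum sizes (as in the defaults) give a fixed-size pool whose backlog is bounded by the queue size.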

Printing Metric Information

Table 2 Basic specifications

| Metric Name | Description | Log Level |
| --- | --- | --- |
| total_request_count | Total number of queries in a period | INFO |
| active_success_count | Number of successful queries in the active cluster in a period | INFO |
| active_error_count | Number of failed queries in the active cluster in a period | INFO |
| active_timeout_count | Number of query timeouts in the active cluster in a period | INFO |
| standby_success_count | Number of successful queries in the standby cluster in a period | INFO |
| standby_error_count | Number of failed queries in the standby cluster in a period | INFO |
| Active Thread pool | Periodically printed information about the thread pool for processing requests to the active cluster | DEBUG |
| Standby Thread pool | Periodically printed information about the thread pool for processing requests to the standby cluster | DEBUG |
| Clear Thread pool | Periodically printed information about the thread pool for releasing resources | DEBUG |

Table 3 Histogram indicators for GET, BatchGET, and SCAN requests

| Metric Name | Description | Log Level |
| --- | --- | --- |
| averageLatency(ms) | Average latency | INFO |
| minLatency(ms) | Minimum latency | INFO |
| maxLatency(ms) | Maximum latency | INFO |
| 95thPercentileLatency(ms) | Maximum latency of 95% of requests | INFO |
| 99thPercentileLatency(ms) | Maximum latency of 99% of requests | INFO |
| 99.9PercentileLatency(ms) | Maximum latency of 99.9% of requests | INFO |
| 99.99PercentileLatency(ms) | Maximum latency of 99.99% of requests | INFO |
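
As a reading aid for the latency metrics, the following sketch computes an average and percentile latencies from a list of samples using the nearest-rank definition: the p-th percentile is the smallest sample such that at least p% of all samples are less than or equal to it. How the client actually builds its histogram is not described here, so this definition and the names LatencyStats, percentile, and average are assumptions made for illustration.

```java
import java.util.Arrays;

// Hypothetical helper illustrating nearest-rank percentiles over latency samples.
final class LatencyStats {

    /** Nearest-rank percentile for p in (0, 100]. */
    static long percentile(long[] samplesMs, double p) {
        if (samplesMs.length == 0) {
            throw new IllegalArgumentException("no samples");
        }
        if (p <= 0 || p > 100) {
            throw new IllegalArgumentException("p must be in (0, 100]");
        }
        long[] sorted = samplesMs.clone();
        Arrays.sort(sorted);
        // Rank is 1-based: ceil(p * n / 100), clamped to at least 1.
        int rank = (int) Math.ceil(p * sorted.length / 100.0);
        return sorted[Math.max(rank, 1) - 1];
    }

    static double average(long[] samplesMs) {
        return Arrays.stream(samplesMs).average()
                .orElseThrow(() -> new IllegalArgumentException("no samples"));
    }

    public static void main(String[] args) {
        long[] latenciesMs = {12, 7, 9, 250, 8, 11, 10, 9, 8, 13};
        System.out.println("averageLatency(ms)=" + average(latenciesMs));
        System.out.println("95thPercentileLatency(ms)=" + percentile(latenciesMs, 95));
        System.out.println("maxLatency(ms)=" + percentile(latenciesMs, 100));
    }
}
```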