Updated on 2022-09-14 GMT+08:00

Overview

Scenarios

Spark can concurrently access HBase in two clusters, provided that mutual trust has been configured between the two clusters.

Data Planning

  1. Add the IP addresses and host names of all ZooKeeper and HBase nodes in cluster2 to the /etc/hosts file on the client node of cluster1.
  2. Copy the hbase-site.xml file from the conf directory of the Spark2x client in cluster1 to the /opt/example/A directory, and the hbase-site.xml file from the Spark2x client in cluster2 to the /opt/example/B directory.
  3. Run the following spark-submit command:

    Before running the sample program, set the spark.yarn.security.credentials.hbase.enabled configuration item to true in the spark-defaults.conf configuration file of the Spark client. (The default value is false. Changing the value to true does not affect existing services.) If you want to uninstall the HBase service, change the value back to false first.

    spark-submit --master yarn --deploy-mode client --files /opt/example/B/hbase-site.xml --keytab /opt/FIclient/user.keytab --principal sparkuser  --class com.huawei.spark.examples.SparkOnMultiHbase /opt/example/SparkOnMultiHbase-1.0.jar
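
    For reference, the configuration item described above is a single line in spark-defaults.conf (property name as given in the note; setting it to true lets Spark obtain HBase delegation tokens on submission):

    ```
    spark.yarn.security.credentials.hbase.enabled = true
    ```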

Development Approach

  1. When accessing HBase, create a separate Configuration object from the configuration file of each cluster, and use it to create the Connection object for that cluster.
  2. Use the Connection objects you create to operate on the HBase tables, for example, to create a table and to insert, read, and print data.
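
The approach above can be sketched as follows. This is a minimal illustration, not the sample project's actual code: it assumes the hbase-site.xml files are placed as described in Data Planning, and the table name, column family, and helper method names are hypothetical.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class MultiClusterHBaseSketch {

    // Hypothetical helper: map a cluster directory ("A" or "B") to its
    // hbase-site.xml path, following the layout in Data Planning.
    static String configPath(String clusterDir) {
        return "/opt/example/" + clusterDir + "/hbase-site.xml";
    }

    // Build a per-cluster Connection: a fresh Configuration loaded with
    // that cluster's hbase-site.xml, then a Connection created from it.
    static Connection connect(String hbaseSitePath) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        conf.addResource(new Path(hbaseSitePath)); // cluster-specific settings
        return ConnectionFactory.createConnection(conf);
    }

    public static void main(String[] args) throws Exception {
        // One Connection per cluster; each targets a different HBase.
        // Table and column names below are illustrative only.
        for (String clusterDir : new String[] {"A", "B"}) {
            try (Connection conn = connect(configPath(clusterDir));
                 Table table = conn.getTable(TableName.valueOf("exampleTable"))) {
                Put put = new Put(Bytes.toBytes("row1"));
                put.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("q"),
                              Bytes.toBytes("value"));
                table.put(put); // write one cell into this cluster's table
            }
        }
    }
}
```

Each Configuration carries only its own cluster's ZooKeeper quorum and security settings, so the two Connections do not interfere with each other.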