Updated on 2022-09-14 GMT+08:00

Scenario Description

Users can use Spark to call HBase APIs to operate on HBase tables. In a Spark application, users can use HBase APIs to create a table, insert data into it, and read data from it.

Data Planning

Save the original data files in HDFS.

  1. Create a text file named input_data1.txt on the local PC and copy the following content into it.
    20,30,40,xxx
  2. Create the /tmp/input directory in HDFS and run the following commands to upload input_data1.txt to it:
    1. On the HDFS client, run the following commands for authentication:

      cd /opt/client

      kinit -kt '/opt/client/Spark/spark/conf/user.keytab' <Service user for authentication>

      Specify the path of the user.keytab file based on the site requirements.

    2. On the HDFS client running the Linux OS, run the hadoop fs -mkdir /tmp/input command (or the hdfs dfs command) to create a directory.
    3. On the HDFS client running the Linux OS, run the hadoop fs -put input_data1.txt /tmp/input command to upload the data file (a Scala sketch of reading this file in a Spark application follows the note below).

If Kerberos authentication is enabled, set spark.yarn.security.credentials.hbase.enabled to true in the client configuration file spark-defaults.conf.
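
The following minimal Scala sketch, provided for illustration only, shows how a Spark application could read the staged /tmp/input/input_data1.txt file and split each comma-separated line into fields. The object name ReadInputData and the application name are placeholders and are not part of the sample project; the sketch assumes authentication has already been completed as described above.

import org.apache.spark.{SparkConf, SparkContext}

// Hypothetical object name; the path and field layout follow the data planning steps above.
object ReadInputData {
  def main(args: Array[String]): Unit = {
    // When Kerberos is enabled, spark.yarn.security.credentials.hbase.enabled is expected
    // to already be set to true in spark-defaults.conf (see the note above).
    val sc = new SparkContext(new SparkConf().setAppName("ReadInputData"))

    // Read the staged file from HDFS and split each comma-separated line into fields.
    val records = sc.textFile("/tmp/input/input_data1.txt").map(_.split(","))

    records.collect().foreach(fields => println(fields.mkString("|")))

    sc.stop()
  }
}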

Development Guidelines

  1. Create an HBase table.
  2. Insert data into the HBase table.
  3. Use a Spark application to read data from the HBase table.
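
The following Scala sketch illustrates the three steps above. It is a minimal outline only, not the sample project's implementation: the table name example_table, column family cf, row key row1, and cell values are placeholders, and error handling is omitted. It assumes the HBase configuration files are on the classpath and that authentication has been completed as described in Data Planning.

import org.apache.hadoop.hbase.{HBaseConfiguration, HColumnDescriptor, HTableDescriptor, TableName}
import org.apache.hadoop.hbase.client.{ConnectionFactory, Put, Result}
import org.apache.hadoop.hbase.io.ImmutableBytesWritable
import org.apache.hadoop.hbase.mapreduce.TableInputFormat
import org.apache.hadoop.hbase.util.Bytes
import org.apache.spark.{SparkConf, SparkContext}

// Hypothetical object and table names, used only to outline the three development steps.
object SparkHbaseExample {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("SparkHbaseExample"))
    val hbaseConf = HBaseConfiguration.create()

    val tableName = TableName.valueOf("example_table")
    val connection = ConnectionFactory.createConnection(hbaseConf)
    val admin = connection.getAdmin

    // 1. Create the HBase table with one column family if it does not exist yet.
    if (!admin.tableExists(tableName)) {
      val desc = new HTableDescriptor(tableName)
      desc.addFamily(new HColumnDescriptor("cf"))
      admin.createTable(desc)
    }

    // 2. Insert one row using the HBase client API.
    val table = connection.getTable(tableName)
    val put = new Put(Bytes.toBytes("row1"))
    put.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("col1"), Bytes.toBytes("value1"))
    table.put(put)
    table.close()
    admin.close()
    connection.close()

    // 3. Read the table back through Spark using TableInputFormat.
    hbaseConf.set(TableInputFormat.INPUT_TABLE, "example_table")
    val rdd = sc.newAPIHadoopRDD(
      hbaseConf,
      classOf[TableInputFormat],
      classOf[ImmutableBytesWritable],
      classOf[Result])

    // Convert each Result to plain strings before collecting to the driver.
    val rows = rdd.map { case (_, result) =>
      val rowKey = Bytes.toString(result.getRow)
      val value = Bytes.toString(result.getValue(Bytes.toBytes("cf"), Bytes.toBytes("col1")))
      s"$rowKey -> $value"
    }
    rows.collect().foreach(println)

    sc.stop()
  }
}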