Scenario Description

Users can use Spark to call HBase APIs to operate on HBase tables. In a Spark application, users can use HBase APIs to create a table, insert data into it, and read data from it.
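The following Scala sketch shows one way to do this from the driver of a Spark application using the HBase client API. It is a minimal sketch: the table name shb1, column family cf1, and the sample row are illustrative placeholders, not names defined by this scenario.

    import org.apache.hadoop.hbase.{HBaseConfiguration, HColumnDescriptor, HTableDescriptor, TableName}
    import org.apache.hadoop.hbase.client.{ConnectionFactory, Put}
    import org.apache.hadoop.hbase.util.Bytes
    import org.apache.spark.sql.SparkSession

    object CreateAndWriteHBaseTable {
      def main(args: Array[String]): Unit = {
        // The SparkSession is created as usual; the HBase calls below run in the driver.
        val spark = SparkSession.builder().appName("CreateAndWriteHBaseTable").getOrCreate()

        // Build an HBase configuration from the client configuration files on the classpath.
        val hbConf = HBaseConfiguration.create()
        val connection = ConnectionFactory.createConnection(hbConf)
        val admin = connection.getAdmin

        // Create the table if it does not exist yet (table and column family names are placeholders).
        val tableName = TableName.valueOf("shb1")
        if (!admin.tableExists(tableName)) {
          val tableDesc = new HTableDescriptor(tableName)
          tableDesc.addFamily(new HColumnDescriptor("cf1"))
          admin.createTable(tableDesc)
        }

        // Insert one sample row.
        val table = connection.getTable(tableName)
        val put = new Put(Bytes.toBytes("row1"))
        put.addColumn(Bytes.toBytes("cf1"), Bytes.toBytes("cid"), Bytes.toBytes("20"))
        table.put(put)

        table.close()
        admin.close()
        connection.close()
        spark.stop()
      }
    }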

Data Planning

Save the original data file in HDFS.

  1. Create a text file named input_data1.txt on the local PC and copy the following content into it:
    20,30,40,xxx
  2. Create the /tmp/input directory in HDFS and run the following commands to upload input_data1.txt to it:
    1. On the HDFS client, run the following commands for authentication:

      cd /opt/client

      kinit -kt '/opt/client/Spark/spark/conf/user.keytab' <Service user for authentication>

      NOTE:

      Specify the path of the user.keytab file based on the site requirements.

    2. On the HDFS client running the Linux OS, run the hadoop fs -mkdir /tmp/input command (or the equivalent hdfs dfs -mkdir command) to create the directory.
    3. On the HDFS client running the Linux OS, run the hadoop fs -put input_data1.txt /tmp/input command to upload the data file. (A sketch that reads the uploaded file from a Spark application follows these steps.)
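Once the file is in HDFS, a Spark application can read it back. The following is a minimal sketch, assuming the path /tmp/input/input_data1.txt from the steps above; the application name and the way the parsed fields are printed are illustrative.

    import org.apache.spark.sql.SparkSession

    object ReadInputData {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder().appName("ReadInputData").getOrCreate()

        // Each line of input_data1.txt is comma-separated, for example "20,30,40,xxx".
        val records = spark.sparkContext
          .textFile("/tmp/input/input_data1.txt")
          .map(line => line.split(","))

        // Print the parsed fields on the driver.
        records.collect().foreach(fields => println(fields.mkString(" | ")))

        spark.stop()
      }
    }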

NOTE:

If Kerberos authentication is enabled, set spark.yarn.security.credentials.hbase.enabled in the client configuration file spark-defaults.conf to true.
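For example, the entry in spark-defaults.conf (under the client installation directory, /opt/client/Spark/spark/conf in the example above) would look like this:

    # <Client installation directory>/Spark/spark/conf/spark-defaults.conf
    spark.yarn.security.credentials.hbase.enabled true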

Development Guidelines

  1. Create an HBase table.
  2. Insert data into the HBase table.
  3. Use a Spark application to read data from the HBase table, as sketched below.
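The read step can be implemented with Spark's newAPIHadoopRDD and HBase's TableInputFormat. The sketch below assumes the illustrative table name shb1 used earlier; adapt the table name and the output handling to the actual application.

    import org.apache.hadoop.hbase.HBaseConfiguration
    import org.apache.hadoop.hbase.client.Result
    import org.apache.hadoop.hbase.io.ImmutableBytesWritable
    import org.apache.hadoop.hbase.mapreduce.TableInputFormat
    import org.apache.hadoop.hbase.util.Bytes
    import org.apache.spark.sql.SparkSession

    object ReadHBaseTable {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder().appName("ReadHBaseTable").getOrCreate()

        // Point TableInputFormat at the table to scan (the table name is a placeholder).
        val hbConf = HBaseConfiguration.create()
        hbConf.set(TableInputFormat.INPUT_TABLE, "shb1")

        // Read the HBase table as an RDD of (row key, Result) pairs.
        val hbaseRDD = spark.sparkContext.newAPIHadoopRDD(
          hbConf,
          classOf[TableInputFormat],
          classOf[ImmutableBytesWritable],
          classOf[Result])

        // Convert each row to serializable values on the executors, then print on the driver.
        val rows = hbaseRDD.map { case (key, result) =>
          (Bytes.toString(key.get()), result.size())
        }
        rows.collect().foreach { case (rowKey, cellCount) =>
          println(s"row key: $rowKey, cells: $cellCount")
        }

        spark.stop()
      }
    }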