Updated on 2022-06-01 GMT+08:00

Scenario Description

Assume that the HBase table table1 stores each user's consumption amount for the current day, and that table2 stores each user's historical consumption amount.

In table1, the record key=1, cf:cid=100 indicates that user1's consumption amount on the current day is 100 CNY.

In table2, the record key=1, cf:cid=1000 indicates that user1's historical consumption amount is 1000 CNY.

A Spark application must be developed to implement the following function:

Calculate each user's total consumption amount based on the user name. For user1, the total consumption amount = 100 (consumption amount of the current day) + 1000 (historical consumption amount) = 1100 CNY.

After the application runs, the record for user1 (key=1) in table2 is cf:cid=1100, which indicates a total consumption amount of 1100 CNY.
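
The arithmetic above can be sketched in Python. The sketch assumes that each amount is stored as the UTF-8 bytes of its decimal text (for example, the bytes of "100"), which is how values written with the HBase shell put command are stored. The function name add_amounts is hypothetical and is not part of any sample project.

```python
def add_amounts(today: bytes, history: bytes) -> bytes:
    """Return the new total as text-encoded bytes (hypothetical helper)."""
    # Assumption: each cell holds decimal text such as b"100",
    # not a binary-encoded integer.
    total = int(today.decode("utf-8")) + int(history.decode("utf-8"))
    return str(total).encode("utf-8")


# user1: 100 CNY for the current day plus 1000 CNY of history.
new_total = add_amounts(b"100", b"1000")
print(new_total.decode("utf-8"))
```

If the cells were written as binary integers instead of text, the decoding step would have to change accordingly.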

Data Planning

Use the HBase shell tool to create the HBase tables table1 and table2 and insert data into them.

  1. Run the following command in the HBase shell to create a table named table1:

    create 'table1', 'cf'

  2. Run the following command in the HBase shell to insert data into table1:

    put 'table1', '1', 'cf:cid', '100'

  3. Run the following command in the HBase shell to create a table named table2:

    create 'table2', 'cf'

  4. Run the following command in the HBase shell to insert data into table2:

    put 'table2', '1', 'cf:cid', '1000'

    If Kerberos authentication is enabled, set spark.yarn.security.credentials.hbase.enabled to true in the client configuration file spark-defaults.conf and on the sparkJDBC server.

Development Guidelines

  1. Query data in table1.
  2. Query data in table2 based on the key of each record read from table1.
  3. Add up the amounts obtained in the previous two steps.
  4. Write the result of the previous step to table2.
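
The four steps can be sketched as follows. This is a minimal in-memory illustration, not the Spark application itself: plain Python dictionaries stand in for the two HBase tables, and the function name merge_consumption and the dictionary layout are assumptions made for this example. A key that is missing from table2 is treated as a history of 0, which is also an assumption.

```python
from typing import Dict

# Hypothetical in-memory stand-in for an HBase table: row key -> {column: value}.
Table = Dict[str, Dict[str, str]]

COLUMN = "cf:cid"  # column family "cf", qualifier "cid", as in the data planning


def merge_consumption(table1: Table, table2: Table) -> None:
    """Add each user's amount in table1 to that user's history in table2."""
    # Step 1: query the data in table1.
    for key, row in table1.items():
        today = int(row[COLUMN])
        # Step 2: query table2 by the key read from table1.
        # Assumption: a user without a history row starts from 0.
        history = int(table2.get(key, {}).get(COLUMN, "0"))
        # Step 3: sum the two amounts.
        total = today + history
        # Step 4: write the result back to table2.
        table2.setdefault(key, {})[COLUMN] = str(total)


table1 = {"1": {COLUMN: "100"}}
table2 = {"1": {COLUMN: "1000"}}
merge_consumption(table1, table2)
print(table2["1"][COLUMN])
```

In an actual Spark application, the same per-key logic would presumably run over a distributed dataset read from table1 rather than in a local loop.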