Scenario Description

Assume that, in a service scenario, Kafka receives the consumption records of five users every 30 seconds, and that the HBase table table1 stores each user's history consumption amount.

table1 contains 10 records, one for each of the users whose user names are 1 to 10. Each user's initial history consumption amount is 0 CNY.

Based on the service requirements, a Spark application must be developed to implement the following function:

Calculate a user's total consumption amount in real time using the following formula: Total consumption amount = Current consumption amount (Kafka data) + History consumption amount (value in table1). Then, update the calculation result to table1.
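
For example, if user 1's history consumption amount in table1 is 100 CNY and a new Kafka record reports a consumption of 30 CNY, the total consumption amount written back to table1 for user 1 is 100 + 30 = 130 CNY.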

Data Planning

  1. Create an HBase table and insert data.

    1. Run the following command in the HBase shell to create a table named table1:

      create 'table1', 'cf'

    2. Run the following commands in the HBase shell to insert data into table1:
      put 'table1', '1', 'cf:cid', '0'
      put 'table1', '2', 'cf:cid', '0'
      put 'table1', '3', 'cf:cid', '0'
      put 'table1', '4', 'cf:cid', '0'
      put 'table1', '5', 'cf:cid', '0'
      put 'table1', '6', 'cf:cid', '0'
      put 'table1', '7', 'cf:cid', '0'
      put 'table1', '8', 'cf:cid', '0'
      put 'table1', '9', 'cf:cid', '0'
      put 'table1', '10', 'cf:cid', '0'
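
      To verify the inserted data, you can run scan 'table1' in the HBase shell; it should list 10 rows whose cf:cid values are all 0.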

  2. Store the data of the Spark Streaming sample project in Kafka.

    1. Ensure that the cluster, including HDFS, Yarn, and Spark, has been installed.
    2. Set allow.everyone.if.no.acl.found of the Kafka broker to true. (This parameter does not need to be set for a normal cluster.)
    3. Create a topic.

      {zkQuorum} indicates ZooKeeper cluster information in the IP:port format.

      $KAFKA_HOME/bin/kafka-topics.sh --create --zookeeper {zkQuorum}/kafka --replication-factor 1 --partitions 3 --topic {Topic}

    4. Start the Producer of the sample code to send data to Kafka.

      {JAR_PATH} indicates the path for storing the JAR file of the project. The path is specified by users. For details about how to export the JAR file, see Compiling and Running Applications. (A minimal producer sketch is provided after the notes below.)

      java -cp $SPARK_HOME/jars/*:$SPARK_HOME/jars/streamingClient/*:{JAR_PATH} com.huawei.bigdata.spark.examples.streaming.StreamingExampleProducer {BrokerList} {Topic}

    • If Kerberos authentication is enabled, set spark.yarn.security.credentials.hbase.enabled to true in the client configuration file spark-default.conf and on the Spark JDBCServer.
    • {zkQuorum} is in the zkIp:2181 format.
    • {JAR_PATH} indicates the path of the JAR package.
    • {BrokerList} is in the brokerIp:9092 format.
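
For reference, the following is a minimal Scala sketch of what such a producer might look like. It is not the source of StreamingExampleProducer: it assumes that each record is a comma-separated "user name,consumption amount" pair, and the class name ProducerSketch is illustrative.

  import java.util.Properties
  import scala.util.Random

  import org.apache.kafka.clients.producer.{KafkaProducer, ProducerRecord}

  // Illustrative producer sketch (not the shipped StreamingExampleProducer):
  // every 30 seconds it sends one "userName,amount" record for each of the
  // five users to the given topic.
  object ProducerSketch {
    def main(args: Array[String]): Unit = {
      val Array(brokerList, topic) = args

      val props = new Properties()
      props.put("bootstrap.servers", brokerList)
      props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer")
      props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer")

      val producer = new KafkaProducer[String, String](props)
      try {
        while (true) {
          for (userName <- 1 to 5) {
            val amount = Random.nextInt(10) // random consumption amount in CNY
            producer.send(new ProducerRecord[String, String](topic, s"$userName,$amount"))
          }
          producer.flush()
          Thread.sleep(30000) // one batch every 30 seconds, as in the scenario
        }
      } finally {
        producer.close()
      }
    }
  }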

Development Guidelines

  1. Receive data from Kafka and generate the corresponding DStream.
  2. Filter and analyze data.
  3. Find the corresponding record in the HBase table.
  4. Calculate the result and write the result to the HBase table.
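
The four steps above can be illustrated with the following minimal Scala sketch. It is not the shipped sample code: it assumes each Kafka record is a comma-separated "user name,consumption amount" pair (matching the producer sketch in Data Planning), it uses the Spark Streaming Kafka 0-10 direct API and the HBase client API, and all class and variable names are illustrative.

  import org.apache.hadoop.hbase.{HBaseConfiguration, TableName}
  import org.apache.hadoop.hbase.client.{ConnectionFactory, Get, Put}
  import org.apache.hadoop.hbase.util.Bytes
  import org.apache.kafka.common.serialization.StringDeserializer
  import org.apache.spark.SparkConf
  import org.apache.spark.streaming.{Seconds, StreamingContext}
  import org.apache.spark.streaming.kafka010.{ConsumerStrategies, KafkaUtils, LocationStrategies}

  // Illustrative sketch (not the shipped sample): reads "userName,amount"
  // records from Kafka, adds each amount to the user's history value in
  // table1, and writes the sum back.
  object StreamingToHBaseSketch {
    def main(args: Array[String]): Unit = {
      val Array(brokerList, topic) = args
      val ssc = new StreamingContext(
        new SparkConf().setAppName("StreamingToHBaseSketch"), Seconds(30))

      val kafkaParams = Map[String, Object](
        "bootstrap.servers" -> brokerList,
        "key.deserializer" -> classOf[StringDeserializer],
        "value.deserializer" -> classOf[StringDeserializer],
        "group.id" -> "example-group",
        "auto.offset.reset" -> "latest")

      // Step 1: receive data from Kafka and generate the corresponding DStream.
      val stream = KafkaUtils.createDirectStream[String, String](
        ssc,
        LocationStrategies.PreferConsistent,
        ConsumerStrategies.Subscribe[String, String](Set(topic), kafkaParams))

      // Step 2: filter and analyze the data into (userName, amount) pairs.
      val amounts = stream.map(_.value.split(","))
        .filter(_.length == 2)
        .map(fields => (fields(0), fields(1).toLong))

      amounts.foreachRDD { rdd =>
        rdd.foreachPartition { records =>
          // Create one HBase connection per partition rather than per record.
          val conn = ConnectionFactory.createConnection(HBaseConfiguration.create())
          val table = conn.getTable(TableName.valueOf("table1"))
          try {
            records.foreach { case (userName, amount) =>
              // Step 3: find the corresponding record in the HBase table
              // (assumes the row exists, as prepared in Data Planning, and
              // that cf:cid holds a string-encoded number).
              val result = table.get(new Get(Bytes.toBytes(userName)))
              val history = Bytes.toString(
                result.getValue(Bytes.toBytes("cf"), Bytes.toBytes("cid"))).toLong
              // Step 4: total = current amount + history amount; write it back.
              val put = new Put(Bytes.toBytes(userName))
              put.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("cid"),
                Bytes.toBytes(String.valueOf(history + amount)))
              table.put(put)
            }
          } finally {
            table.close()
            conn.close()
          }
        }
      }

      ssc.start()
      ssc.awaitTermination()
    }
  }

Creating the HBase connection inside foreachPartition keeps the connection on the executor side instead of serializing it from the driver, and limits connection setup to once per partition per batch.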