Scenario Description

Assume that, in a service scenario, Kafka receives the consumption records of five users every 30 seconds, and that the HBase table table1 stores each user's history consumption amount.

table1 contains 10 records, one for each of the users whose user names are 1 to 10. Each user's initial history consumption amount is 0 CNY.

Based on the service requirements, a Spark application must be developed to implement the following function:

Calculate a user's total consumption amount in real time using the following formula: Total consumption amount = Current consumption amount (Kafka data) + History consumption amount (value in table1). Then, update the calculation result to table1.
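
For example, if user 1's history consumption amount in table1 is 100 CNY and a new Kafka record reports a consumption of 30 CNY, the total consumption amount written back to table1 for user 1 is 100 + 30 = 130 CNY.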

Data Planning

  1. Create an HBase table and insert data.

    1. Run the following command in the HBase shell to create a table named table1:

      create 'table1', 'cf'

    2. Run the following commands in the HBase shell to insert data into table1:
      put 'table1', '1', 'cf:cid', '0'
      put 'table1', '2', 'cf:cid', '0'
      put 'table1', '3', 'cf:cid', '0'
      put 'table1', '4', 'cf:cid', '0'
      put 'table1', '5', 'cf:cid', '0'
      put 'table1', '6', 'cf:cid', '0'
      put 'table1', '7', 'cf:cid', '0'
      put 'table1', '8', 'cf:cid', '0'
      put 'table1', '9', 'cf:cid', '0'
      put 'table1', '10', 'cf:cid', '0'
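
      To verify the inserted data, you can run scan 'table1' in the HBase shell; it should list 10 rows whose cf:cid values are all 0.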

  2. Store the data of the Spark Streaming sample project in Kafka.

    1. Ensure that the cluster, including HDFS, Yarn, and Spark, has been installed.
    2. Set allow.everyone.if.no.acl.found of the Kafka broker to true. (This parameter does not need to be set for a normal cluster.)
    3. Create a topic.

      {zkQuorum} indicates ZooKeeper cluster information in the IP:port format.

      $KAFKA_HOME/bin/kafka-topics.sh --create --zookeeper {zkQuorum}/kafka --replication-factor 1 --partitions 3 --topic {Topic}

    4. Start the Producer of the sample code to send data to Kafka.

      {JAR_PATH} indicates the path for storing the JAR file of the project. The path is specified by users. For details about how to export the JAR file, see Compiling and Running Applications. (A minimal producer sketch is provided after the notes below.)

      java -cp $SPARK_HOME/jars/*:$SPARK_HOME/jars/streamingClient/*:{JAR_PATH} com.huawei.bigdata.spark.examples.streaming.StreamingExampleProducer {BrokerList} {Topic}

    • If Kerberos authentication is enabled, set spark.yarn.security.credentials.hbase.enabled to true in the client configuration file spark-default.conf and on the Spark JDBCServer.
    • {zkQuorum} is in the zkIp:2181 format.
    • {JAR_PATH} indicates the path of the JAR package.
    • {BrokerList} is in the brokerIp:9092 format.
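
For reference, the following is a minimal Scala sketch of what such a producer might look like. It is not the source of StreamingExampleProducer: it assumes that each record is a comma-separated "user name,consumption amount" pair, and the class name ProducerSketch is illustrative.

  import java.util.Properties
  import scala.util.Random

  import org.apache.kafka.clients.producer.{KafkaProducer, ProducerRecord}

  // Illustrative producer sketch (not the shipped StreamingExampleProducer):
  // every 30 seconds it sends one "userName,amount" record for each of the
  // five users to the given topic.
  object ProducerSketch {
    def main(args: Array[String]): Unit = {
      val Array(brokerList, topic) = args

      val props = new Properties()
      props.put("bootstrap.servers", brokerList)
      props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer")
      props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer")

      val producer = new KafkaProducer[String, String](props)
      try {
        while (true) {
          for (userName <- 1 to 5) {
            val amount = Random.nextInt(10) // random consumption amount in CNY
            producer.send(new ProducerRecord[String, String](topic, s"$userName,$amount"))
          }
          producer.flush()
          Thread.sleep(30000) // one batch every 30 seconds, as in the scenario
        }
      } finally {
        producer.close()
      }
    }
  }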

Development Guidelines

  1. Receive data from Kafka and generate the corresponding DStream.
  2. Filter and analyze data.
  3. Find the corresponding record in the HBase table.
  4. Calculate the result and write the result to the HBase table.
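
The four steps above can be illustrated with the following minimal Scala sketch. It is not the shipped sample code: it assumes each Kafka record is a comma-separated "user name,consumption amount" pair (matching the producer sketch in Data Planning), it uses the Spark Streaming Kafka 0-10 direct API and the HBase client API, and all class and variable names are illustrative.

  import org.apache.hadoop.hbase.{HBaseConfiguration, TableName}
  import org.apache.hadoop.hbase.client.{ConnectionFactory, Get, Put}
  import org.apache.hadoop.hbase.util.Bytes
  import org.apache.kafka.common.serialization.StringDeserializer
  import org.apache.spark.SparkConf
  import org.apache.spark.streaming.{Seconds, StreamingContext}
  import org.apache.spark.streaming.kafka010.{ConsumerStrategies, KafkaUtils, LocationStrategies}

  // Illustrative sketch (not the shipped sample): reads "userName,amount"
  // records from Kafka, adds each amount to the user's history value in
  // table1, and writes the sum back.
  object StreamingToHBaseSketch {
    def main(args: Array[String]): Unit = {
      val Array(brokerList, topic) = args
      val ssc = new StreamingContext(
        new SparkConf().setAppName("StreamingToHBaseSketch"), Seconds(30))

      val kafkaParams = Map[String, Object](
        "bootstrap.servers" -> brokerList,
        "key.deserializer" -> classOf[StringDeserializer],
        "value.deserializer" -> classOf[StringDeserializer],
        "group.id" -> "example-group",
        "auto.offset.reset" -> "latest")

      // Step 1: receive data from Kafka and generate the corresponding DStream.
      val stream = KafkaUtils.createDirectStream[String, String](
        ssc,
        LocationStrategies.PreferConsistent,
        ConsumerStrategies.Subscribe[String, String](Set(topic), kafkaParams))

      // Step 2: filter and analyze the data into (userName, amount) pairs.
      val amounts = stream.map(_.value.split(","))
        .filter(_.length == 2)
        .map(fields => (fields(0), fields(1).toLong))

      amounts.foreachRDD { rdd =>
        rdd.foreachPartition { records =>
          // Create one HBase connection per partition rather than per record.
          val conn = ConnectionFactory.createConnection(HBaseConfiguration.create())
          val table = conn.getTable(TableName.valueOf("table1"))
          try {
            records.foreach { case (userName, amount) =>
              // Step 3: find the corresponding record in the HBase table
              // (assumes the row exists, as prepared in Data Planning, and
              // that cf:cid holds a string-encoded number).
              val result = table.get(new Get(Bytes.toBytes(userName)))
              val history = Bytes.toString(
                result.getValue(Bytes.toBytes("cf"), Bytes.toBytes("cid"))).toLong
              // Step 4: total = current amount + history amount; write it back.
              val put = new Put(Bytes.toBytes(userName))
              put.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("cid"),
                Bytes.toBytes(String.valueOf(history + amount)))
              table.put(put)
            }
          } finally {
            table.close()
            conn.close()
          }
        }
      }

      ssc.start()
      ssc.awaitTermination()
    }
  }

Creating the HBase connection inside foreachPartition keeps the connection on the executor side instead of serializing it from the driver, and limits connection setup to once per partition per batch.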