Scenario Description

In a Spark application, use Structured Streaming to consume word records from Kafka through the Kafka APIs, group the records by word, and count the number of records for each word.

Data Planning

The data for the Structured Streaming sample project is stored in Kafka. Send data to Kafka as follows (a user with Kafka permissions is required).

Ensure that the cluster has been installed, including the HDFS, Yarn, Spark, and Kafka services.
Set the Kafka Broker parameter allow.everyone.if.no.acl.found to true. (This parameter does not need to be set for a normal cluster, that is, a cluster in non-security mode.)
Create a topic.
{zkQuorum} indicates the ZooKeeper cluster information in IP:port format, and {Topic} indicates the name of the topic.

```
$KAFKA_HOME/bin/kafka-topics.sh --create --zookeeper {zkQuorum}/kafka --replication-factor 1 --partitions 1 --topic {Topic}
```
Start the Kafka producer to send data to Kafka.

{JAR_PATH} indicates the path for storing the JAR file of the project. The path is specified by the user. For details, see Compiling and Running a Spark Application.

```
java -cp $SPARK_HOME/jars/*:$SPARK_HOME/jars/streamingClient010/*:{JAR_PATH} com.huawei.bigdata.spark.examples.KafkaWordCountProducer {BrokerList} {Topic} {messagesPerSec} {wordsPerMessage}
```

{BrokerList} is in brokerIp:9092 format. {messagesPerSec} indicates the number of messages sent per second, and {wordsPerMessage} indicates the number of words in each message.
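The producer command does not show what the messages look like. Assuming KafkaWordCountProducer behaves like the Apache Spark KafkaWordCountProducer example (which takes the same four arguments in the same order), every second it sends {messagesPerSec} messages, each made of {wordsPerMessage} random digits from 0 to 9 separated by spaces. The following plain-Java sketch builds such messages without any Kafka client, so you can see the input that the word count later consumes. ProducerMessageSketch, buildMessage, and buildBatch are hypothetical names, not part of the sample project.

```java
import java.util.Random;
import java.util.StringJoiner;

/**
 * Hypothetical sketch of the message format. ASSUMPTION: the sample producer
 * behaves like Apache Spark's KafkaWordCountProducer example, where each
 * message is wordsPerMessage random digits (0-9) joined by single spaces.
 * Sending to Kafka is deliberately left out.
 */
public class ProducerMessageSketch {

    /** Builds one message of wordsPerMessage space-separated random digits. */
    static String buildMessage(Random random, int wordsPerMessage) {
        StringJoiner joiner = new StringJoiner(" ");
        for (int i = 0; i < wordsPerMessage; i++) {
            joiner.add(Integer.toString(random.nextInt(10)));
        }
        return joiner.toString();
    }

    /** Builds the messagesPerSec messages that would be sent during one second. */
    static String[] buildBatch(Random random, int messagesPerSec, int wordsPerMessage) {
        String[] batch = new String[messagesPerSec];
        for (int i = 0; i < messagesPerSec; i++) {
            batch[i] = buildMessage(random, wordsPerMessage);
        }
        return batch;
    }

    public static void main(String[] args) {
        // Fixed seed only to make this demo reproducible.
        Random random = new Random(42L);
        for (String message : buildBatch(random, 3, 5)) {
            System.out.println(message);
        }
    }
}
```

Because the "words" are single digits, the final counts have at most ten distinct keys, which keeps the console output of the streaming query short.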
To connect to Kafka in a security cluster, add the KafkaClient configuration to the jaas.conf file in the conf directory of the Spark client. For example:
```
KafkaClient {
  com.sun.security.auth.module.Krb5LoginModule required
  useKeyTab=true
  keyTab="./user.keytab"
  principal="leoB@HADOOP.COM"
  useTicketCache=false
  storeKey=true
  debug=true;
};
```
In Spark on Yarn mode, Yarn distributes jaas.conf and user.keytab to the working directory of each Spark container. Therefore, keyTab in KafkaClient must point to the keytab in the same directory as jaas.conf, for example, ./user.keytab. Set principal to the username you created and the domain name of the cluster.
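A Java Kafka client that has no sasl.jaas.config property set reads the KafkaClient section of the JAAS file named by the java.security.auth.login.config system property. To catch syntax errors in jaas.conf before submitting a job, you can parse the file with the JDK's own JAAS loader. The following minimal sketch uses the standard Configuration.getInstance("JavaLoginConfig", URIParameter) API to read back the KafkaClient entry. It does not perform a Kerberos login, and JaasConfigCheck and loadSection are hypothetical names.

```java
import java.net.URI;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.security.NoSuchAlgorithmException;
import java.security.URIParameter;
import java.util.Map;
import javax.security.auth.login.AppConfigurationEntry;
import javax.security.auth.login.Configuration;

/** Sketch: parse a jaas.conf and print its KafkaClient entry (no login is attempted). */
public class JaasConfigCheck {

    /**
     * Parses the JAAS file with the JDK's "JavaLoginConfig" implementation and
     * returns the login module entries of the named section, or null if absent.
     */
    static AppConfigurationEntry[] loadSection(Path jaasFile, String section)
            throws NoSuchAlgorithmException {
        URI uri = jaasFile.toUri();
        Configuration config =
                Configuration.getInstance("JavaLoginConfig", new URIParameter(uri));
        return config.getAppConfigurationEntry(section);
    }

    public static void main(String[] args) throws NoSuchAlgorithmException {
        // ASSUMPTION: jaas.conf is in the current directory unless a path is passed.
        Path jaasFile = Paths.get(args.length > 0 ? args[0] : "jaas.conf");
        AppConfigurationEntry[] entries = loadSection(jaasFile, "KafkaClient");
        if (entries == null) {
            System.out.println("No KafkaClient section in " + jaasFile);
            return;
        }
        for (AppConfigurationEntry entry : entries) {
            System.out.println(entry.getLoginModuleName() + " " + entry.getControlFlag());
            Map<String, ?> options = entry.getOptions();
            // Relative keyTab values resolve against the container working directory at run time.
            System.out.println("  keyTab = " + options.get("keyTab"));
            System.out.println("  principal = " + options.get("principal"));
        }
    }
}
```

If the file contains a syntax error, the JDK loader fails while parsing, which surfaces the problem locally instead of inside a Yarn container.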

Development Guidelines

Receive data from Kafka through a DataStreamReader and generate the corresponding streaming DataFrame.
Group the word records by word.
Count the records of each word and print the result.
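In Apache Spark's StructuredKafkaWordCount example, these steps map to spark.readStream().format("kafka") with the subscribe option, a flatMap that splits each record value on spaces followed by groupBy("value").count(), and a console sink started with writeStream() in complete output mode. The sample project may differ in detail. The following sketch reproduces only the computation, in plain Java with no Spark or Kafka dependency, so the counting logic can be checked locally. WordCountSketch, countWords, and mergeInto are hypothetical names; mergeInto mimics complete output mode, in which the console shows running totals across all micro-batches.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;
import java.util.stream.Collectors;

/** Plain-Java illustration (no Spark, no Kafka) of the word count computation. */
public class WordCountSketch {

    /** Steps 2 and 3 for one micro-batch: split values into words, group by word, count. */
    static Map<String, Long> countWords(List<String> recordValues) {
        return recordValues.stream()
                // Mirrors split(" ") in the Spark example; trailing empty strings are dropped.
                .flatMap(line -> Arrays.stream(line.split(" ")))
                .collect(Collectors.groupingBy(
                        Function.identity(), TreeMap::new, Collectors.counting()));
    }

    /** Adds one micro-batch's counts to the running totals (like complete output mode). */
    static void mergeInto(Map<String, Long> totals, Map<String, Long> batchCounts) {
        batchCounts.forEach((word, count) -> totals.merge(word, count, Long::sum));
    }

    public static void main(String[] args) {
        // Stand-in for step 1: record values as two micro-batches might deliver them.
        List<List<String>> microBatches = Arrays.asList(
                Arrays.asList("3 1 4 1 5", "9 2 6 5 3"),
                Arrays.asList("5 8 9 7"));
        Map<String, Long> totals = new TreeMap<>();
        int batchId = 0;
        for (List<String> batch : microBatches) {
            mergeInto(totals, countWords(batch));
            // Step 3: print the running totals after each micro-batch, sorted by word.
            System.out.println("Batch " + batchId++ + ": " + totals);
        }
    }
}
```

In the real streaming query, Spark maintains the running totals as state between micro-batches; the TreeMap here plays that role only for illustration.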
