What Should I Do If the Method of Submitting Structured Streaming Tasks Is Changed?
Question
Previously, when submitting a Structured Streaming task, you only needed to specify the Kafka JAR package paths with the --jars option, for example, --jars /kafkadir/kafka-clients-x.x.x.jar,/kafkadir/kafka_2.11-x.x.x.jar. In the current version, however, additional configuration items are required; otherwise, an error is reported indicating that the class cannot be found.
Answer
To ensure proper loading of the Kafka package, the driver of a structured streaming task must have the Kafka JAR package path added to its library directory. This is because the Spark kernel of the current version relies on the Kafka JAR package for structured streaming.
Solution
- When submitting a Structured Streaming task in yarn-client mode, perform the following additional operation:
Copy the value of spark.driver.extraClassPath from the spark-defaults.conf file in the Spark client directory and append the Kafka JAR package path to it. Then pass the combined value with --conf when submitting the Structured Streaming task. For example, if the Kafka JAR package path is /kafkadir, add --conf spark.driver.extraClassPath=/opt/client/Spark2x/spark/conf/:/opt/client/Spark2x/spark/jars/*:/opt/client/Spark2x/spark/x86/*:/kafkadir/* to the submission command.
- When submitting a Structured Streaming task in yarn-cluster mode, perform the following additional operation:
Copy the value of spark.yarn.cluster.driver.extraClassPath from the spark-defaults.conf file in the Spark client directory and append the relative paths of the Kafka JAR packages to it. Then pass the combined value with --conf when submitting the Structured Streaming task. For example, if the Kafka JAR packages are kafka-clients-x.x.x.jar and kafka_2.11-x.x.x.jar, add --conf spark.yarn.cluster.driver.extraClassPath=/home/huawei/Bigdata/common/runtime/security:./kafka-clients-x.x.x.jar:./kafka_2.11-x.x.x.jar to the submission command.
- In the current version, Spark Structured Streaming does not support Kafka versions earlier than 2.x. In upgrade scenarios, use the client of the earlier version.
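The steps above can be sketched as the following submission commands. This is a sketch only: the client installation path /opt/client, the Kafka JAR directory /kafkadir, the application JAR name your_streaming_app.jar, and the x.x.x version placeholders are assumptions that depend on your deployment; copy the actual base classpath values from your own spark-defaults.conf.

```shell
#!/bin/sh
# Assumed locations -- adjust to your deployment.
KAFKA_DIR=/kafkadir
KAFKA_JARS=${KAFKA_DIR}/kafka-clients-x.x.x.jar,${KAFKA_DIR}/kafka_2.11-x.x.x.jar

# yarn-client mode: append the Kafka JAR directory to the value copied
# from spark.driver.extraClassPath in spark-defaults.conf.
spark-submit --master yarn --deploy-mode client \
  --jars "${KAFKA_JARS}" \
  --conf "spark.driver.extraClassPath=/opt/client/Spark2x/spark/conf/:/opt/client/Spark2x/spark/jars/*:/opt/client/Spark2x/spark/x86/*:${KAFKA_DIR}/*" \
  your_streaming_app.jar

# yarn-cluster mode: append the relative Kafka JAR names to the value
# copied from spark.yarn.cluster.driver.extraClassPath. The JARs listed
# in --jars are shipped to the driver container, so relative paths work.
spark-submit --master yarn --deploy-mode cluster \
  --jars "${KAFKA_JARS}" \
  --conf "spark.yarn.cluster.driver.extraClassPath=/home/huawei/Bigdata/common/runtime/security:./kafka-clients-x.x.x.jar:./kafka_2.11-x.x.x.jar" \
  your_streaming_app.jar
```

The --conf values are quoted so that the local shell does not expand the * wildcards before they reach Spark.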