What Should I Do If the Method of Submitting Structured Streaming Tasks Has Changed?

Question

When submitting a structured streaming task, you use the --jars option to specify the paths of the Kafka JAR packages, for example, --jars /kafkadir/kafka-clients-x.x.x.jar,/kafkadir/kafka_2.11-x.x.x.jar. In the current version, however, this option alone is not enough: unless additional items are configured, the task fails with an error indicating that a class is not found.

Answer

In the current version, the Spark kernel itself depends on the Kafka JAR packages that structured streaming uses. Therefore, when submitting a structured streaming task, you must also add the Kafka JAR package path to the driver classpath of the task so that the driver can load the Kafka classes.

Solution

Perform the following additional operations when submitting a structured streaming task in Yarn-client mode:
Copy the value of spark.driver.extraClassPath from the spark-defaults.conf file in the Spark client directory, and append the Kafka JAR package path to the end of it. When submitting the structured streaming task, pass the combined value with a --conf option. For example, if the Kafka JAR packages are stored in /kafkadir, add --conf spark.driver.extraClassPath=/opt/client/Spark2x/spark/conf/:/opt/client/Spark2x/spark/jars/*:/opt/client/Spark2x/spark/x86/*:/kafkadir/* when submitting the task.
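The Yarn-client steps above can be sketched as a small shell snippet. This is only an illustration, not the product's official procedure: the base classpath is the example value from this section (copy the actual value from your own client configuration), and the main class com.example.MyStreamingApp and the application JAR path are hypothetical placeholders. The snippet prints the resulting command so that it can be reviewed before it is run.

```shell
# Value of spark.driver.extraClassPath copied from the client configuration file.
# This is the example value used in this section; yours may differ.
BASE_CP='/opt/client/Spark2x/spark/conf/:/opt/client/Spark2x/spark/jars/*:/opt/client/Spark2x/spark/x86/*'

# Directory that holds the Kafka JAR packages (example path from this section).
KAFKA_DIR='/kafkadir'

# Append the Kafka JAR directory to the end of the existing driver classpath.
DRIVER_CP="${BASE_CP}:${KAFKA_DIR}/*"

# Print the submit command instead of executing it.
# com.example.MyStreamingApp and /path/to/my-streaming-app.jar are hypothetical.
echo spark-submit --master yarn --deploy-mode client \
  --jars "${KAFKA_DIR}/kafka-clients-x.x.x.jar,${KAFKA_DIR}/kafka_2.11-x.x.x.jar" \
  --conf "spark.driver.extraClassPath=${DRIVER_CP}" \
  --class com.example.MyStreamingApp /path/to/my-streaming-app.jar
```

The classpath values are kept in quotes so that the shell passes the * wildcards through to Spark unchanged instead of expanding them.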
Perform the following additional operations when submitting a structured streaming task in Yarn-cluster mode:
Copy the value of spark.yarn.cluster.driver.extraClassPath from the spark-defaults.conf file in the Spark client directory, and append the relative paths of the Kafka JAR packages to the end of it. When submitting the structured streaming task, pass the combined value with a --conf option. For example, if the Kafka JAR packages are kafka-clients-x.x.x.jar and kafka_2.11-x.x.x.jar, add --conf spark.yarn.cluster.driver.extraClassPath=/home/huawei/Bigdata/common/runtime/security:./kafka-clients-x.x.x.jar:./kafka_2.11-x.x.x.jar when submitting the task.
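For Yarn-cluster mode, the relative classpath entries can be derived from the same JAR list that is passed to --jars. The following is a minimal sketch under the assumption that the JAR packages are stored in /kafkadir as in the Question section. The base classpath is the example value from this section, and the main class and application JAR path are hypothetical placeholders. Relative paths are used because, in cluster mode, the driver runs in a Yarn container and the files listed in --jars are expected to be available in that container's working directory.

```shell
# Value of spark.yarn.cluster.driver.extraClassPath copied from the client
# configuration file. This is the example value used in this section.
BASE_CP='/home/huawei/Bigdata/common/runtime/security'

# Comma-separated JAR list, in the same form as it is passed to --jars.
KAFKA_JARS='/kafkadir/kafka-clients-x.x.x.jar,/kafkadir/kafka_2.11-x.x.x.jar'

# Append each JAR as a path relative to the working directory (./name.jar).
DRIVER_CP="$BASE_CP"
OLD_IFS="$IFS"
IFS=','
for jar in $KAFKA_JARS; do
  DRIVER_CP="${DRIVER_CP}:./$(basename "$jar")"
done
IFS="$OLD_IFS"

# Print the submit command instead of executing it.
# com.example.MyStreamingApp and /path/to/my-streaming-app.jar are hypothetical.
echo spark-submit --master yarn --deploy-mode cluster \
  --jars "$KAFKA_JARS" \
  --conf "spark.yarn.cluster.driver.extraClassPath=${DRIVER_CP}" \
  --class com.example.MyStreamingApp /path/to/my-streaming-app.jar
```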
In the current version, Spark structured streaming does not support Kafka versions earlier than 2.x. In upgrade scenarios, use the client of the earlier version.
