Why Is the "Class Does Not Exist" Error Reported While the SparkStreamingKafka Project Is Running?
Question
When the KafkaWordCount task (org.apache.spark.examples.streaming.KafkaWordCount) is submitted by running the spark-submit script, the log file shows that a Kafka-related class cannot be found. The KafkaWordCount sample is provided by the Spark open-source community.
Answer
When Spark is deployed, the following JAR files are saved in the ${SPARK_HOME}/jars/streamingClient010 directory on the client and the ${BIGDATA_HOME}/FusionInsight_Spark2x_8.1.0.1/install/FusionInsight-Spark2x-3.1.1/spark/jars/streamingClient010 directory on the server.
- kafka-clients-xxx.jar
- kafka_2.12-xxx.jar
- spark-streaming-kafka-0-10_2.12-3.1.1-hw-ei-311001-SNAPSHOT.jar
- spark-token-provider-kafka-0-10_2.12-3.1.1-hw-ei-311001-SNAPSHOT.jar
Because $SPARK_HOME/jars/streamingClient010/* is not added to the classpath by default, you need to configure it manually.
When submitting and running the application, add the following parameter to the command. For details, see Writing and Running the Spark Program in the Linux Environment.
--jars $SPARK_CLIENT_HOME/jars/streamingClient010/kafka-clients-2.4.0.jar,$SPARK_CLIENT_HOME/jars/streamingClient010/kafka_2.12-*.jar,$SPARK_CLIENT_HOME/jars/streamingClient010/spark-streaming-kafka-0-10_2.12-3.1.1-hw-ei-311001-SNAPSHOT.jar
You can run the preceding command to submit self-developed applications and sample projects.
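As a sketch, the --jars value above can be assembled in a small shell helper before submission. The installation path /opt/client and the glob-style JAR names are illustrative assumptions, not values from this document; substitute the actual client path and JAR versions in your environment.

```shell
# Hypothetical sketch: build the comma-separated --jars value from the
# streamingClient010 directory of an assumed Spark client installation.
build_streaming_jars() {
  # $1 is the Spark client home, e.g. /opt/client (an assumed path).
  dir="$1/jars/streamingClient010"
  # Emit the four Kafka-related JARs as one comma-separated list.
  printf '%s,%s,%s,%s' \
    "$dir/kafka-clients-*.jar" \
    "$dir/kafka_2.12-*.jar" \
    "$dir/spark-streaming-kafka-0-10_2.12-*.jar" \
    "$dir/spark-token-provider-kafka-0-10_2.12-*.jar"
}

# Usage (the main class and application JAR are placeholders):
#   spark-submit --master yarn --deploy-mode client \
#     --jars "$(build_streaming_jars /opt/client)" \
#     --class com.example.MyStreamingApp my-streaming-app.jar
```

Note that spark-submit itself does not expand shell globs in --jars; either let the shell expand them or list the exact JAR file names.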
To submit sample projects such as KafkaWordCount provided by the Spark open-source community, you need to add other parameters in addition to --jars. Otherwise, a ClassNotFoundException error occurs. The configurations in yarn-client and yarn-cluster modes are as follows:
- yarn-client mode
In the spark-defaults.conf configuration file on the client, add the path of the client dependency package, for example $SPARK_HOME/jars/streamingClient010/*, to the spark.driver.extraClassPath parameter (in addition to setting --jars).
- yarn-cluster mode
In addition to setting --jars, perform any one of the following configurations:
- In the spark-defaults.conf configuration file on the client, add the path of the server dependency package, for example ${BIGDATA_HOME}/FusionInsight_Spark2x_8.1.0.1/install/FusionInsight-Spark2x-3.1.1/spark/jars/streamingClient010/*, to the spark.yarn.cluster.driver.extraClassPath parameter.
- Delete the original-spark-examples_2.12-3.1.1-xxx.jar packages from all the server nodes.
- In the spark-defaults.conf configuration file on the client, set the spark.driver.userClassPathFirst parameter to true (add the parameter if it does not exist).
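Taken together, the settings described above would appear in spark-defaults.conf roughly as follows. This is a sketch only: the client-side path is an assumed example, and any value already present in these parameters should be kept (append the new path rather than overwrite it).

```
# yarn-client mode: the driver runs on the client node,
# so point spark.driver.extraClassPath at the client-side directory.
spark.driver.extraClassPath    /opt/client/Spark2x/spark/jars/streamingClient010/*

# yarn-cluster mode (option 1): the driver runs on a server node,
# so use the server-side directory instead.
spark.yarn.cluster.driver.extraClassPath    ${BIGDATA_HOME}/FusionInsight_Spark2x_8.1.0.1/install/FusionInsight-Spark2x-3.1.1/spark/jars/streamingClient010/*

# yarn-cluster mode (option 3): load user-supplied classes
# before Spark's own copies.
spark.driver.userClassPathFirst    true
```

Option 2 (deleting the original-spark-examples_2.12-3.1.1-xxx.jar packages from the server nodes) requires no configuration-file change.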