Why Is the "Class Does Not Exist" Error Reported While the SparkStreamingKafka Project Is Running?

Question

When the KafkaWordCount task (org.apache.spark.examples.streaming.KafkaWordCount) is submitted by running the spark-submit script, the log file shows that a Kafka-related class does not exist. The KafkaWordCount sample is provided by the Spark open-source community.

Answer

When Spark is deployed, the following JAR files are saved in the $SPARK_HOME/jars/streamingClient directory on the client and the /opt/Bigdata/MRS/FusionInsight-Spark-2.2.1/spark/jars/streamingClient directory on the server.

kafka-clients-0.8.2.1.jar
kafka_2.10-0.8.2.1.jar
spark-streaming-kafka_2.10-1.5.1.jar

Because $SPARK_HOME/lib/streamingClient/* is not added to the classpath by default, you need to configure it manually.

When you submit and run the application, add the following parameter to the command:

--jars $SPARK_CLIENT_HOME/jars/streamingClient/kafka-clients-0.8.2.1.jar,$SPARK_CLIENT_HOME/jars/streamingClient/kafka_2.10-0.8.2.1.jar,$SPARK_CLIENT_HOME/jars/streamingClient/spark-streaming-kafka_2.10-1.5.1.jar

You can use the preceding parameter when submitting both self-developed applications and sample projects.
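
As a minimal sketch of how the --jars value can be assembled, the following shell snippet joins the three JAR files into the comma-separated list and prints the resulting spark-submit command instead of running it. The default value of SPARK_CLIENT_HOME, the STREAMING_DIR and KAFKA_JARS variable names, and the <application-jar> and <application-arguments> placeholders are assumptions for illustration only; options such as --master are omitted and depend on your environment.

```shell
# Sketch only: the default path below is an assumed example, not a documented location.
SPARK_CLIENT_HOME="${SPARK_CLIENT_HOME:-/opt/client/Spark/spark}"
STREAMING_DIR="$SPARK_CLIENT_HOME/jars/streamingClient"

# Join the three Kafka-related JARs into the comma-separated list that --jars expects.
KAFKA_JARS=""
for jar in kafka-clients-0.8.2.1.jar kafka_2.10-0.8.2.1.jar spark-streaming-kafka_2.10-1.5.1.jar; do
  # Prefix a comma only when the list already has an entry.
  KAFKA_JARS="${KAFKA_JARS:+$KAFKA_JARS,}$STREAMING_DIR/$jar"
done

# Print the command for review rather than executing it.
# <application-jar> and <application-arguments> are placeholders to fill in.
echo "spark-submit --class org.apache.spark.examples.streaming.KafkaWordCount --jars $KAFKA_JARS <application-jar> <application-arguments>"
```

Printing the command first makes it easy to spot a misspelled JAR name before the job is submitted.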

To submit sample projects provided by the Spark open-source community, such as KafkaWordCount, you need to add other parameters in addition to --jars. Otherwise, a ClassNotFoundException error occurs. The configurations in yarn-client and yarn-cluster modes are as follows:

yarn-client mode:
In the spark-defaults.conf configuration file on the client, add the path of the client dependency package, for example $SPARK_HOME/lib/streamingClient/*, to the spark.driver.extraClassPath parameter. This is required in addition to --jars.
yarn-cluster mode:
Use any one of the following configurations in addition to --jars.
- In the configuration file spark-defaults.conf on the client, add the path of the server dependency package, for example /opt/huawei/Bigdata/FusionInsight/spark/spark/lib/streamingClient/*, to the spark.yarn.cluster.driver.extraClassPath parameter.
- Delete the spark-examples_2.10-1.5.1.jar package from each server node.
- In the spark-defaults.conf configuration file on the client, set the spark.driver.userClassPathFirst parameter to true (add the parameter if it does not exist).
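
The classpath settings above can be sketched as follows. This shell snippet only prints the spark-defaults.conf entry for the chosen mode and does not edit any file. The MODE, CLIENT_CP, SERVER_CP, and CONF_LINE variable names and the default value of SPARK_HOME are assumptions for illustration. The snippet covers the first yarn-cluster option only, and the variable is expanded by the shell so that the printed entry holds a literal path.

```shell
# Sketch only: the default SPARK_HOME below is an assumed example path.
SPARK_HOME="${SPARK_HOME:-/opt/client/Spark/spark}"
CLIENT_CP="$SPARK_HOME/lib/streamingClient/*"
SERVER_CP="/opt/huawei/Bigdata/FusionInsight/spark/spark/lib/streamingClient/*"

# MODE is a hypothetical switch: yarn-client (default) or yarn-cluster.
MODE="${MODE:-yarn-client}"
case "$MODE" in
  yarn-client)
    # The driver runs on the client, so the client-side path is used.
    CONF_LINE="spark.driver.extraClassPath $CLIENT_CP"
    ;;
  yarn-cluster)
    # The driver runs on a server node, so the server-side path is used.
    CONF_LINE="spark.yarn.cluster.driver.extraClassPath $SERVER_CP"
    ;;
  *)
    echo "Unknown mode: $MODE" >&2
    CONF_LINE=""
    ;;
esac

# Print the entry to add to spark-defaults.conf on the client.
echo "$CONF_LINE"
```

If the parameter already has a value in spark-defaults.conf, add the path to the existing value rather than creating a second entry for the same parameter.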

