Why Is the "Class Does Not Exist" Error Reported While the SparkStreamingKafka Project Is Running?

Question

When the KafkaWordCount task (org.apache.spark.examples.streaming.KafkaWordCount) is being submitted by running the spark-submit script, the log file shows that the Kafka-related class does not exist. The KafkaWordCount sample is provided by the Spark open-source community.

Answer

When Spark is deployed, the following JAR files are saved in the $SPARK_HOME/jars/streamingClient directory on the client and the /opt/Bigdata/MRS/FusionInsight-Spark-2.2.1/spark/jars/streamingClient directory on the server.

kafka-clients-0.8.2.1.jar
kafka_2.10-0.8.2.1.jar
spark-streaming-kafka_2.10-1.5.1.jar

Because $SPARK_HOME/jars/streamingClient/* is not added to the classpath by default, you need to add these JAR files manually.

When submitting and running the application, add the following parameter to the command:

--jars $SPARK_CLIENT_HOME/jars/streamingClient/kafka-clients-0.8.2.1.jar,$SPARK_CLIENT_HOME/jars/streamingClient/kafka_2.10-0.8.2.1.jar,$SPARK_CLIENT_HOME/jars/streamingClient/spark-streaming-kafka_2.10-1.5.1.jar
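Typing the three paths by hand is error-prone, and a single mistyped file name is enough to reproduce the missing-class error. The following shell sketch builds the --jars value from a streamingClient directory and fails early if a file is missing. It assumes the JAR file names listed above and a bash-compatible shell; the function name build_kafka_jars is hypothetical, and the spark-submit line in the comments is only an illustration. The examples JAR path and the four KafkaWordCount arguments (taken from the open-source sample's usage message) must be filled in for your environment and version.

```shell
#!/usr/bin/env bash
# Minimal sketch: assemble the --jars value for the Kafka dependencies.
# Assumption: the streamingClient directory contains the JAR versions
# listed in this FAQ; adjust the names if your client ships others.

# Hypothetical helper: print the comma-separated --jars value for DIR,
# or report the first missing JAR on stderr and return 1.
build_kafka_jars() {
  local dir="$1"
  local jars=""
  local name
  for name in kafka-clients-0.8.2.1.jar \
              kafka_2.10-0.8.2.1.jar \
              spark-streaming-kafka_2.10-1.5.1.jar; do
    if [ ! -f "$dir/$name" ]; then
      echo "missing: $dir/$name" >&2
      return 1
    fi
    # ${jars:+$jars,} inserts a comma only after the first entry.
    jars="${jars:+$jars,}$dir/$name"
  done
  printf '%s\n' "$jars"
}

# Example usage (illustrative; not executed here):
#   KAFKA_JARS=$(build_kafka_jars "$SPARK_CLIENT_HOME/jars/streamingClient") || exit 1
#   spark-submit --master yarn --deploy-mode client \
#     --jars "$KAFKA_JARS" \
#     --class org.apache.spark.examples.streaming.KafkaWordCount \
#     <spark-examples JAR> <zkQuorum> <group> <topics> <numThreads>
```

Checking for the files before submission turns a runtime ClassNotFoundException into an immediate, explicit "missing" message on the client.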

You can use the preceding parameter when submitting both self-developed applications and sample projects.

To submit sample projects provided by the Spark open-source community, such as KafkaWordCount, you must configure additional parameters besides --jars; otherwise, a ClassNotFoundException error occurs. The configurations for yarn-client and yarn-cluster modes are as follows:

yarn-client mode:
In the spark-defaults.conf configuration file on the client, add the path of the client dependency packages (for example, $SPARK_HOME/jars/streamingClient/*) to the spark.driver.extraClassPath parameter, in addition to specifying --jars.
yarn-cluster mode:
Perform any one of the following configurations in addition to --jars.
- In the configuration file spark-defaults.conf on the client, add the path of the server dependency package, for example /opt/huawei/Bigdata/FusionInsight/spark/spark/lib/streamingClient/*, to the spark.yarn.cluster.driver.extraClassPath parameter.
- Delete the spark-examples_2.10-1.5.1.jar package from each server node.
- In the spark-defaults.conf configuration file on the client, set spark.driver.userClassPathFirst to true (add the parameter if it does not exist).
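The spark-defaults.conf changes above can also be scripted. The sketch below is one possible approach, assuming the file exists, uses the whitespace-separated "key value" layout, and joins classpath entries with ':'; the helper names append_classpath and set_property are hypothetical, and the server-side directory in the usage comments is a placeholder. append_classpath adds an entry to an existing value instead of overwriting it, matching the "add the path to the parameter" wording, while set_property replaces the value outright.

```shell
#!/usr/bin/env bash
# Minimal sketch for editing spark-defaults.conf from the shell.
# Assumptions: entries use the "key value" layout (whitespace-separated),
# and classpath entries are joined with ':'.

# Hypothetical helper: append ENTRY to the classpath property KEY in CONF,
# or add "KEY ENTRY" at the end of the file if KEY is not present yet.
append_classpath() {
  local conf="$1" key="$2" entry="$3"
  local tmp="$conf.tmp.$$"
  awk -v k="$key" -v e="$entry" '
    $1 == k {
      found = 1
      # Strip trailing blanks, then append the new entry after a colon.
      if (NF > 1) { sub(/[ \t]+$/, ""); print $0 ":" e } else { print k " " e }
      next
    }
    { print }
    END { if (!found) print k " " e }
  ' "$conf" > "$tmp" && mv "$tmp" "$conf"
}

# Hypothetical helper: set KEY to VALUE in CONF, replacing the first existing
# entry, dropping duplicates, and appending the key if it is absent.
set_property() {
  local conf="$1" key="$2" value="$3"
  local tmp="$conf.tmp.$$"
  awk -v k="$key" -v v="$value" '
    $1 == k { if (!found) print k " " v; found = 1; next }
    { print }
    END { if (!found) print k " " v }
  ' "$conf" > "$tmp" && mv "$tmp" "$conf"
}

# Example usage (illustrative; paths depend on your installation):
#   CONF="$SPARK_HOME/conf/spark-defaults.conf"
#   # yarn-client mode: client-side dependency directory. The double quotes
#   # keep the trailing * literal so Java can treat it as a classpath wildcard.
#   append_classpath "$CONF" spark.driver.extraClassPath "$SPARK_HOME/jars/streamingClient/*"
#   # yarn-cluster mode, first option: server-side dependency directory.
#   append_classpath "$CONF" spark.yarn.cluster.driver.extraClassPath "<server streamingClient directory>/*"
#   # yarn-cluster mode, third option:
#   set_property "$CONF" spark.driver.userClassPathFirst true
```

Writing to a temporary file and then moving it over the original avoids leaving a half-written spark-defaults.conf if awk fails partway through.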