Why Is the "Class Does Not Exist" Error Reported While the SparkStreamingKafka Project Is Running?
Question
When the KafkaWordCount task (org.apache.spark.examples.streaming.KafkaWordCount) is submitted by running the spark-submit script, the log file shows that a Kafka-related class cannot be found. The KafkaWordCount sample is provided by the Spark open-source community.
Answer
When Spark is deployed, the following JAR files are saved in the ${SPARK_HOME}/jars/streamingClient010 directory on the client and the ${BIGDATA_HOME}/FusionInsight_Spark2x_8.1.0.1/install/FusionInsight-Spark2x-3.1.1/spark/jars/streamingClient010 directory on the server.
- kafka-clients-xxx.jar
- kafka_2.12-xxx.jar
- spark-streaming-kafka-0-10_2.12-3.1.1-hw-ei-311001-SNAPSHOT.jar
- spark-token-provider-kafka-0-10_2.12-3.1.1-hw-ei-311001-SNAPSHOT.jar
Because $SPARK_HOME/jars/streamingClient010/* is not added to the classpath by default, you need to configure it manually.
When submitting and running the application, add the following parameter to the command. For details, see Writing and Running the Spark Program in the Linux Environment.
--jars $SPARK_CLIENT_HOME/jars/streamingClient010/kafka-clients-2.4.0.jar,$SPARK_CLIENT_HOME/jars/streamingClient010/kafka_2.12-*.jar,$SPARK_CLIENT_HOME/jars/streamingClient010/spark-streaming-kafka-0-10_2.12-3.1.1-hw-ei-311001-SNAPSHOT.jar
You can run the preceding command to submit self-developed applications and sample projects.
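As a sketch, the --jars value above can be assembled in a small shell helper before submission. The installation path /opt/client and the glob-style JAR names are illustrative assumptions, not values from this document; substitute the actual client path and JAR versions in your environment.

```shell
# Hypothetical sketch: build the comma-separated --jars value from the
# streamingClient010 directory of an assumed Spark client installation.
build_streaming_jars() {
  # $1 is the Spark client home, e.g. /opt/client (an assumed path).
  dir="$1/jars/streamingClient010"
  # Emit the four Kafka-related JARs as one comma-separated list.
  printf '%s,%s,%s,%s' \
    "$dir/kafka-clients-*.jar" \
    "$dir/kafka_2.12-*.jar" \
    "$dir/spark-streaming-kafka-0-10_2.12-*.jar" \
    "$dir/spark-token-provider-kafka-0-10_2.12-*.jar"
}

# Usage (the main class and application JAR are placeholders):
#   spark-submit --master yarn --deploy-mode client \
#     --jars "$(build_streaming_jars /opt/client)" \
#     --class com.example.MyStreamingApp my-streaming-app.jar
```

Note that spark-submit itself does not expand shell globs in --jars; either let the shell expand them or list the exact JAR file names.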
To submit sample projects such as KafkaWordCount provided by the Spark open-source community, you need to add other parameters in addition to --jars. Otherwise, a ClassNotFoundException error occurs. The configurations in yarn-client and yarn-cluster modes are as follows:
- yarn-client mode
In the spark-defaults.conf configuration file on the client, add the path of the client dependency package, for example $SPARK_HOME/jars/streamingClient010/*, to the spark.driver.extraClassPath parameter (in addition to setting --jars).
- yarn-cluster mode
In addition to setting --jars, perform any one of the following configurations:
- In the spark-defaults.conf configuration file on the client, add the path of the server dependency package, for example ${BIGDATA_HOME}/FusionInsight_Spark2x_8.1.0.1/install/FusionInsight-Spark2x-3.1.1/spark/jars/streamingClient010/*, to the spark.yarn.cluster.driver.extraClassPath parameter.
- Delete the original-spark-examples_2.12-3.1.1-xxx.jar packages from all the server nodes.
- In the spark-defaults.conf configuration file on the client, set the spark.driver.userClassPathFirst parameter to true (add the parameter if it does not exist).
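Taken together, the settings described above would appear in spark-defaults.conf roughly as follows. This is a sketch only: the client-side path is an assumed example, and any value already present in these parameters should be kept (append the new path rather than overwrite it).

```
# yarn-client mode: the driver runs on the client node,
# so point spark.driver.extraClassPath at the client-side directory.
spark.driver.extraClassPath    /opt/client/Spark2x/spark/jars/streamingClient010/*

# yarn-cluster mode (option 1): the driver runs on a server node,
# so use the server-side directory instead.
spark.yarn.cluster.driver.extraClassPath    ${BIGDATA_HOME}/FusionInsight_Spark2x_8.1.0.1/install/FusionInsight-Spark2x-3.1.1/spark/jars/streamingClient010/*

# yarn-cluster mode (option 3): load user-supplied classes
# before Spark's own copies.
spark.driver.userClassPathFirst    true
```

Option 2 (deleting the original-spark-examples_2.12-3.1.1-xxx.jar packages from the server nodes) requires no configuration-file change.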