Updated on 2024-08-10 GMT+08:00

Why Is the "Class Does Not Exist" Error Reported While the SparkStreamingKafka Project Is Running?

Question

When the KafkaWordCount task (org.apache.spark.examples.streaming.KafkaWordCount) is submitted with the spark-submit script, the log file shows that a Kafka-related class does not exist. The KafkaWordCount sample is provided by the Spark open-source community.

Answer

When Spark is deployed, the following JAR files are saved in the ${SPARK_HOME}/jars/streamingClient010 directory on the client and the ${BIGDATA_HOME}/FusionInsight_Spark2x_8.1.0.1/install/FusionInsight-Spark2x-3.1.1/spark/jars/streamingClient010 directory on the server.

  • kafka-clients-xxx.jar
  • kafka_2.12-xxx.jar
  • spark-streaming-kafka-0-10_2.12-3.1.1-hw-ei-311001-SNAPSHOT.jar
  • spark-token-provider-kafka-0-10_2.12-3.1.1-hw-ei-311001-SNAPSHOT.jar

Because $SPARK_HOME/jars/streamingClient010/* is not added to the classpath by default, you need to add these JAR files manually.

When you submit and run the application, add the following parameters to the command. For details, see Commissioning a Spark Application in a Linux Environment.

--jars $SPARK_CLIENT_HOME/jars/streamingClient010/kafka-clients-2.4.0.jar,$SPARK_CLIENT_HOME/jars/streamingClient010/kafka_2.12-2.4.0.jar,$SPARK_CLIENT_HOME/jars/streamingClient010/spark-streaming-kafka-0-10_2.12-3.1.1-hw-ei-311001-SNAPSHOT.jar

You can use the preceding parameters to submit self-developed applications and sample projects.
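As a minimal sketch, the comma-separated value for --jars can also be assembled from the directory contents instead of being typed by hand. The helper name build_jars_list is hypothetical. It passes every JAR in the directory, including the token-provider JAR that the explicit list above leaves out, so treat that as an assumption to check against your cluster. The commented spark-submit line uses placeholder values for the deploy mode, examples JAR, and application arguments.

```shell
#!/usr/bin/env bash

# Hypothetical helper: print all *.jar files in a directory as one
# comma-separated string, which is the form that --jars expects.
build_jars_list() {
  local dir="$1" jar list=""
  for jar in "$dir"/*.jar; do
    [ -e "$jar" ] || continue          # skip the literal pattern when nothing matches
    list="${list:+$list,}$jar"         # prepend a comma only after the first entry
  done
  printf '%s\n' "$list"
}

# Example invocation (not executed here; paths and arguments are placeholders):
# spark-submit --master yarn --deploy-mode client \
#   --class org.apache.spark.examples.streaming.KafkaWordCount \
#   --jars "$(build_jars_list "$SPARK_HOME/jars/streamingClient010")" \
#   /path/to/spark-examples.jar <application arguments>
```

Only files ending in .jar are included, in the shell's glob order. An empty or missing directory yields an empty string, so check the result before passing it to spark-submit.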

To submit sample projects provided by the Spark open-source community, such as KafkaWordCount, you need to add other parameters in addition to --jars. Otherwise, a ClassNotFoundException occurs. The configurations in yarn-client and yarn-cluster modes are as follows:

  • yarn-client mode

    In the spark-defaults.conf configuration file on the client, in addition to specifying --jars, add the path of the client dependency package (for example, $SPARK_HOME/jars/streamingClient010/*) to the spark.driver.extraClassPath parameter.

  • yarn-cluster mode

    Perform any one of the following operations in addition to specifying --jars:

    • In the configuration file spark-defaults.conf on the client, add the path of the server dependency package, for example, ${BIGDATA_HOME}/FusionInsight_Spark2x_8.1.0.1/install/FusionInsight-Spark2x-3.1.1/spark/jars/streamingClient010/*, to the spark.yarn.cluster.driver.extraClassPath parameter.
    • Delete the original-spark-examples_2.12-3.1.1-xxx.jar package from all server nodes.
    • In the spark-defaults.conf configuration file on the client, set the spark.driver.userClassPathFirst parameter to true (add the parameter if it does not exist).
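The classpath additions to spark-defaults.conf described above can be sketched as a small shell helper. This is an illustration under assumptions, not a supported tool. The function name append_classpath is hypothetical. It assumes each property sits on one line with the key as the first whitespace-separated field (for example "key value" or "key = value"). It appends the new entry to an existing value with a colon rather than overwriting it. Back up the file before trying anything like this on a real client.

```shell
#!/usr/bin/env bash

# Hypothetical helper: append a classpath entry to a property in a
# spark-defaults.conf style file, or add the property if it is absent.
# Usage: append_classpath <conf-file> <property-key> <classpath-entry>
append_classpath() {
  local file="$1" key="$2" entry="$3" tmp
  tmp="$(mktemp)"
  awk -v k="$key" -v e="$entry" '
    $1 == k { print $0 ":" e; found = 1; next }   # extend the existing value
    { print }                                      # copy all other lines unchanged
    END { if (!found) print k " " e }              # key missing: add a new line
  ' "$file" > "$tmp"
  cat "$tmp" > "$file"     # write back in place so file permissions are kept
  rm -f "$tmp"
}

# Example calls (not executed here; replace the placeholders with real paths):
# append_classpath "$SPARK_HOME/conf/spark-defaults.conf" \
#   spark.driver.extraClassPath "<client path>/jars/streamingClient010/*"
# append_classpath "$SPARK_HOME/conf/spark-defaults.conf" \
#   spark.yarn.cluster.driver.extraClassPath "<server path>/spark/jars/streamingClient010/*"
```

The first call corresponds to yarn-client mode and the second to the first yarn-cluster option. The location of spark-defaults.conf under $SPARK_HOME/conf is an assumption about the client layout.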