Updated on 2024-08-16 GMT+08:00

Why Is the "Class Does Not Exist" Error Reported While the SparkStreamingKafka Project Is Running?

Question

When the KafkaWordCount task (org.apache.spark.examples.streaming.KafkaWordCount) is submitted by running the spark-submit script, the log file shows that a Kafka-related class does not exist. The KafkaWordCount sample is provided by the Spark open-source community.

Answer

When Spark is deployed, the following JAR files are saved in the $SPARK_HOME/jars/streamingClient directory on the client and the /opt/Bigdata/MRS/FusionInsight-Spark-2.2.1/spark/jars/streamingClient directory on the server.

  • kafka-clients-0.8.2.1.jar
  • kafka_2.10-0.8.2.1.jar
  • spark-streaming-kafka_2.10-1.5.1.jar

Because the streamingClient directory is not added to the classpath by default, you need to configure it manually.

When you submit and run the application, add the following parameter to the command:

--jars $SPARK_CLIENT_HOME/jars/streamingClient/kafka-clients-0.8.2.1.jar,$SPARK_CLIENT_HOME/jars/streamingClient/kafka_2.10-0.8.2.1.jar,$SPARK_CLIENT_HOME/jars/streamingClient/spark-streaming-kafka_2.10-1.5.1.jar

You can use the preceding parameter when submitting both self-developed applications and sample projects.
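As a minimal sketch of how the --jars value can be assembled, the following script joins the three JAR names listed above into a comma-separated list and prints a spark-submit command for review. The fallback client path, the application JAR my-streaming-app.jar, and the class com.example.MyStreamingApp are hypothetical placeholders, not values from this guide.

```shell
#!/bin/sh
# Assumption: SPARK_CLIENT_HOME points at the Spark client directory.
# The fallback path below is hypothetical; adjust it to your installation.
STREAMING_DIR="${SPARK_CLIENT_HOME:-/opt/client/Spark/spark}/jars/streamingClient"

# Join the three JAR names into a single comma-separated --jars value.
JARS=""
for jar in kafka-clients-0.8.2.1.jar kafka_2.10-0.8.2.1.jar spark-streaming-kafka_2.10-1.5.1.jar; do
    JARS="${JARS:+$JARS,}$STREAMING_DIR/$jar"
done

# Print the command instead of running it, so it can be checked first.
# my-streaming-app.jar and com.example.MyStreamingApp are placeholders.
echo "spark-submit --master yarn-client --jars $JARS --class com.example.MyStreamingApp my-streaming-app.jar"
```

Building the list in a loop keeps each file name in one place, which makes a misspelled JAR name easier to spot than in a single long line.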

To submit sample projects such as KafkaWordCount provided by the Spark open-source community, you need to add other parameters in addition to --jars. Otherwise, a ClassNotFoundException error will occur. The configurations in yarn-client and yarn-cluster modes are as follows:

  • yarn-client mode:

    In the spark-defaults.conf configuration file on the client, add the path of the client dependency package (for example, $SPARK_HOME/lib/streamingClient/*) to the spark.driver.extraClassPath parameter, in addition to specifying --jars.

  • yarn-cluster mode:

    In addition to specifying --jars, perform any one of the following configurations:

    • In the configuration file spark-defaults.conf on the client, add the path of the server dependency package, for example /opt/huawei/Bigdata/FusionInsight/spark/spark/lib/streamingClient/*, to the spark.yarn.cluster.driver.extraClassPath parameter.
    • Delete the spark-examples_2.10-1.5.1.jar package from each server node.
    • In the spark-defaults.conf configuration file on the client, set (or add and set) the spark.driver.userClassPathFirst parameter to true.
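The spark-defaults.conf changes described above can be sketched as a small script. This is only an illustration under stated assumptions: it works on a copy of spark-defaults.conf in the current directory unless CONF_FILE is set, the set_conf helper is hypothetical, and the SPARK_HOME fallback path is a placeholder. The classpath value mirrors the example path in the text and should be adjusted to wherever the streamingClient JARs actually reside.

```shell
#!/bin/sh
# Assumption: edit a copy of spark-defaults.conf in the current directory;
# set CONF_FILE to point at the real client file instead.
CONF_FILE="${CONF_FILE:-./spark-defaults.conf}"
touch "$CONF_FILE"

# set_conf KEY VALUE (hypothetical helper): remove any existing line whose
# first field is KEY, then append "KEY VALUE". An existing value is replaced,
# not extended, so merge any previous classpath entries by hand.
set_conf() {
    awk -v key="$1" '$1 != key' "$CONF_FILE" > "$CONF_FILE.tmp"
    printf '%s %s\n' "$1" "$2" >> "$CONF_FILE.tmp"
    mv "$CONF_FILE.tmp" "$CONF_FILE"
}

# yarn-client mode: put the client-side dependency directory on the driver classpath.
# The SPARK_HOME fallback is hypothetical.
set_conf spark.driver.extraClassPath "${SPARK_HOME:-/opt/client/Spark/spark}/lib/streamingClient/*"

# yarn-cluster mode, third option: prefer user-supplied JARs on the driver.
set_conf spark.driver.userClassPathFirst true

# Show the resulting file for review.
cat "$CONF_FILE"
```

The variable is expanded by the shell when the line is written, so the file ends up containing an absolute path.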