Updated on 2024-11-29 GMT+08:00

Spark Task Submission Failure

Symptom

  • A Spark task fails to be submitted.
  • Spark displays a message indicating that the Yarn JAR package cannot be obtained.
  • A file is uploaded multiple times.

Cause Analysis

  • Symptom 1:

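    The most common cause of a task submission failure is an authentication failure, for example when the user has not been authenticated with kinit or the authentication has expired.

    Incorrect parameter settings can also cause the submission to fail.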

  • Symptom 2:

    By default, the cluster adds the Hadoop JAR packages of the analysis node to the classpath of the task. If the system displays a message indicating that the Yarn packages cannot be found, the Hadoop configuration has not been set.

  • Symptom 3:

    A common scenario is as follows: The --files option is used to upload the user.keytab file, and the --keytab option is then used to specify the same file. As a result, the same file is uploaded multiple times.

Procedure

  • Symptom 1:

    Run kinit [user] again to re-authenticate, and then check and correct the corresponding configuration items.
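
    The re-authentication step can be sketched as a small shell helper. This is a minimal sketch, assuming the client uses standard Kerberos tools where klist -s exits with a non-zero status if there is no valid ticket. The function name ensure_ticket is hypothetical and is not part of Spark or the cluster client.

```shell
# Hypothetical helper: re-authenticate only when no valid ticket exists.
# $1 is the Kerberos user (principal) to authenticate as.
ensure_ticket() {
    # Assumption: klist -s prints nothing and returns non-zero when the
    # credential cache is missing or the ticket has expired.
    if klist -s; then
        echo "Valid ticket found."
        return 0
    fi
    echo "No valid ticket; running kinit for $1." >&2
    # kinit prompts for the password of the given user.
    kinit "$1"
}
```

    For example, run ensure_ticket [user] before spark-submit, replacing [user] with the submitting user.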

  • Symptom 2:

    Check that the Hadoop configuration items are correct and that the core-site.xml, hdfs-site.xml, yarn-site.xml, and mapred-site.xml configuration files in the conf directory of Spark are present and correct.
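
    A quick first check is whether the four files exist and are not empty. The following is a minimal sketch, not a full validation of the file contents. The function name check_spark_conf is hypothetical, and the directory passed to it is an assumption about where the Spark client keeps its conf directory.

```shell
# Hypothetical helper: report Hadoop configuration files that are missing
# or empty in the given Spark conf directory ($1).
check_spark_conf() {
    conf_dir="$1"
    missing=0
    for f in core-site.xml hdfs-site.xml yarn-site.xml mapred-site.xml; do
        # -s is true only if the file exists and has a size greater than zero.
        if [ ! -s "${conf_dir}/${f}" ]; then
            echo "Missing or empty: ${conf_dir}/${f}" >&2
            missing=1
        fi
    done
    # Returns 0 when all four files are present, 1 otherwise.
    return "${missing}"
}
```

    For example, check_spark_conf "${SPARK_HOME}/conf" checks the conf directory, assuming SPARK_HOME points to the Spark client installation. The contents of any reported file still need to be verified manually.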

  • Symptom 3:

    Make a copy of the user.keytab file under a different name and specify the copy in the --keytab option, for example:

    cp user.keytab user2.keytab

    spark-submit --master yarn --files user.keytab --keytab user2.keytab ......
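
    The conflict described for Symptom 3 can be detected before submission. The following is a minimal sketch, assuming that the conflict arises when the file given to --keytab has the same file name as an entry in the comma-separated --files list. The function name keytab_conflicts is hypothetical.

```shell
# Hypothetical helper: succeed (return 0) if the --keytab file name also
# appears in the --files list.
# $1 is the comma-separated --files value, $2 is the --keytab value.
keytab_conflicts() {
    keytab_name=$(basename "$2")
    old_ifs=$IFS
    IFS=,                      # split the --files value on commas
    for f in $1; do
        if [ "$(basename "$f")" = "${keytab_name}" ]; then
            IFS=$old_ifs
            return 0           # same file name in both options
        fi
    done
    IFS=$old_ifs
    return 1                   # no conflict found
}
```

    For example, keytab_conflicts "user.keytab" "user.keytab" succeeds and signals a conflict, whereas keytab_conflicts "user.keytab" "user2.keytab" fails, which matches the corrected command above.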