
What Should I Do If FileNotFoundException Occurs When spark-submit Is Used to Submit a Job in Spark on Yarn Client Mode?

Question

When user omm (not user root) uses spark-submit to submit a job in yarn-client mode, a FileNotFoundException is reported, although the job continues running. However, the logs of the Driver program cannot be viewed. For example, after the spark-submit --class org.apache.spark.examples.SparkPi --master yarn-client /opt/client/Spark/spark/examples/jars/spark-examples_2.11-2.2.1-mrs-1.7.0.jar command is run, the FileNotFoundException stack trace appears in the command output.

Answer

Possible Causes

When a job is executed in yarn-client mode, the Spark Driver runs locally on the client. The Driver log configuration file is specified by -Dlog4j.configuration=./log4j-executor.properties, and in the log4j-executor.properties file the Driver logs are written to the ${spark.yarn.app.container.log.dir}/stdout file. Because ${spark.yarn.app.container.log.dir} is not set when the Driver runs locally, the log output path degenerates to /stdout, and non-root users do not have the permission to create or modify a stdout file in the root directory. As a result, FileNotFoundException is reported.

When a job is executed in yarn-cluster mode, the Spark Driver runs inside the Application Master. When the Application Master is started, its launch command sets spark.yarn.app.container.log.dir to a valid log directory through a -D JVM option. Therefore, FileNotFoundException is not reported in yarn-cluster mode.
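For illustration, the Driver appender in log4j-executor.properties points at the container log directory roughly as follows (the appender name sparklog is a hypothetical placeholder; only the File setting is relevant to this issue):

log4j.appender.sparklog=org.apache.log4j.RollingFileAppender
log4j.appender.sparklog.File=${spark.yarn.app.container.log.dir}/stdout

Because log4j substitutes an unset ${...} variable with an empty string, the File value degenerates to /stdout when the Driver runs locally, which user omm cannot create.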

Solution

Note: In the following example, the default value of $SPARK_HOME is /opt/client/Spark/spark.

Solution 1: Manually switch the log configuration file. In the $SPARK_HOME/conf/spark-defaults.conf file, change the -Dlog4j.configuration setting of spark.driver.extraJavaOptions (default: -Dlog4j.configuration=./log4j-executor.properties) according to the deploy mode: in yarn-client mode, set it to -Dlog4j.configuration=./log4j.properties; in yarn-cluster mode, set it to -Dlog4j.configuration=./log4j-executor.properties.
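For example, in yarn-client mode the relevant line in $SPARK_HOME/conf/spark-defaults.conf would read along the following lines (clusters typically carry additional JVM options on this line; only the log4j.configuration part changes):

spark.driver.extraJavaOptions -Dlog4j.configuration=./log4j.properties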

Solution 2: Modify the startup script $SPARK_HOME/bin/spark-class. In the spark-class script, add the following lines below the #!/usr/bin/env bash line.

# Determine the deploy mode from the command-line arguments: client or cluster; default: client
argv=$(echo "$@" | tr '[:upper:]' '[:lower:]')
if [[ "$argv" =~ "--master" ]]; then
    mode=$(echo "$argv" | sed -e 's/.*--master //')
    master=$(echo "$mode" | awk '{print $1}')
    case $master in
    "yarn")
        # Assumes --deploy-mode, if present, immediately follows the --master value.
        if [[ "$mode" =~ "--deploy-mode" ]]; then
            deploy=$(echo "$mode" | awk '{print $3}')
        else
            deploy="client"
        fi
    ;;
    "yarn-client"|"local")
        deploy="client"
    ;;
    "yarn-cluster")
        deploy="cluster"
    ;;
    *)
        # Fallback for other masters (for example, local[N]): the Driver runs locally.
        deploy="client"
    ;;
    esac
else
    deploy="client"
fi
# Point spark.driver.extraJavaOptions in spark-defaults.conf at the matching log4j file.
number=$(sed -n -e '/spark.driver.extraJavaOptions/=' "$SPARK_HOME/conf/spark-defaults.conf")
if [ "$deploy"x = "client"x ]; then
    sed -i "${number}s/-Dlog4j.configuration=.*properties /-Dlog4j.configuration=.\/log4j.properties /g" "$SPARK_HOME/conf/spark-defaults.conf"
else
    sed -i "${number}s/-Dlog4j.configuration=.*properties /-Dlog4j.configuration=.\/log4j-executor.properties /g" "$SPARK_HOME/conf/spark-defaults.conf"
fi

This script automates Solution 1: each time spark-class runs, it rewrites the -Dlog4j.configuration setting of spark.driver.extraJavaOptions in the $SPARK_HOME/conf/spark-defaults.conf file to match the deploy mode passed on the command line.
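After applying either solution, you can confirm which log4j configuration is active before submitting a job, for example (this grep check is merely illustrative):

grep 'spark.driver.extraJavaOptions' $SPARK_HOME/conf/spark-defaults.conf

In yarn-client mode, the line should contain -Dlog4j.configuration=./log4j.properties.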