What Should I Do If FileNotFoundException Occurs When spark-submit Is Used to Submit a Job in Spark on Yarn Client Mode?
Question
When user omm (not user root) uses spark-submit to submit a job in yarn-client mode, a FileNotFoundException occurs. The job continues to run, but the logs of the Driver program cannot be viewed. For example, after the spark-submit --class org.apache.spark.examples.SparkPi --master yarn-client /opt/client/Spark/spark/examples/jars/spark-examples_2.11-2.2.1-mrs-1.7.0.jar command is run, the command output contains the FileNotFoundException.
Answer
Possible Causes
When a job is executed in yarn-client mode, the Spark Driver runs locally. The Driver log file is configured using -Dlog4j.configuration=./log4j-executor.properties, and in the log4j-executor.properties configuration file, Driver logs are output to the ${spark.yarn.app.container.log.dir}/stdout file. However, when the Spark Driver runs locally, ${spark.yarn.app.container.log.dir} is not set, so the log output path resolves to /stdout. Because non-root users do not have permission to create or modify files in the root directory, a FileNotFoundException is reported.
In contrast, when a job is executed in yarn-cluster mode, the Spark Driver runs in the ApplicationMaster, and the spark.yarn.app.container.log.dir property is set when the ApplicationMaster starts. Therefore, the FileNotFoundException is not reported in yarn-cluster mode.
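The path collapse described above can be illustrated with a minimal shell sketch. The container log directory path is illustrative, not the actual value set by Yarn:

```shell
#!/usr/bin/env bash
# Simulate how the log4j output path ${spark.yarn.app.container.log.dir}/stdout
# resolves depending on whether the property is set.

# In yarn-cluster mode, the ApplicationMaster sets the property (illustrative path):
container_log_dir="/srv/BigData/yarn/container-logs/application_01"
echo "yarn-cluster driver log: ${container_log_dir}/stdout"

# In yarn-client mode, the property is never set, so it is empty:
container_log_dir=""
echo "yarn-client driver log: ${container_log_dir}/stdout"   # resolves to /stdout

# A non-root user such as omm cannot create /stdout in the root directory,
# hence the FileNotFoundException.
```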
Solutions
Note: In the following examples, the default value of $SPARK_HOME is /opt/client/Spark/spark.
Solution 1: Manually switch the log configuration file. In the $SPARK_HOME/conf/spark-defaults.conf file, change the -Dlog4j.configuration configuration item of spark.driver.extraJavaOptions (default: -Dlog4j.configuration=./log4j-executor.properties) according to the deploy mode: in yarn-client mode, set it to -Dlog4j.configuration=./log4j.properties; in yarn-cluster mode, set it to -Dlog4j.configuration=./log4j-executor.properties.
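The manual switch can also be scripted with a single sed command. The following sketch assumes spark-defaults.conf contains a spark.driver.extraJavaOptions line of the form described above; it works on a temporary copy so the real client configuration is untouched:

```shell
#!/usr/bin/env bash
# Switch the driver log4j configuration for yarn-client mode.
# Works on a throwaway copy; the extraJavaOptions line below is a sample.

conf="/tmp/spark-defaults.conf"
cat > "$conf" <<'EOF'
spark.driver.extraJavaOptions -Dlog4j.configuration=./log4j-executor.properties -XX:+UseG1GC
EOF

# For yarn-client mode, point the driver at ./log4j.properties instead.
sed -i 's|-Dlog4j.configuration=[^ ]*properties|-Dlog4j.configuration=./log4j.properties|' "$conf"

grep spark.driver.extraJavaOptions "$conf"
```

To switch back for yarn-cluster mode, run the same sed command with the replacement set to -Dlog4j.configuration=./log4j-executor.properties.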
Solution 2: Modify the startup script $SPARK_HOME/bin/spark-class. In the spark-class script, add the following lines below the #!/usr/bin/env bash line.
# Judge mode: client or cluster; default: client
argv=`echo $@ | tr '[A-Z]' '[a-z]'`
if [[ "$argv" =~ "--master" ]]; then
  mode=`echo $argv | sed -e 's/.*--master //'`
  master=`echo $mode | awk '{print $1}'`
  case $master in
    "yarn")
      if [[ "$mode" =~ "--deploy-mode" ]]; then
        deploy=`echo $mode | awk '{print $3}'`
      else
        deploy="client"
      fi
      ;;
    "yarn-client"|"local")
      deploy="client"
      ;;
    "yarn-cluster")
      deploy="cluster"
      ;;
  esac
else
  deploy="client"
fi

# Modify spark-defaults.conf accordingly
number=`sed -n -e '/spark.driver.extraJavaOptions/=' $SPARK_HOME/conf/spark-defaults.conf`
if [ "$deploy"x = "client"x ]; then
  sed -i "${number}s/-Dlog4j.configuration=.*properties /-Dlog4j.configuration=.\/log4j.properties /g" $SPARK_HOME/conf/spark-defaults.conf
else
  sed -i "${number}s/-Dlog4j.configuration=.*properties /-Dlog4j.configuration=.\/log4j-executor.properties /g" $SPARK_HOME/conf/spark-defaults.conf
fi
These script lines perform the same change as Solution 1: each time spark-class runs, they detect the Yarn deploy mode from the command arguments and update the -Dlog4j.configuration configuration item (default: ./log4j-executor.properties) of spark.driver.extraJavaOptions in the $SPARK_HOME/conf/spark-defaults.conf file accordingly.
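The mode-detection part of the script can be exercised on its own before modifying spark-class. This sketch wraps the same parsing logic in a function (the name detect_deploy_mode is introduced here for illustration) and runs it against typical spark-submit argument lists:

```shell
#!/usr/bin/env bash
# Standalone check of the deploy-mode detection used in the spark-class snippet.
# detect_deploy_mode is a hypothetical helper introduced for this illustration.
detect_deploy_mode() {
  argv=`echo $@ | tr '[A-Z]' '[a-z]'`
  if [[ "$argv" =~ "--master" ]]; then
    mode=`echo $argv | sed -e 's/.*--master //'`
    master=`echo $mode | awk '{print $1}'`
    case $master in
      "yarn")
        if [[ "$mode" =~ "--deploy-mode" ]]; then
          deploy=`echo $mode | awk '{print $3}'`
        else
          deploy="client"
        fi
        ;;
      "yarn-client"|"local") deploy="client" ;;
      "yarn-cluster")        deploy="cluster" ;;
    esac
  else
    deploy="client"
  fi
  echo $deploy
}

detect_deploy_mode --class org.apache.spark.examples.SparkPi --master yarn --deploy-mode cluster app.jar   # prints "cluster"
detect_deploy_mode --class org.apache.spark.examples.SparkPi --master yarn-client app.jar                  # prints "client"
detect_deploy_mode --class org.apache.spark.examples.SparkPi app.jar                                       # prints "client"
```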