What Should I Do If FileNotFoundException Occurs When spark-submit Is Used to Submit a Job in Spark on Yarn Client Mode?
Question
When user omm (not user root) uses spark-submit to submit a job in yarn-client mode, a FileNotFoundException occurs, although the job continues running. However, the logs of the Driver program cannot be viewed. For example, after the spark-submit --class org.apache.spark.examples.SparkPi --master yarn-client /opt/client/Spark/spark/examples/jars/spark-examples_2.11-2.2.1-mrs-1.7.0.jar command is run, the command output is shown in the following figure.
Answer
Possible Causes
When a job is executed in yarn-client mode, the Spark Driver runs locally on the client. The Driver log configuration is specified by -Dlog4j.configuration=./log4j-executor.properties, and in the log4j-executor.properties configuration file the Driver logs are written to the ${spark.yarn.app.container.log.dir}/stdout file. Because ${spark.yarn.app.container.log.dir} is not set when the Driver runs locally, the log output path resolves to /stdout. A non-root user does not have the permission to create or modify a stdout file in the root directory, so FileNotFoundException is reported.

When a job is executed in yarn-cluster mode, however, the Spark Driver runs inside the ApplicationMaster. When the ApplicationMaster starts, it sets the log output directory through the -Dspark.yarn.app.container.log.dir JVM option. Therefore, FileNotFoundException is not reported in yarn-cluster mode.
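The path resolution described above can be illustrated with a minimal sketch (the empty log_dir variable stands in for the unset ${spark.yarn.app.container.log.dir} property; it is an illustration, not the actual log4j mechanism):

```shell
# In yarn-client mode the Driver JVM has no value for
# spark.yarn.app.container.log.dir, so the appender path
# "${spark.yarn.app.container.log.dir}/stdout" collapses to "/stdout".
log_dir=""                               # property not set for a local Driver
resolved="${log_dir}/stdout"
echo "Resolved log path: $resolved"      # a file directly under the root directory
# A non-root user such as omm cannot create /stdout,
# which is why FileNotFoundException is raised.
```

In yarn-cluster mode the same expression resolves to a writable container log directory, so the appender opens normally.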
Solution:
Note: In the following examples, the default value of $SPARK_HOME is /opt/client/Spark/spark.
Solution 1: Manually switch the log configuration file. In the $SPARK_HOME/conf/spark-defaults.conf file, change the -Dlog4j.configuration value (default: ./log4j-executor.properties) in the spark.driver.extraJavaOptions setting according to the deploy mode. For yarn-client mode, change it to -Dlog4j.configuration=./log4j.properties. For yarn-cluster mode, keep or restore -Dlog4j.configuration=./log4j-executor.properties.
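The manual edit for yarn-client mode can be sketched with sed as follows. A temporary file stands in for $SPARK_HOME/conf/spark-defaults.conf here so the snippet is self-contained; the single-line configuration entry is a simplified example:

```shell
# Stand-in for $SPARK_HOME/conf/spark-defaults.conf (simplified example entry)
conf=$(mktemp)
printf 'spark.driver.extraJavaOptions -Dlog4j.configuration=./log4j-executor.properties\n' > "$conf"

# yarn-client mode: point the Driver at log4j.properties instead of
# log4j-executor.properties
sed -i 's|-Dlog4j.configuration=./log4j-executor.properties|-Dlog4j.configuration=./log4j.properties|' "$conf"

cat "$conf"
```

Reverse the two file names in the sed expression to switch back for yarn-cluster mode.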
Solution 2: Modify the startup script $SPARK_HOME/bin/spark-class. In the spark-class script, add the following information below #!/usr/bin/env bash.
# Judge mode: client and cluster; Default: client
argv=`echo $@ | tr [A-Z] [a-z]`
if [[ "$argv" =~ "--master" ]]; then
    mode=`echo $argv | sed -e 's/.*--master //'`
    master=`echo $mode | awk '{print $1}'`
    case $master in
    "yarn")
        if [[ "$mode" =~ "--deploy-mode" ]]; then
            deploy=`echo $mode | awk '{print $3}'`
        else
            deploy="client"
        fi
        ;;
    "yarn-client"|"local")
        deploy="client"
        ;;
    "yarn-cluster")
        deploy="cluster"
        ;;
    esac
else
    deploy="client"
fi

# Modify spark-defaults.conf according to the deploy mode
number=`sed -n -e '/spark.driver.extraJavaOptions/=' $SPARK_HOME/conf/spark-defaults.conf`
if [ "$deploy"x = "client"x ]; then
    sed -i "${number}s/-Dlog4j.configuration=.*properties /-Dlog4j.configuration=.\/log4j.properties /g" $SPARK_HOME/conf/spark-defaults.conf
else
    sed -i "${number}s/-Dlog4j.configuration=.*properties /-Dlog4j.configuration=.\/log4j-executor.properties /g" $SPARK_HOME/conf/spark-defaults.conf
fi
These script lines have the same effect as Solution 1: they automatically change the -Dlog4j.configuration value (default: ./log4j-executor.properties) of spark.driver.extraJavaOptions in the $SPARK_HOME/conf/spark-defaults.conf file based on the Yarn deploy mode detected from the command-line arguments.
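The mode-detection logic in the script above can be checked in isolation. The following sketch wraps it in a hypothetical detect_deploy function (not part of the original script) so it can be exercised against sample spark-submit argument lists:

```shell
# Hypothetical helper reproducing the deploy-mode detection from the
# spark-class patch; defaults to "client" when no explicit mode is given.
detect_deploy() {
    argv=$(echo "$@" | tr '[A-Z]' '[a-z]')
    deploy="client"
    if [[ "$argv" =~ "--master" ]]; then
        mode=$(echo "$argv" | sed -e 's/.*--master //')
        master=$(echo "$mode" | awk '{print $1}')
        case $master in
        "yarn")
            # "--master yarn --deploy-mode cluster": field 3 holds the mode
            if [[ "$mode" =~ "--deploy-mode" ]]; then
                deploy=$(echo "$mode" | awk '{print $3}')
            fi
            ;;
        "yarn-cluster")
            deploy="cluster"
            ;;
        esac
    fi
    echo "$deploy"
}

detect_deploy --master yarn --deploy-mode cluster   # prints "cluster"
detect_deploy --master yarn-client                  # prints "client"
detect_deploy --master local                        # prints "client"
```

Running the function against the argument lists above confirms that only an explicit cluster deploy mode (or the deprecated yarn-cluster master) switches the Driver back to the executor-style log configuration.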