Temporary Files Are Not Deleted When a MapReduce Job Is Abnormal
Issue
Temporary files are not deleted when a MapReduce job is abnormal.
MR is short for MapReduce; MR jobs are MapReduce jobs.
Symptom
There are too many files in the HDFS temporary directory, occupying excessive storage space.
Cause Analysis
When a MapReduce job is submitted, its configuration files, JAR files, and files added by the -files parameter are uploaded to a temporary directory on HDFS so that the launched containers can obtain them. The configuration item yarn.app.mapreduce.am.staging-dir determines the storage path; the default value is /tmp/hadoop-yarn/staging.
When a MapReduce job completes normally, its temporary files are deleted. However, when the Yarn task corresponding to the job exits abnormally, the temporary files are left behind. As a result, the number of files in the temporary directory grows over time, occupying more and more storage space.
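The staging path can be changed in mapred-site.xml if needed; a minimal fragment, assuming the default value named above:

```xml
<!-- mapred-site.xml: HDFS staging directory for submitted MapReduce jobs.
     /tmp/hadoop-yarn/staging is the Hadoop default. -->
<property>
  <name>yarn.app.mapreduce.am.staging-dir</name>
  <value>/tmp/hadoop-yarn/staging</value>
</property>
```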
Procedure
- Log in to the cluster client.
- Log in to any master node as user root. The user password is the one defined during cluster creation.
- If Kerberos authentication is enabled for the cluster, run the following commands to go to the client installation directory, configure environment variables, and authenticate the user. Enter the password as prompted; obtain it from an administrator.
cd Client installation directory
source bigdata_env
kinit hdfs
- If Kerberos authentication is not enabled for the cluster, run the following commands to switch to user omm and go to the client installation directory to configure environment variables:
su - omm
cd Client installation directory
source bigdata_env
- Obtain the file list.
hdfs dfs -ls /tmp/hadoop-yarn/staging/*/.staging/ | grep "^drwx" | awk '{print $8}' > job_file_list
The job_file_list file contains the folder list of all jobs. The following shows an example of the file content:
/tmp/hadoop-yarn/staging/omm/.staging/job_<Timestamp>_<ID>
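The awk field index in the command above can be sanity-checked without a cluster. A minimal sketch on one hypothetical line of `hdfs dfs -ls` output (the path is the eighth whitespace-separated field; the job ID is made up):

```shell
# One hypothetical line of `hdfs dfs -ls` output; the path is field 8.
line='drwx------   - omm hadoop          0 2024-01-01 10:00 /tmp/hadoop-yarn/staging/omm/.staging/job_1700000000000_0001'

# Same filter as in the procedure: keep directory entries, print the path column.
echo "$line" | grep "^drwx" | awk '{print $8}'
```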
- Collect statistics on running jobs.
mapred job -list 2>/dev/null | grep job_ | awk '{print $1}' > run_job_list
The run_job_list file contains the IDs of running jobs. The content format is as follows:
job_<Timestamp>_<ID>
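The parsing of the job list can also be rehearsed locally; a sketch with fabricated `mapred job -list` output (the job ID and column layout are illustrative, not the tool's exact format):

```shell
# Fabricated sample of `mapred job -list` output; only the job ID column matters.
sample_output() {
  printf 'Total jobs:1\n'
  printf 'JobId State StartTime UserName\n'
  printf 'job_1700000000000_0002 RUNNING 1700000000000 omm\n'
}

# Same filter as in the procedure: keep job lines, print the first field.
sample_output | grep job_ | awk '{print $1}'
```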
- Remove running jobs from the job_file_list file so that their data is not deleted by mistake when expired data is deleted in the next step.
cat run_job_list | while read line; do sed -i "/$line/d" job_file_list; done
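This filtering step can be rehearsed on local sample files before touching HDFS. A sketch with hypothetical job IDs; the file names are prefixed with sample_ so the real lists are not clobbered:

```shell
# Two staged jobs on record; one of them is still running.
printf '%s\n' \
  '/tmp/hadoop-yarn/staging/omm/.staging/job_1700000000000_0001' \
  '/tmp/hadoop-yarn/staging/omm/.staging/job_1700000000000_0002' > sample_job_file_list
printf 'job_1700000000000_0002\n' > sample_run_job_list

# Same loop as in the procedure: delete every running job's line from the list.
cat sample_run_job_list | while read line; do sed -i "/$line/d" sample_job_file_list; done

cat sample_job_file_list   # only the finished job's folder remains
```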
- Delete expired data.
cat job_file_list | while read line; do hdfs dfs -rm -r "$line"; done
- Delete temporary files.
rm -rf run_job_list job_file_list