Inconsistency Between df and du Command Output on the Core Node

Symptom

When the df and du commands are run on a core node, they report different disk usage.

The disk usage of the /srv/BigData/hadoop/data1/ directory reported by the df -h command differs from that reported by the du -sh /srv/BigData/hadoop/data1/ command by more than 10 GB.
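As a rough way to quantify the gap, the following sketch prints both figures in bytes. It assumes GNU coreutils (the -B1 and --output options), and usage_gap is a hypothetical helper name used only for illustration. Because df reports on the whole file system that contains the directory, the comparison is meaningful only when the directory is a mount point.

```shell
# Print the bytes used according to df and du, and the difference.
# usage_gap is a hypothetical helper; GNU df/du options are assumed.
usage_gap() {
    dir="$1"
    # df: blocks allocated on the file system that contains $dir
    df_used=$(df -B1 --output=used "$dir" | tail -n 1 | tr -d ' ')
    # du: sum of the files reachable through the directory tree
    du_used=$(du -sx -B1 "$dir" 2>/dev/null | cut -f1)
    echo "df_used=$df_used du_used=$du_used gap=$((df_used - du_used))"
}
```

For example, usage_gap /srv/BigData/hadoop/data1 prints one line with the three values.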

Cause Analysis

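The output of the lsof | grep deleted command shows that a large number of log files in the directory are in the deleted state. These files have been deleted but are still held open by a process, so their disk space has not been released. The df command still counts this space, whereas the du command only counts files that remain in the directory tree.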
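Where lsof is not installed, a similar check can be sketched by reading /proc directly. This assumes a Linux /proc file system and GNU stat; list_deleted_open is a hypothetical helper name, and root permissions are typically needed to inspect the file descriptors of other users' processes.

```shell
# List deleted-but-open files under a path prefix as "PID SIZE_BYTES TARGET".
# list_deleted_open is a hypothetical helper; Linux /proc is assumed.
list_deleted_open() {
    prefix="$1"
    for fd in /proc/[0-9]*/fd/*; do
        # Each fd entry is a symlink; deleted files end with " (deleted)"
        target=$(readlink "$fd" 2>/dev/null) || continue
        case "$target" in
            "$prefix"*" (deleted)")
                # stat -L follows the link to the still-open file
                size=$(stat -L -c %s "$fd" 2>/dev/null) || continue
                pid=${fd#/proc/}
                pid=${pid%%/*}
                echo "$pid $size $target"
                ;;
        esac
    done
}
```

For example, list_deleted_open /srv/BigData/hadoop/data1 lists each process that still holds a deleted file under that directory, together with the file size.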

When Spark tasks run for a long time, some of their containers keep running and continuously generate logs. The Spark executor uses the log4j log rolling function to write logs to the stdout file. The container also keeps this file open, so the file is held by two processes at the same time. When one process rolls the log according to the configuration, the earliest log file is deleted, but the other process still holds the file handle. As a result, the file remains in the deleted state and its space is not released.
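The effect can be reproduced on a small scale. The following sketch uses a temporary directory and file descriptor 3 of the current shell to stand in for the second process; the names and sizes are illustrative only, and GNU du is assumed.

```shell
# Reproduce a deleted-but-open file in a temporary directory.
demo_dir=$(mktemp -d)
# Write a 1 MiB file that plays the role of the rolled log
dd if=/dev/zero of="$demo_dir/stdout" bs=1024 count=1024 2>/dev/null
# Keep the file open on descriptor 3, as the second process would
exec 3< "$demo_dir/stdout"
# The rolling process deletes the file
rm "$demo_dir/stdout"
# du no longer sees the file because it has no directory entry
after_du=$(du -s -B1 "$demo_dir" | cut -f1)
# The data is still readable through the open descriptor
held_bytes=$(wc -c <&3)
echo "du now reports $after_du bytes; $held_bytes bytes are still held open"
# Closing the descriptor finally releases the space
exec 3<&-
rmdir "$demo_dir"
```

Until the descriptor is closed, the file system keeps the blocks allocated, which is why df continues to report the space as used.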

Procedure

Change the output file name for Spark executor logs.

Open the log configuration file. By default, the configuration file is located in <Client installation directory>/Spark/spark/conf/log4j-executor.properties.

Change the name of the log output file.

For example:

log4j.appender.sparklog.File = ${spark.yarn.app.container.log.dir}/stdout

is changed to

log4j.appender.sparklog.File = ${spark.yarn.app.container.log.dir}/stdout.log

Save the configuration and exit.
Submit the task again.
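The configuration change in the steps above could also be scripted. The following is a minimal sketch that assumes GNU sed (the -i option) and that the property line ends in /stdout as in the example; rename_executor_log is a hypothetical helper name, and the path to log4j-executor.properties must be supplied by the caller.

```shell
# Point the sparklog appender at stdout.log instead of stdout.
# rename_executor_log is a hypothetical helper; GNU sed -i is assumed.
rename_executor_log() {
    conf="$1"
    # Keep a backup copy before editing in place
    cp "$conf" "$conf.bak"
    # Only lines that end in /stdout are changed, so rerunning is harmless
    sed -i 's|^\(log4j\.appender\.sparklog\.File *= *.*\)/stdout *$|\1/stdout.log|' "$conf"
}
```

Call it with the full path of the log4j-executor.properties file under the client installation directory, then submit the task again.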
