Updated on 2024-08-10 GMT+08:00

What Should I Do If a Large Number of Directories Whose Names Start with blockmgr- or spark- Exist in the /tmp Directory on the Client Installation Node?

Question

After the system runs for a long time, there are many directories whose names start with blockmgr- or spark- in the /tmp directory on the node where the client is installed.

Figure 1 Residual directory example

Answer

While a Spark task runs, the driver creates a local temporary directory whose name starts with spark- to store service JAR packages and configuration files, and another local temporary directory whose name starts with blockmgr- to store block data. Both directories are deleted automatically when the Spark application finishes.

The path where the two directories are created is determined in the following order of precedence: the SPARK_LOCAL_DIRS environment variable, then the value of spark.local.dir, then the value of java.io.tmpdir. On the client, spark.local.dir is set to /tmp by default, so the directories are created in /tmp unless SPARK_LOCAL_DIRS is configured.
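The precedence described above can be sketched as a small lookup. This is only an illustration of the order, not Spark's own code: the function name resolve_spark_local_dir and its parameters are hypothetical, and java_tmpdir stands in for the JVM's java.io.tmpdir value (assumed here to be /tmp).

```python
def resolve_spark_local_dir(env, spark_conf, java_tmpdir="/tmp"):
    """Return the base path for the spark- and blockmgr- directories.

    env         -- mapping of environment variables (for example os.environ)
    spark_conf  -- mapping of Spark parameters (for example parsed spark-defaults.conf)
    java_tmpdir -- assumed stand-in for the JVM's java.io.tmpdir value
    """
    # 1. The SPARK_LOCAL_DIRS environment variable takes precedence.
    from_env = env.get("SPARK_LOCAL_DIRS")
    if from_env:
        return from_env

    # 2. Otherwise the spark.local.dir parameter is used.
    from_conf = spark_conf.get("spark.local.dir")
    if from_conf:
        return from_conf

    # 3. If neither is configured, fall back to java.io.tmpdir.
    return java_tmpdir
```

For example, calling the function with an empty environment and {"spark.local.dir": "/tmp"} returns /tmp, which matches the default client behavior described above.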

In some special cases, the driver process does not exit normally, for example, because it is ended by the kill -9 command or because the Java virtual machine crashes. In these cases, the directories cannot be deleted and remain in the system.

Currently, only driver processes in yarn-client mode and local mode can encounter this problem. In yarn-cluster mode, the temporary directory of a process in a container is set to the temporary directory of the container, and the directory is cleared automatically when the container exits. Therefore, this problem does not occur in yarn-cluster mode.

Solution

In Linux, you can configure automatic clearing for the /tmp temporary directory. Alternatively, you can change the value of spark.local.dir in the spark-defaults.conf configuration file on the client to a dedicated directory, and configure a clearing mechanism for that directory.
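One possible clearing mechanism is a small script that is run periodically (for example, from a scheduled job) and removes residual directories older than a given age. The following is a minimal sketch, not a tool shipped with the product: the function names, the 7-day default, and the dry-run flag are all assumptions made for illustration. Because the script judges age by the modification time of the top-level directory, the threshold should be longer than your longest-running application so that directories still in use are not removed.

```python
import os
import shutil
import time

# Name prefixes of the residual directories described in this FAQ.
PREFIXES = ("blockmgr-", "spark-")


def find_stale_dirs(base_dir, max_age_days=7, now=None):
    """List directories under base_dir with a matching prefix whose
    modification time is older than max_age_days (assumed threshold)."""
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400
    stale = []
    for entry in os.scandir(base_dir):
        # Skip regular files and symbolic links.
        if not entry.is_dir(follow_symlinks=False):
            continue
        if not entry.name.startswith(PREFIXES):
            continue
        if entry.stat(follow_symlinks=False).st_mtime < cutoff:
            stale.append(entry.path)
    return sorted(stale)


def remove_stale_dirs(base_dir, max_age_days=7, dry_run=True):
    """Remove stale directories; with dry_run=True only report them."""
    stale = find_stale_dirs(base_dir, max_age_days)
    if not dry_run:
        for path in stale:
            shutil.rmtree(path, ignore_errors=True)
    return stale
```

Running remove_stale_dirs("/tmp") first with the default dry_run=True lets you review the list before deleting anything. Note that other programs may also create directories in /tmp whose names start with spark-, which is one reason to point spark.local.dir at a dedicated directory.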