Disk Space Is Insufficient Due to Long-Term Running of JDBCServer
Issue
When a customer connects to the Spark JDBCServer service and submits spark-sql tasks to the Yarn cluster, the data disks of the Core nodes become fully occupied after the tasks have run for a period of time.
Symptom
After a customer connects to the Spark JDBCServer service and submits spark-sql tasks to the Yarn cluster, the data disks of the Core nodes are fully occupied once the tasks have run for a period of time.
A check of the disk usage on the nodes shows that the JDBCServer application has generated a very large number of temporary files (shuffle files). These files are not cleared and occupy a large amount of disk space.
Cause Analysis
A check of the directories that contain a large number of files on the Core node shows that most of them have names similar to blockmgr-033707b6-fbbb-45b4-8e3a-128c9bcfa4bf. These directories store the temporary shuffle files generated during computing.
The dynamic resource allocation function of Spark is enabled on JDBCServer, and shuffle is hosted by NodeManager. NodeManager manages these files only based on the running period of the application, and does not check whether the container of an individual executor still exists. Therefore, the temporary files are deleted only when the application is stopped. When an application runs for a long time, as JDBCServer does, the accumulated temporary files occupy a large amount of disk space.
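To confirm this cause on a node, it can help to list the blockmgr directories by size. The following is a minimal sketch, not part of the original procedure: the function name list_blockmgr_usage is hypothetical, and the directory layout is assumed to match the paths used in the cleanup scripts on this page.

```shell
#!/bin/bash
# Hypothetical helper: print the size (in KB) and path of every blockmgr-*
# directory under one NodeManager local directory, largest first.
list_blockmgr_usage() {
    local nm_local_dir="$1"
    # The first * matches any submitting user (for example spark or omm).
    # Errors are discarded so that an unmatched glob simply prints nothing.
    du -sk "$nm_local_dir"/usercache/*/appcache/application_*/blockmgr-* 2>/dev/null | sort -rn
}
```

For example, list_blockmgr_usage /srv/BigData/hadoop/data1/nm/localdir would report the directories on the data1 disk. Adjust the path for other data disks.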
Procedure
Start a scheduled task to delete shuffle files that have been stored for longer than a specified period of time. For example, run a task every hour that deletes shuffle files older than 6 hours.
- Create the clean_appcache.sh script. If there are multiple data disks, change the value of data1 in BASE_LOC based on the actual situation.
- Security cluster
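#!/bin/bash
BASE_LOC=/srv/BigData/hadoop/data1/nm/localdir/usercache/spark/appcache/application_*/blockmgr*
# Try rmdir on entries not modified for more than 360 minutes (succeeds only for empty directories)
find $BASE_LOC/ -mmin +360 -exec rmdir {} \;
# Run rm on entries not modified for more than 360 minutes (removes files only)
find $BASE_LOC/ -mmin +360 -exec rm {} \;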
- Common cluster
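#!/bin/bash
BASE_LOC=/srv/BigData/hadoop/data1/nm/localdir/usercache/omm/appcache/application_*/blockmgr*
# Try rmdir on entries not modified for more than 360 minutes (succeeds only for empty directories)
find $BASE_LOC/ -mmin +360 -exec rmdir {} \;
# Run rm on entries not modified for more than 360 minutes (removes files only)
find $BASE_LOC/ -mmin +360 -exec rm {} \;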
- Run the following command to change the permissions of the script:
chmod 755 clean_appcache.sh
- Add a scheduled task to start the cleanup script. Change the script path to the actual path.
Run the crontab -l command to view the existing scheduled tasks.
Run the crontab -e command to edit the scheduled tasks and add the following entry, which runs the script at minute 0 of every hour:
0 * * * * sh /root/clean_appcache.sh > /dev/null 2>&1
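When a node has several data disks, one loop can cover all of them instead of one edited script per disk. The following is a hedged sketch of that variation, not the script from this procedure. The function name clean_shuffle_files and its parameters are hypothetical, and it assumes that every disk is mounted as data1, data2, and so on under the same root and that GNU find is available. Unlike the scripts above, it deletes only files and leaves directories in place, and it defaults to a dry run that only prints what would be deleted.

```shell
#!/bin/bash
# Hypothetical variant: clean old shuffle files on every data disk.
#   $1 data root, e.g. /srv/BigData/hadoop (assumed layout: data1, data2, ...)
#   $2 submitting user: spark (security cluster) or omm (common cluster)
#   $3 age threshold in minutes, e.g. 360 for 6 hours
#   $4 dry run flag: 1 (default) only prints matches, 0 deletes them
clean_shuffle_files() {
    local data_root="$1"
    local app_user="$2"
    local max_age_min="$3"
    local dry_run="${4:-1}"
    local dir
    for dir in "$data_root"/data*/nm/localdir/usercache/"$app_user"/appcache/application_*/blockmgr*; do
        # Skip the literal pattern when the glob matched nothing.
        [ -d "$dir" ] || continue
        if [ "$dry_run" -eq 1 ]; then
            # Print regular files not modified for more than max_age_min minutes.
            find "$dir" -type f -mmin +"$max_age_min" -print
        else
            # Delete those files; directories are left untouched.
            find "$dir" -type f -mmin +"$max_age_min" -delete
        fi
    done
}
```

For example, clean_shuffle_files /srv/BigData/hadoop spark 360 lists the candidate files on a security cluster, and adding 0 as a fourth argument deletes them. Reviewing the dry-run output before enabling deletion in cron is a reasonable precaution.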