Disk Space Is Used Up Due to Oversized Aggregated Logs of Yarn

Issue

The disk usage of the cluster is high.

Symptom

The host management page of Manager shows that the disk usage is too high.
Only a few tasks are running on the Yarn web UI.
Running the hdfs dfs -du -h / command on the master node of the cluster shows that the aggregated log files of Yarn consume a large amount of disk space.
In the log aggregation configuration of the Yarn service, yarn.log-aggregation.retain-seconds is set to 1296000.
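To make the large directories easier to spot, the command output can be ranked by size. The following is a minimal sketch, not part of the product tooling. It assumes the command is run without -h (hdfs dfs -du /) so that sizes are plain byte counts, that the first column is the size and the last column is the path, and that paths contain no spaces. The sample output, including its paths and sizes, is hypothetical.

```python
def parse_du(output: str) -> list:
    """Return (path, size_in_bytes) pairs from `hdfs dfs -du` output, largest first."""
    entries = []
    for line in output.splitlines():
        fields = line.split()
        # Skip blank lines and anything that does not start with a byte count.
        if len(fields) < 2 or not fields[0].isdigit():
            continue
        # First column: size in bytes. Last column: path.
        # Any middle column (space consumed including replicas) is ignored.
        entries.append((fields[-1], int(fields[0])))
    return sorted(entries, key=lambda entry: entry[1], reverse=True)


# Hypothetical sample output; real paths and sizes will differ.
sample = """\
1024  3072  /user
5368709120  16106127360  /tmp
2048  6144  /hbase
"""

ranked = parse_du(sample)
for path, size in ranked:
    print(f"{size / 1024**3:8.2f} GiB  {path}")
```

On a live cluster, the same function could be fed the text captured from the command instead of the sample string.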

Cause Analysis

Jobs are submitted too frequently, and the retention time of aggregated log files is set to 1296000 seconds, that is, aggregated logs are retained for 15 days. As a result, the logs are not deleted soon enough to free space, and the disk space is exhausted.
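The arithmetic behind this can be checked with a short sketch. It converts the retention value to days and gives a rough estimate of how much space aggregated logs occupy once the retention window is full. The daily log volume of 100 GiB is a hypothetical figure for illustration, and the estimate ignores HDFS replication, which multiplies the physical disk usage.

```python
SECONDS_PER_DAY = 24 * 60 * 60
GIB = 1024 ** 3


def retention_days(retain_seconds: int) -> float:
    """Convert a retention time in seconds to days."""
    return retain_seconds / SECONDS_PER_DAY


def steady_state_bytes(daily_log_bytes: int, retain_seconds: int) -> float:
    """Approximate space held by aggregated logs once the retention window is full."""
    return daily_log_bytes * retention_days(retain_seconds)


daily_volume = 100 * GIB  # hypothetical: 100 GiB of aggregated logs per day

print(retention_days(1296000))                          # 15.0
print(steady_state_bytes(daily_volume, 1296000) / GIB)  # 1500.0
```

With these assumed numbers, a 15-day retention keeps roughly 1500 GiB of logs on HDFS at any time, which shows why frequent job submission combined with a long retention period fills the disks.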

Procedure

Log in to Manager and navigate to the all configurations page of the MapReduce service.
- MRS Manager: Log in to MRS Manager, choose Services > MapReduce > Service Configuration, and select All from the Type drop-down list.
- FusionInsight Manager: Log in to FusionInsight Manager and choose Cluster > Services > MapReduce. On the MapReduce page, choose Configurations > All Configurations.
Search for the yarn.log-aggregation.retain-seconds parameter and decrease its value based on site requirements, for example, to 259200 (seconds). In this case, the aggregated logs of Yarn are retained for three days, and the disk space is automatically released after the retention period expires.
Click Save Configuration and deselect Restart the affected services or instances so that the service is not restarted immediately.
Restart the MapReduce service during off-peak hours. The restart interrupts upper-layer services and affects cluster management and maintenance.
1. Log in to Manager.
2. Restart the MapReduce service.
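One possible way to pick a value for yarn.log-aggregation.retain-seconds "based on site requirements" is to start from the space that aggregated logs are allowed to occupy. The sketch below is only an illustration under stated assumptions: the function name, the space budget, and the daily log volume are hypothetical, and the calculation ignores HDFS replication.

```python
SECONDS_PER_DAY = 24 * 60 * 60
GIB = 1024 ** 3


def retain_seconds_for_budget(budget_bytes: int, daily_log_bytes: int,
                              min_days: int = 1) -> int:
    """Return a retention time in seconds that keeps logs within the budget.

    The result is rounded down to whole days and never drops below min_days.
    """
    if daily_log_bytes <= 0:
        raise ValueError("daily_log_bytes must be positive")
    days = max(min_days, budget_bytes // daily_log_bytes)
    return days * SECONDS_PER_DAY


# Hypothetical site figures: 300 GiB budget, 100 GiB of aggregated logs per day.
value = retain_seconds_for_budget(300 * GIB, 100 * GIB)
print(value)  # 259200, that is, three days
```

The min_days floor is a design choice in this sketch so that a very small budget does not yield a retention of zero; adjust it to the shortest period for which logs must remain available for troubleshooting.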