Updated on 2022-09-22 GMT+08:00

Configuring the Recycle Bin Mechanism

Scenario

In HDFS, deleted files are moved to the recycle bin (trash) so that data deleted by mistake can be restored.
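Restoring a file means moving it back out of the trash directory before it ages out. The following Python sketch only builds path strings and does not touch HDFS. It assumes the default Apache Hadoop trash layout, in which a file deleted by a user is placed under /user/<user>/.Trash/Current/ followed by its original absolute path. It also assumes the file is not in an HDFS encryption zone. The helper names trash_location and restore_command are hypothetical, and the layout on your cluster may differ.

```python
"""Where a deleted file lands in the trash, and how to move it back.

Assumes the default Apache Hadoop trash layout
(/user/<user>/.Trash/Current/<original path>) and a path outside any
encryption zone. trash_location and restore_command are hypothetical helpers
that only build strings; they do not call HDFS.
"""
import posixpath
import shlex


def trash_location(original_path: str, user: str) -> str:
    """Path the file is moved to when `user` deletes `original_path`."""
    if not original_path.startswith("/"):
        raise ValueError("expected an absolute HDFS path")
    # The original absolute path is recreated underneath the user's Current dir.
    return posixpath.join(f"/user/{user}/.Trash/Current", original_path.lstrip("/"))


def restore_command(original_path: str, user: str) -> str:
    """Shell command that moves the file back to its original location."""
    src = trash_location(original_path, user)
    # Quote both paths so names with spaces survive the shell.
    return f"hdfs dfs -mv {shlex.quote(src)} {shlex.quote(original_path)}"


if __name__ == "__main__":
    print(trash_location("/data/report.csv", "alice"))
    # /user/alice/.Trash/Current/data/report.csv
    print(restore_command("/data/report.csv", "alice"))
    # hdfs dfs -mv /user/alice/.Trash/Current/data/report.csv /data/report.csv
```

In this layout, a file stays under Current only until the next trash checkpoint. After that, it is moved into a checkpoint directory named after the checkpoint time, so check the actual location (for example, with hdfs dfs -ls) before restoring.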

You can set how long files are kept in the recycle bin. Once a file has been stored there longer than this threshold, it is permanently deleted from the recycle bin. If the recycle bin is emptied, all files in it are permanently deleted.

Configuration Description

When a file is deleted from HDFS, it is moved to the trash instead of being removed immediately. Once its aging time expires, the deleted file becomes an aging file and is cleared either automatically by the system or manually by users.
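The lifecycle described above can be modelled in a few lines. Deleting moves a file to the trash. A file becomes an aging file once it has been kept longer than the threshold. Aging files are cleared automatically, and emptying the trash clears everything. The following is a toy, in-memory Python sketch for illustration only: RecycleBin and its methods are hypothetical names, not an HDFS API, and time is a plain counter in minutes.

```python
"""Toy in-memory model of the recycle-bin lifecycle described above.

RecycleBin and its methods are hypothetical names used for illustration;
this is not an HDFS API. Time is a plain counter in minutes.
"""
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class RecycleBin:
    # Retention threshold in minutes (the role played by fs.trash.interval).
    retention_minutes: int
    # Each deleted path mapped to the minute at which it entered the bin.
    entries: dict[str, int] = field(default_factory=dict)

    def delete(self, path: str, now: int) -> None:
        """Deleting a file only moves it into the bin."""
        self.entries[path] = now

    def restore(self, path: str) -> bool:
        """Take a file back out of the bin; False if it is no longer there."""
        # Compare with None so a file deleted at minute 0 still counts as found.
        return self.entries.pop(path, None) is not None

    def aging_files(self, now: int) -> list[str]:
        """Files kept longer than the threshold have become aging files."""
        return [p for p, t in self.entries.items() if now - t > self.retention_minutes]

    def purge_aging(self, now: int) -> list[str]:
        """Automatic cleanup: permanently remove only the aging files."""
        aged = self.aging_files(now)
        for p in aged:
            del self.entries[p]
        return aged

    def empty(self) -> None:
        """Manual cleanup: emptying the bin permanently removes every file."""
        self.entries.clear()


if __name__ == "__main__":
    rb = RecycleBin(retention_minutes=1440)
    rb.delete("/tmp/a.txt", now=0)
    rb.delete("/tmp/b.txt", now=1000)
    print(rb.purge_aging(now=1500))  # ['/tmp/a.txt']: 1500 minutes old vs. 500
    print(rb.restore("/tmp/b.txt"))  # True: b.txt had not aged yet
```

The model uses a strict comparison (older than the threshold) because the text describes files being cleared once their storage duration exceeds the threshold.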

Parameter portal:

Go to the All Configurations page of HDFS and enter a parameter name in the search box. For details, see Modifying Cluster Service Configuration Parameters.

Table 1 Parameter description

Parameter: fs.trash.interval
Description: Retention time of deleted data in the trash, in minutes. Data that has been in the trash longer than this time is deleted. Value range: 1440 to 259200 (1 to 180 days)
Default value: 1440

Parameter: fs.trash.checkpoint.interval
Description: Interval between trash checkpoints, in minutes. The value must be less than or equal to the value of fs.trash.interval. Each time the checkpoint program runs, it creates a checkpoint and removes the checkpoints that were created more than fs.trash.interval minutes earlier. For example, if this parameter is set to 10, the system checks for aging files every 10 minutes and deletes any that it finds; files that have not aged yet remain in the checkpoint list until the next check.

If this parameter is set to 0, the system does not check for aging files, and all aging files are retained.

Value range: 0 to fs.trash.interval

NOTE:

Do not set this parameter to 0. Otherwise, aging files accumulate and can eventually use up the disk space of the cluster.

Default value: 60
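Before applying new values, you may want to check them against the ranges in Table 1 and see how many checkpoints the described scheme keeps at one time. The following Python sketch takes the behavior exactly as described in the table: one checkpoint is created per run, and checkpoints older than fs.trash.interval minutes are removed at each run. validate_trash_config and surviving_checkpoints are hypothetical helpers that do not read or change any cluster configuration, and the model does not cover the value 0.

```python
"""Range checks and a checkpoint model for the two trash parameters in Table 1.

validate_trash_config and surviving_checkpoints are hypothetical helpers for
illustration; they do not read or modify any real HDFS/MRS configuration.
All values are in minutes.
"""
from __future__ import annotations

TRASH_INTERVAL_MIN = 1440    # 1 day
TRASH_INTERVAL_MAX = 259200  # 180 days


def validate_trash_config(trash_interval: int, checkpoint_interval: int) -> list[str]:
    """Return messages describing out-of-range or discouraged values."""
    messages: list[str] = []
    if not TRASH_INTERVAL_MIN <= trash_interval <= TRASH_INTERVAL_MAX:
        messages.append(
            f"fs.trash.interval={trash_interval} is outside "
            f"{TRASH_INTERVAL_MIN} to {TRASH_INTERVAL_MAX}"
        )
    if not 0 <= checkpoint_interval <= trash_interval:
        messages.append(
            f"fs.trash.checkpoint.interval={checkpoint_interval} must be "
            f"between 0 and fs.trash.interval ({trash_interval})"
        )
    elif checkpoint_interval == 0:
        # Within range, but the NOTE in Table 1 advises against it.
        messages.append("fs.trash.checkpoint.interval=0 disables aging checks")
    return messages


def surviving_checkpoints(trash_interval: int, checkpoint_interval: int, now: int) -> list[int]:
    """Creation times of the checkpoints that still exist at minute `now`.

    The checkpoint program is modelled as running every checkpoint_interval
    minutes (first run at minute checkpoint_interval). Each run removes the
    checkpoints created more than trash_interval minutes earlier and then
    creates a new one stamped with the run time.
    """
    if checkpoint_interval <= 0:
        raise ValueError("the model only covers a positive checkpoint interval")
    checkpoints: list[int] = []
    for run_at in range(checkpoint_interval, now + 1, checkpoint_interval):
        # Keep only checkpoints that are not yet older than trash_interval.
        checkpoints = [c for c in checkpoints if run_at - c <= trash_interval]
        checkpoints.append(run_at)
    return checkpoints


if __name__ == "__main__":
    print(validate_trash_config(1440, 60))  # []
    kept = surviving_checkpoints(1440, 60, now=3000)
    print(len(kept), kept[0], kept[-1])  # 25 1560 3000
```

With the default values (1440 and 60), the model holds 25 checkpoints right after a run: the one just created plus 24 created over the preceding 1440 minutes. A smaller checkpoint interval means more checkpoints but finer-grained removal of aging files.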