Updated on 2024-10-08 GMT+08:00

Configuring the Recycle Bin Mechanism

Scenario

When a file is deleted in HDFS, it is moved to the recycle bin (trash) instead of being removed immediately, so that data deleted by mistake can be recovered. Once a deleted file has stayed in the recycle bin longer than the aging time, it becomes an aging file and is cleared automatically by the system. Users can also clear it manually.

You can set the time threshold for storing files in the recycle bin. Once a file has been stored for longer than the threshold, the file is permanently deleted from the recycle bin. If the recycle bin is cleared, all files in it are permanently deleted.
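The threshold rule above can be illustrated with a minimal Python sketch. It only models the arithmetic; it does not call HDFS. The function names are hypothetical, and the sketch assumes the threshold is the value of fs.trash.interval expressed in minutes and that "exceeds" means strictly longer than the threshold.

```python
from datetime import datetime, timedelta


def aging_time(deleted_at: datetime, trash_interval_minutes: int) -> datetime:
    """Return the moment at which the storage duration reaches the threshold."""
    return deleted_at + timedelta(minutes=trash_interval_minutes)


def is_aging(deleted_at: datetime, now: datetime, trash_interval_minutes: int) -> bool:
    """Return True once the file has been stored longer than the threshold."""
    return now - deleted_at > timedelta(minutes=trash_interval_minutes)


if __name__ == "__main__":
    deleted = datetime(2024, 10, 8, 9, 0)
    # With a threshold of 1440 minutes (one day), the file reaches the
    # threshold exactly one day after it was deleted.
    print(aging_time(deleted, 1440))
    print(is_aging(deleted, datetime(2024, 10, 9, 9, 1), 1440))
```

In this model a file is eligible for permanent deletion only after the threshold has passed. When the system actually removes it depends on how often aging files are checked.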

Configuration Description

Navigation path for setting parameters:

Go to the All Configurations page of HDFS by referring to Modifying Cluster Service Configuration Parameters, and enter a parameter name in the search box.

Table 1 Parameter description

Parameter: fs.trash.interval
Description: Retention time of files in the recycle bin, in minutes. Data that has been in the recycle bin longer than this time is deleted. Value range: 1440 to 259200
Default Value: 1440

Parameter: fs.trash.checkpoint.interval
Description: Interval between trash checkpoints, in minutes. The value must be less than or equal to the value of fs.trash.interval. Each time the checkpoint program runs, it creates a checkpoint and removes checkpoints created more than fs.trash.interval minutes ago. For example, if this parameter is set to 10, the system checks for aging files every 10 minutes and deletes any it finds. Files that are not yet aging stay in the checkpoint list and wait for the next check.
If this parameter is set to 0, the system does not check for aging files and all aging files are retained in the system.
Value range: 0 to the value of fs.trash.interval
NOTE: Setting this parameter to 0 is not recommended because aging files will use up the disk space of the cluster.
Default Value: 60
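
The value ranges in Table 1 and the checkpoint cycle described for fs.trash.checkpoint.interval can be sketched in Python as follows. This is a simplified model, not the HDFS implementation. It assumes that the check runs at minutes that are exact multiples of fs.trash.checkpoint.interval, that a deleted file joins the checkpoint created at the first run at or after its deletion, and that a checkpoint is removed at the first run at which it is more than fs.trash.interval minutes old. The function names are hypothetical.

```python
from typing import Optional

MIN_TRASH_INTERVAL = 1440     # lower bound of fs.trash.interval in Table 1 (minutes)
MAX_TRASH_INTERVAL = 259200   # upper bound of fs.trash.interval in Table 1 (minutes)


def validate_trash_config(trash_interval: int, checkpoint_interval: int) -> None:
    """Raise ValueError if either value is outside the range given in Table 1."""
    if not MIN_TRASH_INTERVAL <= trash_interval <= MAX_TRASH_INTERVAL:
        raise ValueError(
            f"fs.trash.interval must be in [{MIN_TRASH_INTERVAL}, {MAX_TRASH_INTERVAL}]"
        )
    if not 0 <= checkpoint_interval <= trash_interval:
        raise ValueError(
            "fs.trash.checkpoint.interval must be in [0, fs.trash.interval]"
        )


def purge_minute(
    deleted_at: int, trash_interval: int, checkpoint_interval: int
) -> Optional[int]:
    """Return the minute at which a file deleted at `deleted_at` is removed.

    Returns None when the checkpoint interval is 0, because aging files are
    then never checked and stay in the system.
    """
    validate_trash_config(trash_interval, checkpoint_interval)
    if checkpoint_interval == 0:
        return None
    # Assumption: the first run at or after the deletion creates the checkpoint.
    first_checkpoint = -(-deleted_at // checkpoint_interval) * checkpoint_interval
    # Assumption: the checkpoint is removed at the first later run at which its
    # age is strictly greater than fs.trash.interval.
    runs_to_wait = trash_interval // checkpoint_interval + 1
    return first_checkpoint + runs_to_wait * checkpoint_interval


if __name__ == "__main__":
    # Default values from Table 1, file deleted at minute 30.
    print(purge_minute(30, 1440, 60))   # 1560
    # Checkpoint interval 0: aging files are never cleared.
    print(purge_minute(30, 1440, 0))    # None
```

Under these assumptions, a larger fs.trash.checkpoint.interval delays the permanent deletion of a file beyond fs.trash.interval, because removal can only happen at a check.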