Configuring the Recycle Bin Mechanism

Scenario

The HDFS recycle bin mechanism is a data protection mechanism. It stores deleted files or directories in a specified location instead of deleting them permanently. If important data is deleted by mistake, you can restore the data from the recycle bin to prevent permanent data loss.

Deleted files are moved to the .Trash directory in the home directory of the user who deleted them, usually hdfs://<nameservice>/user/<username>/.Trash.
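
The path mapping below is a minimal sketch. It assumes the stock Apache Hadoop trash policy, in which a deleted path is moved under .Trash/Current/ in the user's home directory with its full original path preserved. It also builds an hdfs dfs -mv command that moves a file back to its original location. The helper names, the nameservice ns1, and the user alice are hypothetical.

```python
import posixpath
from typing import List


def trash_location(original_path: str, username: str, nameservice: str) -> str:
    """Return where a deleted HDFS path is expected to land in the recycle bin.

    Assumption: stock Apache Hadoop trash policy, which moves the path under
    /user/<username>/.Trash/Current/ and keeps its full original path.
    """
    if not original_path.startswith("/"):
        raise ValueError("expected an absolute HDFS path such as /data/app.log")
    # Collapse redundant separators and trailing slashes, e.g. "/data//x/" -> "/data/x".
    normalized = posixpath.normpath(original_path)
    return f"hdfs://{nameservice}/user/{username}/.Trash/Current{normalized}"


def restore_command(original_path: str, username: str, nameservice: str) -> List[str]:
    """Build an `hdfs dfs -mv` command that moves a deleted path back.

    The parent directory of the original path must exist; create it first
    (for example with `hdfs dfs -mkdir -p`) if it was deleted as well.
    """
    source = trash_location(original_path, username, nameservice)
    target = f"hdfs://{nameservice}{posixpath.normpath(original_path)}"
    return ["hdfs", "dfs", "-mv", source, target]


if __name__ == "__main__":
    # Hypothetical nameservice and user, for illustration only.
    print(" ".join(restore_command("/data/logs/app.log", "alice", "ns1")))
```

If the file has already been captured in a checkpoint when you restore it, it sits under a timestamped directory in .Trash instead of Current, and a path deleted more than once may carry a numeric suffix. List the .Trash directory with hdfs dfs -ls to confirm the actual path before moving it.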

Advantages:
- Prevents data loss caused by accidental deletion and allows deleted data to be restored.
- Meets data security and compliance requirements, which makes it especially suitable for production environments.

Disadvantages:
- Occupies extra storage space, especially when a large number of files are deleted.
- May affect NameNode performance if configured improperly (for example, if the retention period is too long).

You can adjust how long deleted HDFS files are retained in the recycle bin. Once the retention period expires, the files are permanently deleted from the recycle bin. If the recycle bin is emptied, all files in it are permanently deleted.

Suggestions on recycle bin parameter settings:

- Set the retention period according to the importance of the data. For example, set it to 7 days (10,080 minutes) in a production environment and to a shorter period in a test environment.
- Periodically run the hdfs dfs -expunge -immediate command to free up storage space as required by your services. Note that -immediate permanently deletes all files in the current user's recycle bin at once, regardless of fs.trash.interval.
- Exercise caution when using the -skipTrash option. Use it to bypass the recycle bin only when you are certain the data is no longer needed; otherwise, important data may be permanently lost through a mistaken deletion.
- In MRS 3.1.3 and later, the -skipTrash option is disabled by default. This is controlled by the dfs.client.skipTrash.enabled parameter, which specifies whether the hdfs dfs -rm client command can use -skipTrash to bypass the recycle bin. To enable it, log in to FusionInsight Manager, choose Cluster > Services > HDFS > Configurations > All Configurations, search for dfs.client.skipTrash.enabled, and set it to true. Then restart the affected service or instances, and update the client by referring to Updating the MRS Cluster Client After the Server Configuration Expires.
- The recycle bin is the last line of defense for data protection. Use it together with a periodic backup mechanism.
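
The helpers below sketch how the expunge and -skipTrash suggestions could be scripted. They only assemble hdfs dfs command lines and, when dry_run is False, run them with subprocess. The function names and the confirmed_unneeded flag are hypothetical: the flag is a client-side safeguard for this sketch, not an HDFS option, and it does not replace the cluster-side dfs.client.skipTrash.enabled setting.

```python
import shlex
import subprocess
from typing import List


def expunge_command(immediate: bool = True) -> List[str]:
    """Build `hdfs dfs -expunge`, optionally with -immediate.

    With -immediate, everything in the current user's recycle bin is deleted
    at once, regardless of fs.trash.interval.
    """
    cmd = ["hdfs", "dfs", "-expunge"]
    if immediate:
        cmd.append("-immediate")
    return cmd


def rm_command(path: str, recursive: bool = False, skip_trash: bool = False,
               confirmed_unneeded: bool = False) -> List[str]:
    """Build `hdfs dfs -rm`, refusing -skipTrash unless explicitly confirmed.

    confirmed_unneeded is a hypothetical safeguard for this sketch only.
    """
    if skip_trash and not confirmed_unneeded:
        raise ValueError("refusing -skipTrash: confirm the data is no longer needed")
    cmd = ["hdfs", "dfs", "-rm"]
    if recursive:
        cmd.append("-r")
    if skip_trash:
        cmd.append("-skipTrash")
    cmd.append(path)
    return cmd


def run(cmd: List[str], dry_run: bool = True) -> None:
    """Print the command in dry-run mode; otherwise execute it and fail on error."""
    if dry_run:
        print(shlex.join(cmd))
        return
    subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run(expunge_command())  # prints: hdfs dfs -expunge -immediate
    run(rm_command("/tmp/staging", recursive=True))  # prints: hdfs dfs -rm -r /tmp/staging
```

Scheduling run(expunge_command(), dry_run=False) from cron or a similar scheduler is one way to free space periodically. On MRS 3.1.3 and later, the -skipTrash option remains disabled unless dfs.client.skipTrash.enabled is set to true as described above.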

Configuring HDFS Recycle Bin Parameters

Log in to FusionInsight Manager.

For details about how to log in to FusionInsight Manager, see Accessing MRS Manager.

Choose Cluster > Services > HDFS > Configurations > All Configurations.

Search for and modify the parameters listed in Table 1 as required.

**Table 1** Parameters
| Parameter | Description | Example Value |
| --- | --- | --- |
| fs.trash.interval | Retention period of files in the recycle bin, in minutes. Files whose retention period has expired are permanently deleted. Set this parameter based on site requirements to reduce the risk of data loss caused by misoperations. Value range: 1440 to 259200. | 2880 |
| fs.trash.checkpoint.interval | Interval between recycle bin checkpoints, in minutes. The value must be less than or equal to that of fs.trash.interval. Each time the checkpoint program runs, it creates a checkpoint and removes checkpoints created more than fs.trash.interval minutes earlier. For example, if this parameter is set to 10, the system checks for aged files every 10 minutes and deletes any it finds; files that have not yet aged remain in the checkpoint list until the next check. Do not set this parameter to 0: with 0, the system does not check for aged files, so all aged files are kept and may exhaust the cluster's disk space. Value range: 0 to the value of fs.trash.interval. | 60 |
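
A quick way to sanity-check values before saving them is sketched below. The fs.trash.interval range comes from Table 1. The retention upper bound is a rough estimate derived from the table's description of the checkpoint program (it runs every fs.trash.checkpoint.interval minutes, creating a checkpoint and removing checkpoints older than fs.trash.interval); it is not a measured figure, and the helper names are hypothetical.

```python
from typing import List

MINUTES_PER_DAY = 24 * 60  # 1440

# Value range of fs.trash.interval, in minutes, as listed in Table 1.
TRASH_INTERVAL_MIN = 1440
TRASH_INTERVAL_MAX = 259200


def days_to_minutes(days: float) -> int:
    """Convert a retention period in days to minutes, e.g. 7 days -> 10080."""
    return round(days * MINUTES_PER_DAY)


def validate_trash_settings(trash_interval: int, checkpoint_interval: int) -> List[str]:
    """Return a list of problems with the two settings (empty if none)."""
    problems = []
    if not TRASH_INTERVAL_MIN <= trash_interval <= TRASH_INTERVAL_MAX:
        problems.append(
            f"fs.trash.interval={trash_interval} is outside "
            f"{TRASH_INTERVAL_MIN}-{TRASH_INTERVAL_MAX}")
    if not 0 <= checkpoint_interval <= trash_interval:
        problems.append("fs.trash.checkpoint.interval must be between 0 and fs.trash.interval")
    elif checkpoint_interval == 0:
        problems.append("fs.trash.checkpoint.interval=0 means aged files are never checked")
    return problems


def max_retention_minutes(trash_interval: int, checkpoint_interval: int) -> int:
    """Rough upper bound on how long a deleted file stays in the recycle bin.

    Assumes checkpoint_interval > 0. Up to one checkpoint interval may pass
    before the file is captured in a checkpoint, that checkpoint is kept for
    about trash_interval minutes, and removal happens at the next run after
    that, up to one more checkpoint interval later.
    """
    return trash_interval + 2 * checkpoint_interval


if __name__ == "__main__":
    print(days_to_minutes(7))                 # 10080
    print(validate_trash_settings(2880, 60))  # []
    print(max_retention_minutes(2880, 60))    # 3000
```

With the example values in Table 1 (2880 and 60), the estimate is about 3000 minutes, a little over two days.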

Click Save, confirm the operation impact, and click OK. Then click Finish.

Check whether any instance in the cluster has an expired configuration. If so, restart the instance for the configuration to take effect.