Updated on 2025-08-09 GMT+08:00

Configuring the Policy for Clearing Recycle Bin Directories of MRS Cluster Components

Scenarios

In MRS 3.2.0-LTS.1 or later, components protect against accidental deletion by default: file data deleted by component users is not removed immediately but is moved to the recycle bin directory in the OBS file system. This function is compatible with the native trash (recycle bin) mechanism of Hadoop FS and provides additional data protection for OBS-based Hadoop big data systems.

This section describes how to set a lifecycle policy for the recycle bin directory in the OBS file system so that its data is cleared automatically and periodically. Each user has a separate recycle bin directory. If a new user is added to the MRS cluster and has the permission to delete component data, you must also configure a clearing policy for that user's recycle bin directory.

For clusters that use decoupled storage and compute, configure a lifecycle policy for the related directories by referring to this section. Otherwise, the storage space may be used up and storage fees may increase. For details about OBS billing, see OBS Billing Overview.

You need to configure lifecycle policies for the recycle bin directories of the preset users in the MRS cluster and for the recycle bin directories of new users who need accidental deletion prevention. If a low-privilege agency is used, or if MRS users have only been granted access to specific OBS file system directories by referring to Configuring Fine-Grained OBS Access Permissions for MRS Cluster Users, the operation permission for the recycle bin directory must also be granted.

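Table 1 Directories for which a lifecycle policy needs to be configured

| Cluster Version | Directory Type | Component | Directory |
| --- | --- | --- | --- |
| Versions earlier than MRS 3.3.0-LTS | Recycle bin directories that must be configured by default for each component in an MRS cluster | Hive | user/omm/.Trash, user/hive/.Trash |
| Versions earlier than MRS 3.3.0-LTS | Recycle bin directories that must be configured by default for each component in an MRS cluster | Spark | user/omm/.Trash, user/root/.Trash, user/spark2x/.Trash |
| Versions earlier than MRS 3.3.0-LTS | Recycle bin directories that must be configured by default for each component in an MRS cluster | HetuEngine | user/omm/.Trash, user/hetuserver/.Trash |
| Versions earlier than MRS 3.3.0-LTS | Recycle bin directories that must be configured by default for each component in an MRS cluster | HBase | user/hbase/.Trash, user/omm/.Trash |
| Versions earlier than MRS 3.3.0-LTS | Recycle bin directories of users who need accidental deletion prevention | Hive/Spark/HetuEngine | user/<New service user>/.Trash |
| MRS 3.3.0-LTS or later | Default recycle bin directories configured for each component in an MRS cluster | Hive/Spark/HetuEngine | /user/.Trash |

How to Create: If the .Trash folder does not exist, create it on the cluster client as user omm. Run the following command:

hdfs dfs -mkdir -p obs://<Name of the OBS parallel file system where the table is stored>/<Folder path>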

For example, if a user with the following permissions has been added to the cluster, you need to create a recycle bin directory clearing rule for the user in the parallel file system:

  • Permission to delete HDFS files
  • DROP, INSERT OVERWRITE, and TRUNCATE permissions on Hive tables
  • DROP, TRUNCATE, DELETE, INSERT OVERWRITE, and LOAD OVERWRITE permissions on HetuEngine
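The directory layout in Table 1 for versions earlier than MRS 3.3.0-LTS can be scripted on the cluster client. The following is a minimal sketch, not an official tool: the file system name mrs-demo-fs, the user new_service_user, and the DRY_RUN switch are assumptions made for illustration. It builds user/<Username>/.Trash for each user and, by default, only prints the hdfs dfs -mkdir -p command that it would run.

```shell
#!/bin/sh
# Hypothetical values for illustration; replace them with your own.
OBS_FS="${OBS_FS:-mrs-demo-fs}"   # assumed name of the OBS parallel file system
DRY_RUN="${DRY_RUN:-1}"           # 1 = only print the commands (default)

# Build the recycle bin path of one user: user/<Username>/.Trash
trash_dir() {
    printf 'user/%s/.Trash' "$1"
}

# Create the recycle bin directory of each given user,
# or print the command that would create it when DRY_RUN=1.
create_trash_dirs() {
    for user in "$@"; do
        target="obs://${OBS_FS}/$(trash_dir "$user")"
        if [ "$DRY_RUN" = "1" ]; then
            echo "hdfs dfs -mkdir -p ${target}"
        else
            hdfs dfs -mkdir -p "${target}"
        fi
    done
}

# Example: the preset Hive users plus one hypothetical new service user.
create_trash_dirs omm hive new_service_user
```

Set DRY_RUN=0 only after checking the printed commands, and run the script as user omm on a node where the cluster client is installed.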

Configuring the Lifecycle Rule of an OBS Directory

  1. Log in to the OBS console.
  2. Click Parallel File Systems and click the name of the file system used by the current MRS cluster.
  3. In the navigation pane, choose Data Management > Lifecycle Rules. Click Create to create a lifecycle rule for a specified directory. For details about the parameters, see Configuring a Lifecycle Rule.

    Table 2 Parameters for creating a lifecycle rule

    | Parameter | Description | Example Value |
    | --- | --- | --- |
    | Status | Whether to enable the lifecycle rule. | Enable |
    | Rule Name | Rule name, used to distinguish between lifecycle configurations. | rule-test |
    | Prefix | Prefix of the objects to which the lifecycle rule applies. Objects that have the specified prefix are managed by the rule. If this parameter is not specified, the rule takes effect for the entire file system. For restrictions and the recommended format, see the notes under this table. | user/omm/.Trash |
    | Delete Files After (Days) | Number of days after its last update at which an object within the rule's scope expires and is automatically deleted by OBS. The recommended value is 1 to 7 days. | 2 days |

    Notes on the Prefix parameter:

      • The prefix cannot start with a slash (/), contain consecutive slashes (//), or contain any of the following special characters: \:*?"<>|
      • WARNING: To prevent other service data from being deleted by mistake, avoid configuring a lifecycle rule for the entire file system or for high-level directories.
      • Generally, the recycle bin directory of MRS components is in the following format. If the folder does not exist, create it: user/<Username>/.Trash

  4. Click OK to complete the lifecycle rule configuration.

    You can click Edit in the Operation column of a lifecycle rule to edit it. You can also click Disable or Enable to disable or enable it.

  5. Repeat the preceding steps for each user who has the data deletion permission in the current MRS cluster, until a clearing rule is configured for every recycle bin directory in the OBS file system.
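Before entering values on the console, the Prefix restrictions and the recommended retention range from Table 2 can be checked locally. The sketch below is illustrative only: the function names valid_prefix and recommended_days and the sample user list are assumptions, and the check is not performed by OBS itself. Following the warning about whole-file-system rules, it also treats an empty prefix as invalid.

```shell
#!/bin/sh
# Return 0 if the prefix follows the restrictions described for the Prefix parameter.
valid_prefix() {
    prefix="$1"
    forbidden='\:*?"<>|'               # special characters that are not allowed
    # An empty prefix would apply the rule to the entire file system; reject it.
    [ -n "$prefix" ] || return 1
    case "$prefix" in
        /*)               return 1 ;;  # starts with a slash
        *//*)             return 1 ;;  # contains consecutive slashes
        *["$forbidden"]*) return 1 ;;  # contains a forbidden character
    esac
    return 0
}

# Return 0 if the retention period is within the recommended 1 to 7 days.
recommended_days() {
    case "$1" in
        ''|*[!0-9]*) return 1 ;;       # not a non-negative integer
    esac
    [ "$1" -ge 1 ] && [ "$1" -le 7 ]
}

# Example: check the prefix of each user that needs a clearing rule.
for user in omm hive spark2x; do
    p="user/${user}/.Trash"
    if valid_prefix "$p"; then
        echo "OK       $p"
    else
        echo "INVALID  $p"
    fi
done
```

Each prefix reported as OK can be used as the Prefix value of one lifecycle rule, which keeps every rule scoped to a single recycle bin directory.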