Updated on 2024-04-18 GMT+08:00

Configuring the Policy for Clearing Component Data in the Recycle Bin

Scenario

By default, components in clusters of MRS 3.2.0-LTS.1 or later support prevention against accidental data deletion. Hadoop big data systems that use OBS can rely on the native HDFS garbage collection (trash) mechanism.

File data that a component user deletes is not removed immediately. It is stored in the recycle bin of the OBS file system instead. This section describes how to set a lifecycle rule for the recycle bin directory so that this data is cleared periodically.

  • For clusters that use decoupled storage and compute, configure a lifecycle policy for the related directories as described in this section. Otherwise, the storage space may be used up and storage fees may increase. For details about OBS billing, see OBS Billing Overview.
  • A recycle bin directory is created per user. When a user with the permission to delete component data is created in the MRS cluster, configure a recycle bin clearing rule for that new user.
  • For HBase components that use decoupled storage and compute in MRS 3.1.2 or later, also follow this section to set a policy for clearing component data in the recycle bin.

Configure lifecycle policies for the recycle bin directories of the preset users in the MRS cluster and of any new users who need accidental deletion prevention. If a low-privileged agency is used, or if MRS users are only granted access to specific OBS file system directories as described in Configuring Fine-Grained Permissions for MRS Multi-User Access to OBS, the operation permission on the recycle bin directories is also required.

Table 1 Directories for which a lifecycle policy needs to be configured

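| Cluster Version | Directory Type | Component | Directory |
| --- | --- | --- | --- |
| Versions earlier than MRS 3.3.0-LTS | Recycle bin directories that must be configured by default for each component in an MRS cluster | Hive | user/omm/.Trash, user/hive/.Trash |
| Versions earlier than MRS 3.3.0-LTS | Recycle bin directories that must be configured by default for each component in an MRS cluster | Spark | user/omm/.Trash, user/root/.Trash, user/spark2x/.Trash |
| Versions earlier than MRS 3.3.0-LTS | Recycle bin directories that must be configured by default for each component in an MRS cluster | HetuEngine | user/omm/.Trash, user/hetuserver/.Trash |
| Versions earlier than MRS 3.3.0-LTS | Recycle bin directories that must be configured by default for each component in an MRS cluster | HBase | user/hbase/.Trash, user/omm/.Trash |
| Versions earlier than MRS 3.3.0-LTS | Recycle bin directories of users who need accidental deletion prevention | Hive/Spark/HetuEngine | user/<New service user>/.Trash |
| MRS 3.3.0-LTS or later | Default recycle bin directories configured for each component in an MRS cluster | Hive/Spark/HetuEngine | /user/.Trash |

How to Create: If the .Trash folder does not exist, create it on the cluster client as user omm. Run the following command:

hdfs dfs -mkdir -p obs://<Name of the OBS parallel file system where the table is stored>/<Folder path>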

For example, if a new user in the cluster has the following permissions, you need to create a recycle bin directory clearing rule for the user in the parallel file system:

  • Permissions to delete the HDFS files
  • DROP, INSERT OVERWRITE, and TRUNCATE permissions on Hive tables
  • DROP, TRUNCATE, DELETE, INSERT OVERWRITE, and LOAD OVERWRITE permissions on HetuEngine
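The directory list in Table 1 and the creation command can be sketched in Python as follows. This is a minimal illustration, not an MRS tool: the function names, the file system name `mrs-demo-fs`, and the user `alice` are hypothetical, and treating any version whose numeric part is 3.3.0 or higher as "MRS 3.3.0-LTS or later" is an assumption. The leading slash of `/user/.Trash` is dropped so that the result can also serve as a lifecycle rule prefix.

```python
import re

# Preset-user recycle bin directories from Table 1
# (versions earlier than MRS 3.3.0-LTS).
PRESET_TRASH_DIRS = {
    "Hive": ["user/omm/.Trash", "user/hive/.Trash"],
    "Spark": ["user/omm/.Trash", "user/root/.Trash", "user/spark2x/.Trash"],
    "HetuEngine": ["user/omm/.Trash", "user/hetuserver/.Trash"],
    "HBase": ["user/hbase/.Trash", "user/omm/.Trash"],
}

# Components for which Table 1 lists the shared directory in 3.3.0-LTS or later.
SHARED_TRASH_COMPONENTS = {"Hive", "Spark", "HetuEngine"}


def parse_version(version):
    """Extract (major, minor, patch) from a string such as 'MRS 3.2.0-LTS.1'."""
    match = re.search(r"(\d+)\.(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return tuple(int(part) for part in match.groups())


def trash_dirs(version, components, new_users=()):
    """Return the ordered, de-duplicated recycle bin directories to cover."""
    if parse_version(version) >= (3, 3, 0):  # assumption: numeric comparison
        # Table 1 lists one shared directory for Hive/Spark/HetuEngine.
        return ["user/.Trash"] if SHARED_TRASH_COMPONENTS & set(components) else []
    dirs = []
    for component in components:
        dirs.extend(PRESET_TRASH_DIRS[component])
    dirs.extend(f"user/{user}/.Trash" for user in new_users)
    return list(dict.fromkeys(dirs))  # drop duplicates, keep first-seen order


def mkdir_command(file_system, directory):
    """Build the command from Table 1 (run on the cluster client as user omm)."""
    return f"hdfs dfs -mkdir -p obs://{file_system}/{directory}"


# Print the commands for a hypothetical cluster; nothing is executed.
for directory in trash_dirs("MRS 3.2.0-LTS.1", ["Hive", "HBase"], new_users=["alice"]):
    print(mkdir_command("mrs-demo-fs", directory))
```

The sketch only prints commands, so the output can be reviewed before anything is run against the parallel file system.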

Configuring the Lifecycle Rule of an OBS Directory

  1. Log in to the OBS console.
  2. Click Parallel File Systems and click the name of the file system used by the current MRS cluster.
  3. In the navigation pane on the left, choose Basic Configurations > Lifecycle Rules. Click Create to create a lifecycle rule for a specified directory. For details about the parameters, see Configuring a Lifecycle Rule.

    Table 2 Parameters for creating a lifecycle rule

    | Name | Description | Example Value |
    | --- | --- | --- |
    | Status | Whether to enable the lifecycle rule. | Enable |
    | Rule Name | Name that identifies the lifecycle rule and distinguishes it from other lifecycle configurations. | rule-test |
    | Prefix | Prefix of the objects to which the lifecycle rule applies. Objects that have the specified prefix are managed by the rule. The prefix cannot start with a slash (/), contain consecutive slashes (//), or contain any of the following special characters: \ : * ? " < > and the vertical bar. If no prefix is specified, the rule applies to the entire file system. NOTE: To prevent other service data from being deleted by mistake, do not apply a lifecycle rule to the entire file system or to high-level directories. The recycle bin directory of an MRS component is generally in the format user/<Username>/.Trash. If the folder does not exist, create it. | user/omm/.Trash |
    | Delete Files After (Days) | Number of days after an object's last update at which the object expires. OBS automatically deletes expired objects that are within the scope of the rule. | 30 |

  4. Click OK to complete the lifecycle rule configuration.

    You can click Edit in the Operation column of a lifecycle rule to edit it. You can also click Disable or Enable to disable or enable it.

  5. Repeat the preceding steps to create a recycle bin clearing rule for each user who has the data deletion permission in the current MRS cluster, until every recycle bin directory in the OBS file system is covered.
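Before entering many prefixes on the console, it can help to check them against the constraints listed for the Prefix parameter and to lay out the values for each rule. The sketch below is one possible way to do this in Python. It does not call any OBS API; the function names and the `trash-rule-<n>` naming scheme are hypothetical, and the dictionary keys simply mirror the parameter names in Table 2.

```python
# Characters the Prefix parameter must not contain, as listed in Table 2.
FORBIDDEN_CHARS = set('\\:*?"<>|')


def validate_prefix(prefix):
    """Return a list of problems; an empty list means the prefix passes."""
    problems = []
    if not prefix:
        # An unspecified prefix makes the rule apply to the entire file system.
        problems.append("empty prefix would apply the rule to the entire file system")
    if prefix.startswith("/"):
        problems.append("prefix starts with a slash")
    if "//" in prefix:
        problems.append("prefix contains consecutive slashes")
    bad = sorted(FORBIDDEN_CHARS.intersection(prefix))
    if bad:
        problems.append("prefix contains forbidden characters: " + " ".join(bad))
    return problems


def build_rules(prefixes, days=30):
    """Build one rule record per recycle bin prefix, rejecting invalid prefixes."""
    rules = []
    for index, prefix in enumerate(prefixes, start=1):
        problems = validate_prefix(prefix)
        if problems:
            raise ValueError(f"{prefix!r}: " + "; ".join(problems))
        rules.append({
            "Status": "Enable",
            "Rule Name": f"trash-rule-{index}",  # hypothetical naming scheme
            "Prefix": prefix,
            "Delete Files After (Days)": days,
        })
    return rules


# Lay out the values to enter on the console for two recycle bin directories.
for rule in build_rules(["user/omm/.Trash", "user/hive/.Trash"]):
    print(rule)
```

Each printed record corresponds to one pass through steps 3 and 4, which keeps the repetition in step 5 easy to track.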