Configuring the Policy for Clearing Component Data in the Recycle Bin
Scenario
By default, components in clusters of MRS 3.2.0-LTS.1 or later provide protection against accidental data deletion. Hadoop big data systems that use OBS can use native HDFS garbage collection.
File data that a component user deletes is not removed immediately. It is moved to the recycle bin of the OBS file system instead. This section describes how to set a lifecycle rule for the recycle bin directory so that this data is cleared periodically.
- For clusters that use decoupled storage and compute, configure a lifecycle policy for the related directories as described in this section. Otherwise, recycle bin data accumulates, which may use up the storage space and increase storage fees.
- The recycle bin directory is created per user. When a user is created in the MRS cluster and the user has the permission to delete component data, you need to configure the recycle bin clearing rule for this new user.
- For HBase components that use decoupled storage and compute in MRS 3.1.2 or later versions, refer to this topic to set a policy for clearing component data in the recycle bin.
You need to configure lifecycle policies for the recycle bin directories of preset users in the MRS cluster and for the recycle bin directories of new users who need accidental deletion prevention. If the cluster uses a low-privilege agency, or if MRS users were granted access only to specific OBS file system directories as described in Configuring Fine-Grained Permissions for MRS Multi-User Access to OBS, you also need operation permission on the recycle bin directory.
| Cluster Version | Directory Type | Component | Directory | How to Create |
|---|---|---|---|---|
| Versions earlier than MRS 3.3.0-LTS | Recycle bin directories that must be configured by default for each component in an MRS cluster | Hive | | If the .Trash folder does not exist, create it on the cluster client as user omm. Run the following command: hdfs dfs -mkdir -p obs://<Name of the OBS parallel file system where the table is stored>/<Folder path> |
| | | Spark | | |
| | | HetuEngine | | |
| | | HBase | | |
| | Recycle bin directories of users who need accidental deletion prevention | Hive/Spark/HetuEngine | user/<New service user>/.Trash | |
| MRS 3.3.0-LTS or later | Default recycle bin directories configured for each component in an MRS cluster | Hive/Spark/HetuEngine | /user/.Trash | |
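The folder-creation command from the table can be assembled per directory before it is run on the cluster client as user omm. The following is a minimal sketch in Python that only builds the command string and does not execute it. The function name `trash_mkdir_command`, the file system name `mrs-demo-pfs`, and the user `testuser` are hypothetical and used for illustration only.

```python
import shlex


def trash_mkdir_command(pfs_name: str, folder_path: str) -> str:
    """Build the hdfs command that creates a recycle bin folder in OBS.

    pfs_name is the OBS parallel file system name and folder_path is the
    recycle bin folder, for example user/<Username>/.Trash.
    """
    # Strip surrounding slashes so the URI has exactly one separator
    # between the file system name and the folder path.
    uri = f"obs://{pfs_name.strip('/')}/{folder_path.strip('/')}"
    # shlex.join quotes an argument only if it contains shell-unsafe characters.
    return shlex.join(["hdfs", "dfs", "-mkdir", "-p", uri])


# Hypothetical names; replace them with the values used by your cluster.
print(trash_mkdir_command("mrs-demo-pfs", "user/testuser/.Trash"))
```

Building the command as a string first makes it easy to review the target path before anything is created in the file system.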
For example, if a new user in the cluster has the following permissions, you need to create a recycle bin directory clearing rule for the user in the parallel file system:
- Permission to delete HDFS files
- DROP, INSERT OVERWRITE, and TRUNCATE permissions on Hive tables
- DROP, TRUNCATE, DELETE, INSERT OVERWRITE, and LOAD OVERWRITE permissions on HetuEngine
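The check above can be sketched as a small helper that flags users who need a clearing rule. This is only an illustration: the permission labels, the component keys, and the function `needs_trash_rule` are hypothetical and do not correspond to an MRS API, and the sketch assumes that holding any one of the listed permissions is enough to require a rule.

```python
from typing import Dict, Set

# Hypothetical labels for the deletion-capable permissions listed above.
DELETE_CAPABLE: Dict[str, Set[str]] = {
    "hdfs": {"DELETE"},
    "hive": {"DROP", "INSERT OVERWRITE", "TRUNCATE"},
    "hetuengine": {
        "DROP", "TRUNCATE", "DELETE", "INSERT OVERWRITE", "LOAD OVERWRITE",
    },
}


def needs_trash_rule(grants: Dict[str, Set[str]]) -> bool:
    """Return True if any granted permission can delete component data.

    grants maps a component name to the set of permissions the user holds.
    Assumption: one matching permission is enough to require a rule.
    """
    for component, permissions in grants.items():
        risky = DELETE_CAPABLE.get(component.lower(), set())
        if risky & permissions:
            return True
    return False


# Hypothetical users: one can truncate Hive tables, the other only reads.
print(needs_trash_rule({"Hive": {"SELECT", "TRUNCATE"}}))
print(needs_trash_rule({"Hive": {"SELECT"}}))
```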
Configuring the Lifecycle Rule of an OBS Directory
- Log in to the OBS console.
- Click Parallel File Systems and click the name of the file system used by the current MRS cluster.
- In the navigation pane on the left, choose Basic Configurations > Lifecycle Rules. Click Create to create a lifecycle rule for a specified directory. For details about the parameters, see Configuring a Lifecycle Rule.
Table 2 Parameters for creating a lifecycle rule

| Name | Description | Example Value |
|---|---|---|
| Status | Whether to enable the lifecycle rule. | Enable |
| Rule Name | Rule name that identifies different lifecycle configurations. | rule-test |
| Prefix | Prefix of the objects to which the lifecycle rule applies. Objects that have the specified prefix will be managed by the lifecycle rule. The prefix cannot start with a slash (/), have consecutive slashes (/), or contain the following special characters: `\:*?"<>\|` If this parameter is not specified, the rule will take effect for the entire file system. NOTE: To prevent other service data from being deleted by mistake, you are not advised to use the lifecycle rule configured for the entire file system or high-level directories. Generally, the recycle bin directory of MRS components is in the following format. If the folder does not exist, create it. user/<Username>/.Trash | user/omm/.Trash |
| Delete Files After (Days) | The object within the rule configuration scope expires and is automatically deleted by OBS if the number of days since its last update reaches this parameter value. | 30 days |
- Click OK to complete the lifecycle rule configuration.
You can click Edit in the Operation column of a lifecycle rule to edit it. You can also click Disable or Enable to disable or enable it.
- Repeat the preceding steps for each user who has the data deletion permission in the current MRS cluster, until every recycle bin directory in the OBS file system has a clearing rule.
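Because one rule is needed per user, it can help to prepare and validate the rule values before entering them on the console. The sketch below, in Python, applies the prefix constraints from the parameter table, builds one entry per user in the user/<Username>/.Trash format, and includes a simplified reading of the expiry condition. The function names, the dictionary keys, and the `trash-<Username>` rule naming pattern are hypothetical. They are not OBS API fields, and the actual OBS expiry timing may differ from this day-count comparison.

```python
from datetime import datetime, timedelta

# Characters that the Prefix parameter must not contain.
FORBIDDEN_CHARS = set('\\:*?"<>|')


def validate_prefix(prefix: str) -> None:
    """Raise ValueError if the prefix breaks a documented constraint."""
    if prefix.startswith("/"):
        raise ValueError("prefix cannot start with a slash")
    if "//" in prefix:
        raise ValueError("prefix cannot contain consecutive slashes")
    bad = FORBIDDEN_CHARS & set(prefix)
    if bad:
        raise ValueError(f"prefix contains forbidden characters: {sorted(bad)}")


def plan_rules(usernames, days=30):
    """Return one hypothetical rule entry per user's recycle bin folder."""
    rules = []
    for name in usernames:
        prefix = f"user/{name}/.Trash"
        validate_prefix(prefix)
        rules.append({
            "rule_name": f"trash-{name}",  # hypothetical naming pattern
            "prefix": prefix,
            "delete_after_days": days,
        })
    return rules


def is_expired(last_update: datetime, now: datetime, days: int) -> bool:
    """Simplified check: has the time since the last update reached `days`?"""
    return now - last_update >= timedelta(days=days)


# Hypothetical user list: the preset user omm plus one new service user.
for rule in plan_rules(["omm", "testuser"]):
    print(rule)
```

Validating each prefix up front guards against accidentally entering an empty or high-level prefix, which would make the rule apply to other service data.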