
Configuring Automatic Removal of Old Data from the Hive Directory to the Recycle Bin

Scenario

After the function of automatically moving old data from Hive directories to the recycle bin is enabled, running the insert overwrite directory "/path1" ... command moves the old data in the target directory to the recycle bin instead of deleting it. In addition, Hive checks that the specified directory is not an existing database path in the Hive metastore.

This function is mainly used for data deletion, restoration of deleted data, data migration, and storage space management. The recycle bin lets you manage old data flexibly while ensuring data security and compliance, without losing data or wasting storage space.
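
For example, data that has been moved to the recycle bin can later be restored with standard HDFS commands. The following is a minimal sketch that assumes the default trash layout; the checkpoint subdirectory (shown here as Current), the Username placeholder, and the file name 000000_0 are illustrative and depend on the cluster's trash policy and the directory that was overwritten:

    hdfs dfs -ls /user/Username/.Trash/Current/path1

    hdfs dfs -cp /user/Username/.Trash/Current/path1/000000_0 /path1/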

Procedure

  1. Log in to FusionInsight Manager, choose Cluster > Services > Hive, click Configurations, and click All Configurations.
  2. Choose HiveServer(Role) > Customization, add a custom parameter to the hive-site.xml file, and set Name to hive.overwrite.directory.move.trash. The value can be:

    • true: After Hive executes the insert overwrite operation, the old data is moved to the recycle bin instead of being directly deleted.
    • false (default value): After Hive executes the insert overwrite operation, the old data in the target directory is directly deleted.

    Set this parameter to true.
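
    After the setting is saved, the custom parameter corresponds to a property entry similar to the following in hive-site.xml. This snippet is for illustration only; in an MRS cluster the value is delivered by FusionInsight Manager rather than edited in the file directly.

    <property>
      <name>hive.overwrite.directory.move.trash</name>
      <value>true</value>
    </property>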

  3. Click Save to save the settings. Click Instances, select all Hive instances, choose More > Restart Instance, enter the user password, and click OK to restart all Hive instances.
  4. Log in to the node where the client is installed as the client installation user.

    For details about how to download and install the cluster client, see Installing an MRS Cluster Client.

  5. Run the following commands to configure environment variables and authenticate the user:

    Go to the client installation directory.

    cd Client installation directory

    Load the environment variables.

    source bigdata_env

    Authenticate the user. Skip this step if Kerberos authentication is disabled for the cluster (in normal mode).

    kinit Hive service user

  6. Create an HDFS directory, for example, /user/test.

    hdfs dfs -mkdir /user/test

  7. Log in to the Hive client.

    beeline
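
    Optionally, check that the configuration has taken effect. Running the following statement in Beeline prints the current value of the parameter, which should be true if the custom parameter was saved and the instances were restarted:

    set hive.overwrite.directory.move.trash;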

  8. Create a Hive table, for example, test, and insert data into the table.

    Create a table, for example, test.
    create table test(id int, name string);
    Insert data into the table.
    insert into table test(id, name) values(11, 'A');

  9. Query the Hive table data and write it to an HDFS directory, for example, /user/test.

    insert overwrite directory '/user/test' select * from test;

    Query a column of the Hive table and write the result to the same directory, overwriting the data written in the previous step.

    insert overwrite directory '/user/test' select id from test;

    After the second command is executed successfully, the data written to the /user/test directory by the first command is not deleted. Instead, it is moved to a file in the .Trash directory of the recycle bin, for example, /user/Username/.Trash/xxx/user/test/000000_0. The file content is as follows:

    11A
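
    To locate and view the moved file, you can list the recycle bin and print the file content. The commands below reuse the example path above; the actual checkpoint directory name (shown as xxx) varies:

    hdfs dfs -ls -R /user/Username/.Trash

    hdfs dfs -cat /user/Username/.Trash/xxx/user/test/000000_0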