Updated on 2024-05-29 GMT+08:00

CLEANARCHIVE

Function

Deletes the archived files of a Hudi table to reduce the storage space they occupy and the read/write pressure on the table.

Syntax

Clean by size:

set hoodie.archive.file.cleaner.policy = KEEP_ARCHIVED_FILES_BY_SIZE;

set hoodie.archive.file.cleaner.size.retained = 5368709120;

run cleanarchive on tableIdentifier/tablelocation;

Clean by retention period:

set hoodie.archive.file.cleaner.policy = KEEP_ARCHIVED_FILES_BY_DAYS;

set hoodie.archive.file.cleaner.days.retained = 30;

run cleanarchive on tableIdentifier/tablelocation;
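As an illustration of the size-based form, the following sketch assumes a Hudi table named hudi_table1 (a hypothetical name) and retains about 1 GB of archived files instead of the default 5 GB. Only the statements shown in the syntax above are used; the table name and the retained size are example values.

```sql
-- hudi_table1 is a hypothetical table name used for illustration only.
-- Select the size-based cleaning policy.
set hoodie.archive.file.cleaner.policy = KEEP_ARCHIVED_FILES_BY_SIZE;

-- Retain 1 GB of archived files: 1 * 1024 * 1024 * 1024 = 1073741824 bytes.
set hoodie.archive.file.cleaner.size.retained = 1073741824;

-- Run the cleanup against the table, referenced by name.
run cleanarchive on hudi_table1;
```

The value is given in bytes, so a size expressed in GB must be converted before it is set.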

Parameter Description

Table 1 Parameters

Parameter

Description

tableIdentifier

Name of the Hudi table

tablelocation

Storage path of the Hudi table

hoodie.archive.file.cleaner.policy

Policy for cleaning archived files. Currently, only the KEEP_ARCHIVED_FILES_BY_SIZE and KEEP_ARCHIVED_FILES_BY_DAYS policies are supported. The default policy is KEEP_ARCHIVED_FILES_BY_DAYS.

  • KEEP_ARCHIVED_FILES_BY_SIZE: limits the storage space that archived files can occupy.
  • KEEP_ARCHIVED_FILES_BY_DAYS: deletes archived files that are older than a specified number of days.

hoodie.archive.file.cleaner.size.retained

Takes effect when the policy is KEEP_ARCHIVED_FILES_BY_SIZE. Specifies the total size, in bytes, of archived files to retain. The default value is 5368709120 bytes (5 GB).

hoodie.archive.file.cleaner.days.retained

Takes effect when the policy is KEEP_ARCHIVED_FILES_BY_DAYS. Specifies the number of days for which archived files are retained. The default value is 30 days.
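The parameters above can be combined as in the following sketch, which uses the time-based policy and references the table by its storage path rather than by name. The path /tmp/hudi/hudi_table1 and the 7-day retention period are hypothetical example values, and enclosing the path in double quotes is an assumption that may need to be adjusted for your environment.

```sql
-- /tmp/hudi/hudi_table1 is a hypothetical storage path used for illustration only.
-- Select the time-based cleaning policy (this is also the default policy).
set hoodie.archive.file.cleaner.policy = KEEP_ARCHIVED_FILES_BY_DAYS;

-- Retain archived files for 7 days instead of the default 30 days.
set hoodie.archive.file.cleaner.days.retained = 7;

-- Run the cleanup against the table, referenced by its storage path.
-- Assumption: the path is passed as a quoted string.
run cleanarchive on "/tmp/hudi/hudi_table1";
```

Because deleted archived files cannot be restored, it may be safer to start with a longer retention period and shorten it once the result has been checked.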

Precautions

Archived files are not backed up and cannot be restored after being deleted.

System Response

You can view command execution results in the driver log or on the client.