Updated on 2025-02-22 GMT+08:00

CLEANARCHIVE

Function

This command cleans up the archive files of a Hudi table, reducing storage usage and read/write pressure on the table.

Syntax

  • To clean up based on file size, configure the following parameters:
    hoodie.archive.file.cleaner.policy = KEEP_ARCHIVED_FILES_BY_SIZE;
    hoodie.archive.file.cleaner.size.retained = 5368709120;

    Then submit the SQL statement:

    run cleanarchive on tableIdentifier/tablelocation;
  • To clean up based on retention time, configure the following parameters:
    hoodie.archive.file.cleaner.policy = KEEP_ARCHIVED_FILES_BY_DAYS;
    hoodie.archive.file.cleaner.days.retained = 30;

    Then submit the SQL statement:

    run cleanarchive on tableIdentifier/tablelocation;
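
The two variants above differ only in the policy name and in which "retained" parameter accompanies it. A minimal sketch of that pairing, assuming a hypothetical helper `build_cleanarchive_lines` and a hypothetical table name `db1.hudi_tbl` (neither comes from the product), could look like this. It only assembles the text shown in the syntax; how the parameter lines are actually submitted depends on your environment.

```python
from typing import List

# Each supported policy and the "retained" parameter that goes with it,
# exactly as listed in the syntax above.
RETAINED_PARAM = {
    "KEEP_ARCHIVED_FILES_BY_SIZE": "hoodie.archive.file.cleaner.size.retained",
    "KEEP_ARCHIVED_FILES_BY_DAYS": "hoodie.archive.file.cleaner.days.retained",
}


def build_cleanarchive_lines(target: str, policy: str, retained: int) -> List[str]:
    """Return the two parameter lines and the run statement for one cleanup.

    `target` is either a table identifier or a table location.
    """
    if policy not in RETAINED_PARAM:
        raise ValueError(f"unsupported policy: {policy}")
    # Sanity check added for this sketch only; not a documented rule.
    if retained <= 0:
        raise ValueError("retained must be a positive integer")
    return [
        f"hoodie.archive.file.cleaner.policy = {policy};",
        f"{RETAINED_PARAM[policy]} = {retained};",
        f"run cleanarchive on {target};",
    ]


# Hypothetical table name, used only for illustration.
for line in build_cleanarchive_lines("db1.hudi_tbl", "KEEP_ARCHIVED_FILES_BY_DAYS", 30):
    print(line)
```

Keeping the policy-to-parameter mapping in one place avoids pairing a size value with the days policy, or the reverse.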

Parameter Description

Table 1 Parameter descriptions

| Parameter | Description |
| --- | --- |
| tableIdentifier | Name of the Hudi table. |
| tablelocation | Storage path of the Hudi table. |
| hoodie.archive.file.cleaner.policy | Policy for cleaning up archived files. Only KEEP_ARCHIVED_FILES_BY_SIZE and KEEP_ARCHIVED_FILES_BY_DAYS are currently supported; the default is KEEP_ARCHIVED_FILES_BY_DAYS. KEEP_ARCHIVED_FILES_BY_SIZE limits the storage space that archived files can occupy. KEEP_ARCHIVED_FILES_BY_DAYS deletes archived files older than the specified retention period. |
| hoodie.archive.file.cleaner.size.retained | Size, in bytes, of archived files to retain when the policy is KEEP_ARCHIVED_FILES_BY_SIZE. The default value is 5368709120 (5 GB). |
| hoodie.archive.file.cleaner.days.retained | Number of days to retain archived files when the policy is KEEP_ARCHIVED_FILES_BY_DAYS. The default value is 30. |
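
The size parameter takes a raw byte count, and the default of 5368709120 equals 5 × 1024³, so "GB" here is the binary unit. The following sketch, with a hypothetical helper name `gib_to_bytes`, shows the arithmetic for deriving a value for hoodie.archive.file.cleaner.size.retained:

```python
# Binary gigabyte: 1024 * 1024 * 1024 bytes.
BYTES_PER_GIB = 1024 ** 3


def gib_to_bytes(gib: int) -> int:
    """Convert a whole number of binary gigabytes to bytes."""
    return gib * BYTES_PER_GIB


print(gib_to_bytes(5))   # 5368709120, the documented default
print(gib_to_bytes(10))  # 10737418240
```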

Caveats

  • Archived files do not have backups and cannot be restored after deletion.
  • When you use the metadata service provided by DLI, this command does not support OBS paths.

System Response

Check that the job status is successful, and review the job log for exceptions.