CLEANARCHIVE
Function
This command cleans up the archive files of a Hudi table, reducing the table's storage footprint and read/write pressure.
Syntax
- To clean up by file size, configure the following parameters:
hoodie.archive.file.cleaner.policy = KEEP_ARCHIVED_FILES_BY_SIZE;
hoodie.archive.file.cleaner.size.retained = 5368709120;
Then submit the SQL statement:
run cleanarchive on tableIdentifier/tablelocation;
- To clean up by retention time, configure the following parameters:
hoodie.archive.file.cleaner.policy = KEEP_ARCHIVED_FILES_BY_DAYS;
hoodie.archive.file.cleaner.days.retained = 30;
Then submit the SQL statement:
run cleanarchive on tableIdentifier/tablelocation;
Parameter Description
| Parameter | Description |
|---|---|
| tableIdentifier | Name of the Hudi table |
| tablelocation | Storage path of the Hudi table |
| hoodie.archive.file.cleaner.policy | Policy for cleaning up archived files. Currently, only the KEEP_ARCHIVED_FILES_BY_SIZE and KEEP_ARCHIVED_FILES_BY_DAYS policies are supported. The default policy is KEEP_ARCHIVED_FILES_BY_DAYS. |
| hoodie.archive.file.cleaner.size.retained | When the policy is KEEP_ARCHIVED_FILES_BY_SIZE, this parameter specifies the number of bytes of archived files to retain. The default value is 5368709120 bytes (5 GB). |
| hoodie.archive.file.cleaner.days.retained | When the policy is KEEP_ARCHIVED_FILES_BY_DAYS, this parameter specifies the number of days to retain archived files. The default value is 30 days. |
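As an illustration of how these parameters combine, the following sketch retains roughly 10 GB of archived files for a table. The table name hudi_orders is hypothetical, and the sketch assumes the parameters can be set with Spark SQL set statements in the same session as the command; depending on your environment, they may instead need to be supplied through the job's parameter settings. The size is given in bytes: 10 × 1024 × 1024 × 1024 = 10737418240.

```sql
-- Assumption: session-level "set" is accepted for these parameters.
-- Select the size-based policy and retain about 10 GB of archived files.
set hoodie.archive.file.cleaner.policy = KEEP_ARCHIVED_FILES_BY_SIZE;
set hoodie.archive.file.cleaner.size.retained = 10737418240;

-- hudi_orders is a hypothetical table name used only for illustration.
run cleanarchive on hudi_orders;
```

To clean up by retention time instead, the same pattern would apply with the policy set to KEEP_ARCHIVED_FILES_BY_DAYS and hoodie.archive.file.cleaner.days.retained set to the desired number of days.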
Caveats
- Archived files do not have backups and cannot be restored after deletion.
- When using the metadata service provided by DLI, this command does not support OBS paths.
System Response
Check whether the job status is successful, and review the job log to confirm that no exceptions occurred.