Updated on 2025-02-22 GMT+08:00

CLEAN_FILE

Function

Cleans invalid data files from the Hudi table directory.

Syntax

call clean_file(table => '[table_name]', mode=>'[op_type]', backup_path=>'[backup_path]', start_instant_time=>'[start_time]', end_instant_time=>'[end_time]');

Parameter Description

Table 1 Parameter descriptions

Parameter

Description

table_name

Mandatory. Name of the Hudi table from which invalid data files are to be deleted.

op_type

Optional. Command running mode. The default value is dry_run. Value options are dry_run, repair, undo, and query.

dry_run: displays invalid data files to be cleaned.

repair: displays and cleans invalid data files.

undo: restores deleted data files.

query: displays the backup directories that have been cleaned.

backup_path

Mandatory. Backup directory of the data files to be restored. This parameter is available only when the running mode is undo.

start_time

Optional. Start time for generating invalid data files. This parameter is available only when the running mode is dry_run or repair. The start time is not limited by default.

end_time

Optional. End time for generating invalid data files. This parameter is available only when the running mode is dry_run or repair. The end time is not limited by default.

Example

call clean_file(table => 'h1', mode=>'repair');
call clean_file(table => 'h1', mode=>'dry_run');
call clean_file(table => 'h1', mode=>'query');
call clean_file(table => 'h1', mode=>'undo', backup_path=>'obs://bucket/hudi/h1/.hoodie/.cleanbackup/hoodie_repair_backup_20230527');

Caveats

The command cleans only invalid Parquet files.

System Response

You can check if the job status is successful, view the job result, and review the job logs to confirm if there are any exceptions.