Cleaning

Updated on 2022-11-18 GMT+08:00

Cleaning deletes data of old versions that are no longer required.

Hudi runs a cleaner in the background that continuously deletes unnecessary data of old versions. Set hoodie.cleaner.policy to select the cleaning policy and hoodie.cleaner.commits.retained to specify the number of commits to retain.
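To make the effect of the retained-commits setting concrete, the following toy Python model imitates a KEEP_LATEST_COMMITS-style policy. It is a simplified sketch, not Hudi's implementation: the function name versions_to_clean and the data layout (a mapping from a file group to the commit times that wrote a version of it) are assumptions made for illustration. The model assumes that every file version written by the last `retained` commits is kept, together with the newest version written before that window, so that a reader positioned at the earliest retained commit still has a file to read.

```python
from typing import Dict, List


def versions_to_clean(
    commits: List[str],
    file_versions: Dict[str, List[str]],
    retained: int,
) -> Dict[str, List[str]]:
    """Return, per file group, the commit times whose file versions could be deleted.

    commits: all completed commit times, as sortable strings.
    file_versions: hypothetical layout, file group id -> commit times that wrote a version.
    retained: number of most recent commits to keep (cf. hoodie.cleaner.commits.retained).
    """
    if retained < 1:
        raise ValueError("retained must be at least 1")
    ordered = sorted(commits)
    if len(ordered) <= retained:
        # Not enough history yet: nothing is old enough to clean.
        return {group: [] for group in file_versions}
    earliest_retained = ordered[-retained]

    result: Dict[str, List[str]] = {}
    for group, versions in file_versions.items():
        older = sorted(v for v in versions if v < earliest_retained)
        # Keep the newest version older than the window; the rest can be cleaned.
        result[group] = older[:-1]
    return result


if __name__ == "__main__":
    commits = ["001", "002", "003", "004"]
    files = {"fg-1": ["001", "002", "003", "004"], "fg-2": ["001", "004"]}
    print(versions_to_clean(commits, files, retained=2))
```

With two retained commits, only the version of fg-1 written by commit 001 is eligible for deletion in this model; the single old version of fg-2 is kept because it is the newest one before the retained window.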

You can use either of the following methods to perform cleaning:

  • Using Hudi CLI

    cleans run --sparkMaster yarn --hoodieConfigs 'hoodie.cleaner.policy=KEEP_LATEST_COMMITS,hoodie.cleaner.commits.retained=1,hoodie.cleaner.incremental.mode=false,hoodie.keep.max.commits=3,hoodie.keep.min.commits=2'

  • Using APIs (running the HoodieCleaner utility class through spark-submit)

    spark-submit --master yarn --jars /opt/client/Hudi/hudi/lib/hudi-client-common-xxx.jar --class org.apache.hudi.utilities.HoodieCleaner /opt/client/Hudi/hudi/lib/hudi-utilities_xxx.jar --target-base-path /tmp/default/tb_test_mor
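The --hoodieConfigs value in the Hudi CLI command above is a comma-separated list of key=value pairs, which is easy to mistype by hand. The following Python sketch assembles that string from a dictionary. The helper name build_hoodie_configs is hypothetical, and rejecting commas and equals signs inside keys or values is a conservative assumption of this sketch rather than a documented rule of the CLI.

```python
from typing import Mapping


def build_hoodie_configs(options: Mapping[str, object]) -> str:
    """Join options into the comma-separated key=value form used by --hoodieConfigs."""
    parts = []
    for key, value in options.items():
        text = str(value)
        # A comma or equals sign inside a key or value would corrupt the list.
        if "," in key or "=" in key or "," in text or "=" in text:
            raise ValueError(f"unsupported character in option {key!r}")
        parts.append(f"{key}={text}")
    return ",".join(parts)


if __name__ == "__main__":
    # Same settings as in the CLI example above.
    configs = build_hoodie_configs({
        "hoodie.cleaner.policy": "KEEP_LATEST_COMMITS",
        "hoodie.cleaner.commits.retained": 1,
        "hoodie.cleaner.incremental.mode": "false",
        "hoodie.keep.max.commits": 3,
        "hoodie.keep.min.commits": 2,
    })
    print(f"cleans run --sparkMaster yarn --hoodieConfigs '{configs}'")
```

The printed line can be pasted into the Hudi CLI session as is.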

For details about other cleaning parameters, see Configuration Reference.
