Updated on 2024-11-29 GMT+08:00

Operating a Hudi Table Using hudi-cli.sh

Prerequisites

  • For a cluster with Kerberos authentication enabled, a user has been created on FusionInsight Manager of the cluster and associated with user groups hadoop and hive.
  • The Hudi cluster client has been downloaded and installed.

Basic Operations

  1. Log in to the cluster client as user root and run the following commands:

    cd {Client installation directory}

    source bigdata_env

    source Hudi/component_env

    kinit {Created user}

    Run kinit only if Kerberos authentication is enabled for the cluster.

  2. Run hudi-cli.sh to start the Hudi CLI.

    cd {Client installation directory}/Hudi/hudi/bin/

    ./hudi-cli.sh

  3. Run the following example commands as required.
    • Viewing help information

      help // View all Hudi CLI commands.

      help 'command' // View the help information and parameter list of a certain command.

    • Connecting to a table

      connect --path '/tmp/huditest/test_table'

    • Viewing table information

      desc

    • Viewing compaction plans

      compactions show all

    • Viewing cleaning plans

      cleans show

    • Performing the cleaning operation

      cleans run

    • Viewing commit information

      commits show

    • Viewing the partitions that a commit wrote to

      commit showpartitions --commit 20210127153356

      20210127153356 indicates the commit timestamp.

    • Viewing the files that a commit wrote to

      commit showfiles --commit 20210127153356

    • Comparing the commits of the connected table with those of another table (--path specifies the other table)

      commits compare --path /tmp/hudimor/mytest100

    • Rolling back a commit (Only the last commit can be rolled back.)

      commit rollback --commit 20210127164905

    • Scheduling a compaction

      compaction schedule --hoodieConfigs 'hoodie.compaction.strategy=org.apache.hudi.table.action.compact.strategy.BoundedIOCompactionStrategy,hoodie.compaction.target.io=1,hoodie.compact.inline.max.delta.commits=1'

    • Performing a compaction

      compaction run --parallelism 100 --sparkMemory 1g --retry 1 --compactionInstant 20210602101315 --hoodieConfigs 'hoodie.compaction.strategy=org.apache.hudi.table.action.compact.strategy.BoundedIOCompactionStrategy,hoodie.compaction.target.io=1,hoodie.compact.inline.max.delta.commits=1' --propsFilePath hdfs://hacluster/tmp/default/tb_test_mor/.hoodie/hoodie.properties --schemaFilePath /tmp/default/tb_test_mor/.hoodie/compact_tb_base.json

    • Creating a savepoint

      savepoint create --commit 20210318155750

    • Rolling back a specified savepoint

      savepoint rollback --savepoint 20210318155750

      1. If a commit causes metadata conflicts, you can run the commit rollback and savepoint rollback commands to roll back the data. The Hive metadata is not rolled back. In this case, delete the Hive table and synchronize the data manually.
      2. The commit rollback command can roll back only the latest commit, and the savepoint rollback command can roll back only to the latest savepoint. The value passed to --commit or --savepoint must therefore be the latest one. You cannot roll back to an arbitrary earlier commit or savepoint.
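
The commit timestamps and the --hoodieConfigs values used in the commands above follow simple text formats, so they can be checked or assembled in the shell before being pasted into the Hudi CLI. The following is a minimal sketch, not part of hudi-cli. The function names format_commit_ts and join_hoodie_configs are hypothetical, and the sketch assumes the 14-digit yyyyMMddHHmmss timestamp layout seen in the examples (some Hudi versions may use longer timestamps, which this sketch rejects).

```shell
#!/bin/sh
# Hypothetical helpers (not part of hudi-cli) for preparing command arguments.

# Render a 14-digit commit timestamp (assumed layout: yyyyMMddHHmmss) as a date.
format_commit_ts() {
  ts="$1"
  # Accept exactly 14 digits; anything else is rejected.
  if ! printf '%s\n' "$ts" | grep -Eq '^[0-9]{14}$'; then
    echo "invalid commit timestamp: $ts" >&2
    return 1
  fi
  printf '%s\n' "$ts" |
    sed 's/^\(....\)\(..\)\(..\)\(..\)\(..\)\(..\)$/\1-\2-\3 \4:\5:\6/'
}

# Join key=value arguments with commas, the form passed to --hoodieConfigs.
join_hoodie_configs() {
  out=""
  for kv in "$@"; do
    # Each argument must look like key=value with a non-empty key.
    case "$kv" in
      ?*=*) ;;
      *) echo "not a key=value pair: $kv" >&2; return 1 ;;
    esac
    if [ -z "$out" ]; then out="$kv"; else out="$out,$kv"; fi
  done
  printf '%s\n' "$out"
}

# Example calls using values from the commands above.
format_commit_ts 20210127153356
join_hoodie_configs \
  hoodie.compaction.target.io=1 \
  hoodie.compact.inline.max.delta.commits=1
```

The first call prints 2021-01-27 15:33:56, and the second prints the two settings joined by a comma, ready to be quoted after --hoodieConfigs.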