Operating a Hudi Table Using hudi-cli.sh
Prerequisites
- If Kerberos authentication is enabled for the cluster, a user has been created on FusionInsight Manager and added to the hadoop and hive user groups.
- The Hudi cluster client has been downloaded and installed.
Basic Operations
- Log in to the cluster client as user root and run the following commands:
cd {Client installation directory}
source bigdata_env
source Hudi/component_env
kinit {Created user}
- Run the hudi-cli.sh command to access the Hudi client.
cd {Client installation directory}/Hudi/hudi/bin/
./hudi-cli.sh
- Run the following example commands as required.
- Viewing help information
help // View all Hudi CLI commands.
help 'command' // View the help information and parameter list of a certain command.
- Connecting to a table
- Viewing table information
- Viewing compaction plans
- Viewing cleaning plans
- Performing the cleaning operation
- Viewing commit information
- Viewing the partition where the commit is written to
commit showpartitions --commit 20210127153356
20210127153356 indicates the commit timestamp.
- Viewing the file where the commit is written to
- Comparing the commit information of two tables
- Rolling back a commit (Only the last commit can be rolled back.)
- Scheduling a compaction
compaction schedule --hoodieConfigs 'hoodie.compaction.strategy=org.apache.hudi.table.action.compact.strategy.BoundedIOCompactionStrategy,hoodie.compaction.target.io=1,hoodie.compact.inline.max.delta.commits=1'
- Performing a compaction
compaction run --parallelism 100 --sparkMemory 1g --retry 1 --compactionInstant 20210602101315 --hoodieConfigs 'hoodie.compaction.strategy=org.apache.hudi.table.action.compact.strategy.BoundedIOCompactionStrategy,hoodie.compaction.target.io=1,hoodie.compact.inline.max.delta.commits=1' --propsFilePath hdfs://hacluster/tmp/default/tb_test_mor/.hoodie/hoodie.properties --schemaFilePath /tmp/default/tb_test_mor/.hoodie/compact_tb_base.json
- Creating a savepoint
- Rolling back a specified savepoint
savepoint rollback --savepoint 20210318155750
- If a commit causes metadata conflicts, you can run the commit rollback and savepoint rollback commands to roll back the data. These commands do not roll back Hive metadata. In that case, delete the Hive table and synchronize the data again manually.
- The commit rollback command can roll back only the latest commit, and the savepoint rollback command can roll back only the latest savepoint. Specifying an earlier commit or savepoint as the rollback target is not supported.
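Several operations in the list above appear without an example command. The sketch below prints one candidate Hudi CLI command for each of them, using the command names from the upstream Apache Hudi CLI (connect, desc, compactions show all, cleans show, cleans run, commits show, commit showfiles, commits compare, commit rollback, savepoint create). Treat it as an assumption to check with help 'command' in your Hudi version, not as the documented syntax for this cluster. The table path is taken from the compaction example, the second table path is hypothetical, and the commit timestamp is reused from the showpartitions example.

```shell
#!/usr/bin/env bash
# Sketch only: prints candidate Hudi CLI commands; it does not run them.
# All values below are placeholders (assumptions); replace them with your own.
set -euo pipefail

# Table base path, assumed from the compaction example above.
TABLE_PATH="${TABLE_PATH:-hdfs://hacluster/tmp/default/tb_test_mor}"
# Hypothetical second table, used only for the comparison command.
OTHER_TABLE_PATH="${OTHER_TABLE_PATH:-hdfs://hacluster/tmp/default/tb_other}"
# Commit timestamp, reused from the showpartitions example above.
COMMIT_TS="${COMMIT_TS:-20210127153356}"

# Emit one Hudi CLI command per line, in the order of the operations listed above.
hudi_cli_commands() {
  cat <<EOF
connect --path ${TABLE_PATH}
desc
compactions show all
cleans show
cleans run
commits show
commit showfiles --commit ${COMMIT_TS}
commits compare --path ${OTHER_TABLE_PATH}
commit rollback --commit ${COMMIT_TS}
savepoint create --commit ${COMMIT_TS}
EOF
}

hudi_cli_commands
```

The printed lines are meant to be entered one at a time at the hudi-cli.sh prompt after connecting to the table. The cleans run and commit rollback commands change table state, so review them before entering them.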