Deleting Historical Data
Scenario
Delete old data from Hudi tables to reduce storage usage and storage costs.
Running the delete/drop partition Statement
The delete/drop partition command can be used to delete historical data. For details, see Hudi SQL Syntax Reference.
Advantages: The operation is simple and COW and MOR tables are supported.
Disadvantages: Concurrency is poor. If a Hudi table is being written to in real time, running the delete/drop partition command at the same time may cause the real-time data import job to fail.
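As a minimal sketch of the two statements, the example below uses standard Hudi Spark SQL forms. The table name hudi_orders, the column order_date, and the partition column dt are hypothetical and only serve as illustration; the Hudi SQL Syntax Reference remains the authoritative source for the exact syntax.

```sql
-- Delete rows that match a condition.
-- hudi_orders and order_date are hypothetical names.
delete from hudi_orders where order_date < '2023-01-01';

-- Drop one whole partition.
-- Assumes the table is partitioned by a column named dt.
alter table hudi_orders drop partition (dt = '2022-12-31');
```

Because of the concurrency limitation noted above, statements like these are best run while no real-time import job is writing to the table.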
Running the call clean_data Command
- Function
The call clean_data command is used to delete historical data from MOR tables.
Advantages: The deletion can run concurrently with the data import task and does not affect real-time data import.
Disadvantages: Only MOR tables are supported, and the deletion is lazy: it depends on compaction to take effect.
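Since the lazy deletion depends on compaction, it may be necessary to make sure compaction actually runs on the table. The sketch below uses the Hudi Spark SQL compaction statements and assumes that compaction is triggered manually rather than by the write job; whether that is needed depends on how the table's compaction is configured. The table name mytable is illustrative.

```sql
-- Assumption: compaction is not already scheduled and run by the import job.
-- Generate a compaction plan for the MOR table.
schedule compaction on mytable;

-- Execute the pending compaction plan.
run compaction on mytable;
```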
- Syntax
call clean_data(table => 'table_name', sql => 'delete statement')
- Parameters
Table 1 Parameters

| Parameter | Description |
| --- | --- |
| table_name | Name of the table whose data is to be deleted. The value can be in the database.tablename format. |
| delete statement | SQL statement of the select type, which is used to find the data to be deleted. |
- Example
Delete all data whose primaryKey is smaller than 100 from the mytable table:
call clean_data(table => 'mytable', sql=>'select * from mytable where primaryKey < 100')
Clear the residual files of the clean_data command. If a clean_data run fails, it leaves temporary files behind. Run the following command to remove them:
call clean_data(table => 'mytable', sql=>'delete cleanData')
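The table parameter also accepts the database.tablename format. A minimal sketch, assuming a hypothetical database named mydb and assuming the select statement may reference the table by the same qualified name:

```sql
-- mydb is a hypothetical database name used only for illustration.
-- The select statement identifies the rows to delete; it does not delete them itself.
call clean_data(table => 'mydb.mytable', sql => 'select * from mydb.mytable where primaryKey < 100')
```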
- System response