Historical Hudi Data Deletion
This topic is available for MRS 3.3.0-LTS and later versions only.
Scenario
Delete historical data from Hudi tables to reduce storage usage and save storage costs.
Running delete/drop partition
You can use the delete or drop partition command to delete historical data. For details, see Hudi SQL Syntax Reference.
Advantages: The operation is simple, and both COW and MOR tables are supported.
Disadvantages: Concurrency support is poor. If the delete or drop partition command runs while a Hudi table is being written in real time, the real-time data import job may fail.
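The following is a minimal sketch of both commands. It assumes a hypothetical table hudi_db.mytable that has a primaryKey column and is partitioned by a hypothetical column dt; check Hudi SQL Syntax Reference for the exact syntax supported by your MRS version.

```sql
-- Row-level delete: remove records matching a condition (hypothetical table and column names).
DELETE FROM hudi_db.mytable WHERE primaryKey < 100;

-- Partition-level delete: drop a whole historical partition (assumes the table is partitioned by dt).
ALTER TABLE hudi_db.mytable DROP PARTITION (dt = '2023-01-01');
```

When the historical data lines up with partition boundaries, dropping the partition is typically simpler than a row-level delete because no per-record filter is needed.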
Running call clean_data
- Function
The call clean_data command deletes historical data from MOR tables.
Advantages: Deletion can run concurrently with data import jobs without affecting real-time data import.
Disadvantages: Only MOR tables are supported. Deletion is lazy: the data is physically removed only after compaction is performed.
- Syntax
call clean_data(table => 'table_name', sql => 'delete statement')
- Parameter description
Table 1 Parameter description
Parameter | Description
table_name | Name of the table whose data is to be deleted. The value can be in the database.tablename format.
delete statement | A select statement that identifies the data to be deleted.
- Example
Delete all records whose primaryKey is less than 100 from the mytable table:
call clean_data(table => 'mytable', sql=>'select * from mytable where primaryKey < 100')
Clear the residual files of the clean_data command. If clean_data fails, temporary files are left behind. Run the following command to delete these temporary files:
call clean_data(table => 'mytable', sql=>'delete cleanData')
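Because clean_data deletes lazily, the deletion takes effect only after compaction. The following sketch uses the database.tablename format for the table parameter and then triggers compaction explicitly. The database name hudi_db and the numeric timestamp column ts are hypothetical, and the schedule/run compaction statements are Hudi Spark SQL commands whose availability you should confirm for your MRS version; skip them if your ingestion job already performs compaction.

```sql
-- Select the records to delete, using a database-qualified table name (hypothetical names).
call clean_data(table => 'hudi_db.mytable', sql => 'select * from hudi_db.mytable where ts < 1672531200');

-- clean_data is lazy: schedule a compaction plan, then execute it so the deletion is applied.
schedule compaction on hudi_db.mytable;
run compaction on hudi_db.mytable;
```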