Parameter Specifications for Incremental Reading of Hudi Table Data with Spark
Rules
Before running an incremental query, set the table's query mode to incremental. After the query completes, reset the table's query mode.
If the query mode is not reset after the incremental query, subsequent real-time queries on the table will be affected.
Example
The following uses a SQL job as an example:
Set parameters.
hoodie.tableName.consume.mode=INCREMENTAL // Set the current table to be read in incremental mode.
hoodie.tableName.consume.start.timestamp=20201227153030 // Specify the initial incremental pull commit.
hoodie.tableName.consume.end.timestamp=20210308212318 // Specify the incremental pull end commit. If not specified, the latest commit is used.
Run the following SQL statement:
select * from tableName where `_hoodie_commit_time`>'20201227153030' and `_hoodie_commit_time`<='20210308212318';
// The results must be filtered based on start.timestamp and end.timestamp. If end.timestamp is not specified, filter based on start.timestamp only.
Before submitting other SQL statements, clear the preceding configuration parameters so that they do not affect the execution results of other tasks.
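The set, query, and clear sequence described in this example can be sketched as a small helper that builds the statements in order. This is a minimal illustration, not a definitive implementation: the function name incremental_query_statements is hypothetical, the set prefix is the usual Spark SQL form for session parameters, and clearing each parameter with reset <key> is an assumption that depends on the Spark version in use. If your environment does not support it, clear the parameters by whatever means the environment provides.

```python
from typing import List, Optional


def incremental_query_statements(
    table: str, start_ts: str, end_ts: Optional[str] = None
) -> List[str]:
    """Build the ordered statements for one incremental read of a Hudi table.

    Hypothetical helper for illustration only. It returns strings and does
    not run anything against Spark.
    """
    prefix = f"hoodie.{table}.consume"

    # 1. Switch the table to incremental mode and set the start commit.
    keys = [f"{prefix}.mode", f"{prefix}.start.timestamp"]
    statements = [
        f"set {prefix}.mode=INCREMENTAL",
        f"set {prefix}.start.timestamp={start_ts}",
    ]
    condition = f"`_hoodie_commit_time` > '{start_ts}'"

    # 2. The end commit is optional; when omitted, the latest commit is used
    #    and the query filters on the start commit only.
    if end_ts is not None:
        keys.append(f"{prefix}.end.timestamp")
        statements.append(f"set {prefix}.end.timestamp={end_ts}")
        condition += f" and `_hoodie_commit_time` <= '{end_ts}'"

    # 3. The query itself, filtered on the same commit range.
    statements.append(f"select * from {table} where {condition}")

    # 4. Clear every parameter that was set (assumption: `reset <key>` is
    #    available in the Spark version in use).
    statements.extend(f"reset {key}" for key in keys)
    return statements


if __name__ == "__main__":
    for stmt in incremental_query_statements(
        "tableName", "20201227153030", "20210308212318"
    ):
        print(stmt + ";")
```

Building the clear statements from the same list of keys that were set keeps the two in step, so an optional parameter such as the end timestamp is cleared only when it was actually set.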