Parameter Specifications for Incremental Reading of Hudi Table Data with Spark

Rules

Before running an incremental query, set the table's query mode to incremental. After the query completes, reset the table's query mode.

If the query mode is not reset after the incremental query, subsequent real-time queries on the table will be affected.

Example

The following uses a SQL job as an example:

Set parameters. In the parameter names, tableName stands for the name of the table being queried.

hoodie.tableName.consume.mode=INCREMENTAL // Set the current table to be read in incremental mode.
hoodie.tableName.consume.start.timestamp=20201227153030 // Start commit of the incremental pull.
hoodie.tableName.consume.end.timestamp=20210308212318 // End commit of the incremental pull. If not specified, the latest commit is used.

Run the following SQL statement:

select * from tableName where `_hoodie_commit_time`>'20201227153030' and `_hoodie_commit_time`<='20210308212318';

The results must be filtered by start.timestamp and end.timestamp. If end.timestamp is not specified, filter by start.timestamp only.

Before submitting other SQL statements, clear the preceding configuration parameters so that they do not affect the execution results of other tasks.
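The full sequence (set the mode, query with the commit-time filter, then reset) can be sketched as a small helper that only builds the statement strings. This is a minimal illustration, not an official utility: the function name incremental_query_statements is hypothetical, and resetting the mode by setting it to SNAPSHOT is an assumption that should be checked against the Hudi version in use.

```python
from typing import List, Optional


def incremental_query_statements(
    table: str, start_ts: str, end_ts: Optional[str] = None
) -> List[str]:
    """Build the SET / SELECT / reset sequence for one incremental read.

    `table` is the Hudi table name; timestamps are commit times passed as
    strings, for example '20201227153030'.
    """
    prefix = f"hoodie.{table}.consume"
    stmts = [
        f"set {prefix}.mode=INCREMENTAL",
        f"set {prefix}.start.timestamp={start_ts}",
    ]
    # Always filter on the start commit; add the upper bound only if given.
    where = f"`_hoodie_commit_time` > '{start_ts}'"
    if end_ts is not None:
        stmts.append(f"set {prefix}.end.timestamp={end_ts}")
        where += f" and `_hoodie_commit_time` <= '{end_ts}'"
    stmts.append(f"select * from {table} where {where}")
    # Assumption: SNAPSHOT is the value that restores the normal query mode.
    stmts.append(f"set {prefix}.mode=SNAPSHOT")
    return stmts


if __name__ == "__main__":
    for stmt in incremental_query_statements(
        "tableName", "20201227153030", "20210308212318"
    ):
        print(stmt)
```

Keeping the reset as the last generated statement makes it harder to forget. In a live job, each string would be submitted in order through the same session that runs the query.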
