Reading the Hudi COW Table View

Reading the real-time view (using Hive and SparkSQL as an example): Query the Hudi table stored in Hive directly. Replace ${table_name} with the name of the table.
```
select count(*) from ${table_name};
```

Reading the real-time view (using the Spark DataSource API as an example): The procedure is the same as reading an ordinary DataSource table.

The query type QUERY_TYPE_OPT_KEY must be set to QUERY_TYPE_SNAPSHOT_OPT_VAL. In this example the table is located by the path passed to load(), not by ${table_name}.

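```
import org.apache.hudi.DataSourceReadOptions._ // Provides QUERY_TYPE_OPT_KEY and the related constants.

spark.read.format("hudi")
  .option(QUERY_TYPE_OPT_KEY, QUERY_TYPE_SNAPSHOT_OPT_VAL) // Set the query type to the real-time view.
  .load("/tmp/default/cow_bugx/") // Specify the path of the Hudi table to read.
  .createTempView("mycall") // Register the result as a Spark temporary view.
spark.sql("select * from mycall").show(100)
```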

Reading the incremental view (using Hive as an example): Replace ${table_name} with the name of the table.

```
-- Enable incremental reads for this table.
set hoodie.${table_name}.consume.mode=INCREMENTAL;
-- Specify the maximum number of commits to consume.
set hoodie.${table_name}.consume.max.commits=3;
-- Specify the start commit of the incremental pull.
set hoodie.${table_name}.consume.start.timestamp=20201227153030;
-- The filter on _hoodie_commit_time is mandatory; its value must be the start commit set above.
select count(*) from default.${table_name} where `_hoodie_commit_time`>'20201227153030';
```
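
The start timestamp in the example, 20201227153030, appears to follow the yyyyMMddHHmmss layout (27 December 2020, 15:30:30). Assuming that layout, a minimal Scala sketch for deriving such a value from a date and time might look as follows. The object CommitTime and its format method are hypothetical helpers introduced for illustration; they are not part of Hudi.

```scala
import java.time.LocalDateTime
import java.time.format.DateTimeFormatter

// Hypothetical helper, not part of Hudi.
object CommitTime {
  // Assumption: commit timestamps use the 14-digit yyyyMMddHHmmss layout
  // seen in the example values of this section.
  private val layout = DateTimeFormatter.ofPattern("yyyyMMddHHmmss")

  // Render a date and time as a commit-timestamp string.
  def format(t: LocalDateTime): String = t.format(layout)
}
```

For example, CommitTime.format(LocalDateTime.of(2020, 12, 27, 15, 30, 30)) yields 20201227153030, the value used for consume.start.timestamp above.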

Reading the incremental view (using SparkSQL as an example): Replace ${table_name} with the name of the table.

```
-- Enable incremental reads for this table.
set hoodie.${table_name}.consume.mode=INCREMENTAL;
-- Specify the start commit of the incremental pull.
set hoodie.${table_name}.consume.start.timestamp=20201227153030;
-- Specify the end commit of the incremental pull. If this parameter is not specified, the latest commit is used.
set hoodie.${table_name}.consume.end.timestamp=20210308212318;
-- The filter on _hoodie_commit_time is mandatory; its value must be the start commit set above.
select count(*) from default.${table_name} where `_hoodie_commit_time`>'20201227153030';
```
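
Because the table name appears in every setting and the start commit appears twice, it can help to generate the statements from a single set of inputs. The following Scala sketch renders the same statements as strings; IncrementalQuery and statements are hypothetical names, and the default database is assumed as in the example.

```scala
// Hypothetical helper that renders the session statements for an incremental query.
object IncrementalQuery {
  // table: the value substituted for ${table_name}
  // start: the start commit; end: optional end commit (latest commit if omitted)
  def statements(table: String, start: String, end: Option[String] = None): Seq[String] = {
    val prefix = s"set hoodie.$table.consume"
    val settings = Seq(
      s"$prefix.mode=INCREMENTAL",
      s"$prefix.start.timestamp=$start"
    ) ++ end.map(e => s"$prefix.end.timestamp=$e").toList
    // The filter value is always the start commit, as the section requires.
    val query =
      s"select count(*) from default.$table where `_hoodie_commit_time`>'$start'"
    settings :+ query
  }
}
```

In a Spark session, each returned string could then be passed to spark.sql(...) in order; this usage is an assumption and has not been checked against a particular cluster version.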

Reading the incremental view (using the Spark DataSource API as an example):

QUERY_TYPE_OPT_KEY must be set to QUERY_TYPE_INCREMENTAL_OPT_VAL.

```
import org.apache.hudi.DataSourceReadOptions._ // Provides the option keys and values used below.

spark.read.format("hudi")
  .option(QUERY_TYPE_OPT_KEY, QUERY_TYPE_INCREMENTAL_OPT_VAL) // Set the query type to the incremental mode.
  .option(BEGIN_INSTANTTIME_OPT_KEY, "20210308212004") // Specify the start commit of the incremental pull.
  .option(END_INSTANTTIME_OPT_KEY, "20210308212318") // Specify the end commit of the incremental pull.
  .load("/tmp/default/cow_bugx/") // Specify the path of the Hudi table to read.
  .createTempView("mycall") // Register as a Spark temporary table.
// Start the query. The statement is the same as the Hive incremental query statement.
spark.sql("select * from mycall where `_hoodie_commit_time`>'20210308211131'")
  .show(100, false)
```
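
The same read options can also be assembled as a plain Map before they are handed to the reader, which makes it easy to validate the commit range first. The sketch below assumes that the constants used above resolve to the string keys shown in the code; IncrementalReadOptions and build are hypothetical names, not Hudi APIs.

```scala
// Hypothetical helper that assembles incremental read options as a Map.
object IncrementalReadOptions {
  // Assumption: these are the string values behind QUERY_TYPE_OPT_KEY,
  // BEGIN_INSTANTTIME_OPT_KEY and END_INSTANTTIME_OPT_KEY.
  val QueryTypeKey = "hoodie.datasource.query.type"
  val BeginInstantKey = "hoodie.datasource.read.begin.instanttime"
  val EndInstantKey = "hoodie.datasource.read.end.instanttime"

  def build(begin: String, end: Option[String] = None): Map[String, String] = {
    // Equal-length digit strings compare in chronological order,
    // so a plain string comparison is enough to check the range.
    end.foreach { e =>
      require(begin <= e, s"begin instant $begin is later than end instant $e")
    }
    Map(QueryTypeKey -> "incremental", BeginInstantKey -> begin) ++
      end.map(e => EndInstantKey -> e).toList
  }
}
```

The resulting Map could be passed to the reader through options(...) in place of the individual option(...) calls, for example spark.read.format("hudi").options(IncrementalReadOptions.build("20210308212004", Some("20210308212318"))).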

Reading the read-optimized view: The read-optimized view of COW tables is equivalent to the real-time view.
