Reading COW Table Views
Updated on 2024-11-29 GMT+08:00
- Reading the real-time view (using Hive and Spark SQL as examples): Directly query the Hudi table stored in Hive.
select count(*) from test;
- Reading the real-time view (using the Spark DataSource API as an example): This is similar to reading a common DataSource table.
QUERY_TYPE_OPT_KEY must be set to QUERY_TYPE_SNAPSHOT_OPT_VAL.
spark.read.format("hudi")
  .option(QUERY_TYPE_OPT_KEY, QUERY_TYPE_SNAPSHOT_OPT_VAL) // Set the query type to the real-time view.
  .load("/tmp/default/cow_bugx/") // Specify the path of the Hudi table to read.
  .createTempView("mycall")
spark.sql("select * from mycall").show(100)
- Reading the incremental view (using Hive as an example):
set hoodie.test.consume.mode=INCREMENTAL; // Specify the incremental read mode.
set hoodie.test.consume.max.commits=3; // Specify the maximum number of commits to be consumed.
set hoodie.test.consume.start.timestamp=20201227153030; // Specify the initial commit of the incremental pull.
select count(*) from default.test where `_hoodie_commit_time`>'20201227153030'; // Results must be filtered by start.timestamp and end.timestamp. If end.timestamp is not specified, only start.timestamp is required for filtering.
- Reading the incremental view (using Spark SQL as an example):
set hoodie.test.consume.mode=INCREMENTAL; // Specify the incremental read mode.
set hoodie.test.consume.start.timestamp=20201227153030; // Specify the initial commit of the incremental pull.
set hoodie.test.consume.end.timestamp=20210308212318; // Specify the end commit of the incremental pull. If this parameter is not specified, the latest commit is used.
select count(*) from test_rt where `_hoodie_commit_time`>'20201227153030' and `_hoodie_commit_time`<='20210308212318'; // Results must be filtered by start.timestamp and end.timestamp. If end.timestamp is not specified, only start.timestamp is required for filtering.
- Reading the incremental view (using the Spark DataSource API as an example):
QUERY_TYPE_OPT_KEY must be set to QUERY_TYPE_INCREMENTAL_OPT_VAL.
spark.read.format("hudi")
  .option(QUERY_TYPE_OPT_KEY, QUERY_TYPE_INCREMENTAL_OPT_VAL) // Set the query type to the incremental mode.
  .option(BEGIN_INSTANTTIME_OPT_KEY, "20210308212004") // Specify the initial commit of the incremental pull.
  .option(END_INSTANTTIME_OPT_KEY, "20210308212318") // Specify the end commit of the incremental pull.
  .load("/tmp/default/cow_bugx/") // Specify the path of the Hudi table to read.
  .createTempView("mycall") // Register as a Spark temporary view.
spark.sql("select * from mycall where `_hoodie_commit_time`>'20210308212004' and `_hoodie_commit_time`<='20210308212318'").show(100, false) // Results must be filtered by BEGIN_INSTANTTIME and END_INSTANTTIME. If END_INSTANTTIME is not specified, only BEGIN_INSTANTTIME is required for filtering.
- Reading the read-optimized view: The read-optimized view of COW tables is equivalent to the real-time view.
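Because the read-optimized view of a COW table is equivalent to its real-time view, an explicit read-optimized query returns the same data as the snapshot query shown earlier. The following is a minimal sketch of such a query through the Spark DataSource API, assuming a running Spark session, the same /tmp/default/cow_bugx/ table path used in the examples above, and a temporary view name (mycall_ro) chosen only for illustration:

```scala
import org.apache.hudi.DataSourceReadOptions._

// Read-optimized query on the COW table. For a COW table this produces
// the same result set as the real-time (snapshot) query shown above.
spark.read.format("hudi")
  .option(QUERY_TYPE_OPT_KEY, QUERY_TYPE_READ_OPTIMIZED_OPT_VAL) // Set the query type to the read-optimized view.
  .load("/tmp/default/cow_bugx/") // Same Hudi table path as in the preceding examples.
  .createTempView("mycall_ro") // Hypothetical view name for this sketch.
spark.sql("select count(*) from mycall_ro").show()
```

The distinction matters mainly for MOR tables, where the read-optimized view skips unmerged log data; on a COW table the two query types read the same base files.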
Parent topic: Read