Reading MOR Table Views
After the MOR table is synchronized to Hive, the following two tables are synchronized to Hive: Table name_rt and Table name_ro. The table suffixed with rt indicates the real-time view, and the table suffixed with ro indicates the read-optimized view. For example, the name of the Hudi table to be synchronized to Hive is test. After the table is synchronized to Hive, two more tables test_rt and test_ro are generated in the Hive table.
- Reading the real-time view (using Hive and SparkSQL as an example): Directly read the Hudi table with suffix _rt stored in Hive.
select count(*) from test_rt;
- Reading the real-time view (using the Spark DataSource API as an example): The operations are the same as those for the COW table. For details, see the operations for the COW table.
- Reading the incremental view (using Hive as an example):
set hive.input.format=org.apache.hudi.hadoop.hive.HoodieCombineHiveInputFormat; // This parameter does not need to be specified for SparkSQL. set hoodie.test.consume.mode=INCREMENTAL; set hoodie.test.consume.max.commits=3; set hoodie.test.consume.start.timestamp=20201227153030; select count(*) from default.test_rt where `_hoodie_commit_time`>'20201227153030'; // Results must be filtered by start.timestamp and end.timestamp. If end.timestamp is not specified, only start.timestamp is required for filtering.
- Reading the incremental view (using Spark SQL as an example):
set hoodie.test.consume.mode=INCREMENTAL; set hoodie.test.consume.start.timestamp=20201227153030; // Specify the initial incremental pull commit. set hoodie.test.consume.end.timestamp=20210308212318; // Specify the end commit of the incremental pull. If this parameter is not specified, the latest commit is used. select count(*) from test_rt where `_hoodie_commit_time`>'20201227153030' and `_hoodie_commit_time`<='20210308212318'; // Results must be filtered by start.timestamp and end.timestamp. If end.timestamp is not specified, only start.timestamp is required for filtering.
- Incremental view (using the Spark DataSource API as an example): The operations are the same as those for the COW table. For details, see the operations for the COW table.
- Reading the read-optimized view (using Hive and SparkSQL as an example): Directly read the Hudi table with suffix _ro stored in Hive.
select count(*) from test_ro;
- Reading the read-optimized view (using the Spark DataSource API as an example): This is similar to reading a common DataSource table.
QUERY_TYPE_OPT_KEY must be set to QUERY_TYPE_READ_OPTIMIZED_OPT_VAL.
spark.read.format("hudi") .option(QUERY_TYPE_OPT_KEY, QUERY_TYPE_READ_OPTIMIZED_OPT_VAL) // Set the query type to the read-optimized view. .load("/tmp/default/mor_bugx/") // Specify the path of the Hudi table to read. .createTempView("mycall") spark.sql("select * from mycall").show(100)
Feedback
Was this page helpful?
Provide feedbackThank you very much for your feedback. We will continue working to improve the documentation.See the reply and handling status in My Cloud VOC.
For any further questions, feel free to contact us through the chatbot.
Chatbot