Reading the Hudi MOR Table View
After a MOR table is synchronized to Hive, two additional tables are generated in Hive. For a Hudi table named ${table_name}, these are ${table_name}_rt and ${table_name}_ro. The table with the _rt suffix exposes the real-time view, and the table with the _ro suffix exposes the read-optimized view.
- Reading the real-time view (using Hive and Spark SQL as an example): Query the Hive table with the _rt suffix directly.
select count(*) from ${table_name}_rt;
- Reading the real-time view (using the Spark DataSource API as an example): The operations are the same as those for the COW table. For details, see the operations for the COW table.
- Reading the incremental view (using Hive as an example):
-- hive.input.format does not need to be set for Spark SQL.
set hive.input.format=org.apache.hudi.hadoop.hive.HoodieCombineHiveInputFormat;
set hoodie.${table_name}.consume.mode=INCREMENTAL;
set hoodie.${table_name}.consume.max.commits=3;
set hoodie.${table_name}.consume.start.timestamp=20201227153030;
select count(*) from default.${table_name}_rt where `_hoodie_commit_time`>'20201227153030';
- Reading the incremental view (using Spark SQL as an example):
set hoodie.${table_name}.consume.mode=INCREMENTAL;
-- Specify the start commit from which to pull incremental data.
set hoodie.${table_name}.consume.start.timestamp=20201227153030;
-- Specify the end commit up to which to pull incremental data. If this parameter is not specified, the latest commit is used.
set hoodie.${table_name}.consume.end.timestamp=20210308212318;
select count(*) from default.${table_name}_rt where `_hoodie_commit_time`>'20201227153030';
- Reading the incremental view (using the Spark DataSource API as an example): The operations are the same as those for the COW table. For details, see the operations for the COW table.
- Reading the read-optimized view (using Hive and Spark SQL as an example): Query the Hive table with the _ro suffix directly.
select count(*) from ${table_name}_ro;
- Reading the read-optimized view (using the Spark DataSource API as an example): This is similar to reading a common DataSource table.
QUERY_TYPE_OPT_KEY must be set to QUERY_TYPE_READ_OPTIMIZED_OPT_VAL.
import org.apache.hudi.DataSourceReadOptions._ // Provides QUERY_TYPE_OPT_KEY and QUERY_TYPE_READ_OPTIMIZED_OPT_VAL.

spark.read.format("hudi")
  .option(QUERY_TYPE_OPT_KEY, QUERY_TYPE_READ_OPTIMIZED_OPT_VAL) // Set the query type to the read-optimized view.
  .load("/tmp/default/mor_bugx/") // Specify the path of the Hudi table to read.
  .createTempView("mycall")
spark.sql("select * from mycall").show(100)
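The Spark DataSource items above refer to the COW table operations for the real-time and incremental views. As a rough sketch of how the three views differ at the option level, the helper below builds the read options for each one. The object HudiReadOptions and its member names are hypothetical and not part of Hudi; the option keys are the string forms of Hudi's read configurations, and it is an assumption that the Hudi version in use accepts them in this form.

```scala
// Sketch only: HudiReadOptions is a hypothetical helper, not a Hudi class.
// The keys below are the string forms of Hudi's DataSource read options.
object HudiReadOptions {
  private val QueryType    = "hoodie.datasource.query.type"
  private val BeginInstant = "hoodie.datasource.read.begin.instanttime"
  private val EndInstant   = "hoodie.datasource.read.end.instanttime"

  // Real-time (snapshot) view: base files are merged with log files at read time.
  val realTime: Map[String, String] = Map(QueryType -> "snapshot")

  // Read-optimized view: only the base files are read.
  val readOptimized: Map[String, String] = Map(QueryType -> "read_optimized")

  // Incremental view: records committed after `begin`, optionally bounded by `end`.
  // When `end` is omitted, no end instant is set in the returned options.
  def incremental(begin: String, end: Option[String] = None): Map[String, String] =
    Map(QueryType -> "incremental", BeginInstant -> begin) ++
      end.map(e => EndInstant -> e).toList
}
```

In a spark-shell session with the Hudi bundle on the classpath, these maps could be passed to the reader, for example `spark.read.format("hudi").options(HudiReadOptions.incremental("20201227153030")).load(basePath)`, where basePath stands for the table path and is a placeholder.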