Reading the Hudi MOR Table View
After a MOR table is synchronized to Hive, two tables are registered in Hive: ${table_name}_rt and ${table_name}_ro. The table suffixed with _rt provides the real-time view, and the table suffixed with _ro provides the read-optimized view. For example, when the Hudi table ${table_name} is synchronized to Hive, the two additional tables ${table_name}_rt and ${table_name}_ro appear in Hive.
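For example, a quick check from spark-shell that both views are registered in Hive (a minimal sketch; the table name mor_bugx is illustrative and reused from the examples below):
spark.sql("show tables in default like 'mor_bugx*'").show() // Expect mor_bugx_rt (real-time view) and mor_bugx_ro (read-optimized view).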
- Reading the real-time view (using Hive and SparkSQL as an example): Directly read the Hudi table with suffix _rt stored in Hive.
select count(*) from ${table_name}_rt;
- Reading the real-time view (using the Spark DataSource API as an example): The operations are the same as reading a COW table; for details, see the COW table operations. A minimal sketch follows.
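A minimal sketch with the Spark DataSource API, assuming the table path /tmp/default/mor_bugx/ used in the read-optimized example below (a snapshot query on a MOR table returns the real-time view):
import org.apache.hudi.DataSourceReadOptions._
spark.read.format("hudi")
  .option(QUERY_TYPE_OPT_KEY, QUERY_TYPE_SNAPSHOT_OPT_VAL) // Snapshot query merges base and log files, i.e. the real-time view.
  .load("/tmp/default/mor_bugx/") // Specify the path of the Hudi table to read.
  .createTempView("mor_rt_view")
spark.sql("select count(*) from mor_rt_view").show()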
- Reading the incremental view (using Hive as an example):
set hive.input.format=org.apache.hudi.hadoop.hive.HoodieCombineHiveInputFormat; -- This parameter does not need to be specified for Spark SQL.
set hoodie.${table_name}.consume.mode=INCREMENTAL;
set hoodie.${table_name}.consume.max.commits=3;
set hoodie.${table_name}.consume.start.timestamp=20201227153030;
select count(*) from default.${table_name}_rt where `_hoodie_commit_time`>'20201227153030';
- Reading the incremental view (using Spark SQL as an example):
set hoodie.${table_name}.consume.mode=INCREMENTAL;
set hoodie.${table_name}.consume.start.timestamp=20201227153030; -- Specify the initial commit from which to pull the incremental view.
set hoodie.${table_name}.consume.end.timestamp=20210308212318; -- Specify the end commit up to which to pull the incremental view. If this parameter is not specified, the latest commit is used.
select count(*) from default.${table_name}_rt where `_hoodie_commit_time`>'20201227153030';
- Reading the incremental view (using the Spark DataSource API as an example): The operations are the same as reading a COW table; for details, see the COW table operations. A minimal sketch follows.
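A minimal sketch with the Spark DataSource API, assuming the same table path and commit times as the Spark SQL example above (adjust them to real commit instants of your table):
import org.apache.hudi.DataSourceReadOptions._
spark.read.format("hudi")
  .option(QUERY_TYPE_OPT_KEY, QUERY_TYPE_INCREMENTAL_OPT_VAL) // Set the query type to incremental.
  .option(BEGIN_INSTANTTIME_OPT_KEY, "20201227153030") // Pull commits after this instant.
  .option(END_INSTANTTIME_OPT_KEY, "20210308212318") // Optional; defaults to the latest commit if not set.
  .load("/tmp/default/mor_bugx/") // Specify the path of the Hudi table to read.
  .createTempView("mor_incr_view")
spark.sql("select count(*) from mor_incr_view").show()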
- Reading the read-optimized view (using Hive and SparkSQL as an example): Directly read the Hudi table with suffix _ro stored in Hive.
select count(*) from ${table_name}_ro;
- Reading the read-optimized view (using the Spark DataSource API as an example): This is similar to reading an ordinary DataSource table, except that QUERY_TYPE_OPT_KEY must be set to QUERY_TYPE_READ_OPTIMIZED_OPT_VAL.
spark.read.format("hudi")
  .option(QUERY_TYPE_OPT_KEY, QUERY_TYPE_READ_OPTIMIZED_OPT_VAL) // Set the query type to the read-optimized view.
  .load("/tmp/default/mor_bugx/") // Specify the path of the Hudi table to read.
  .createTempView("mycall")
spark.sql("select * from mycall").show(100)