Hive Supporting Reading Hudi Tables
Hive External Tables Corresponding to Hudi Tables
A Hudi source table corresponds to a copy of HDFS data. The Hudi table data can be mapped to a Hive external table through the Spark component, Flink component, or Hudi client. Based on the external table, Hive can easily perform real-time query, read-optimized view query, and incremental view query.
- Different view queries are provided for different types of Hudi source tables:
- When the Hudi source table is a Copy-On-Write (COW) table, it can be mapped to a Hive external table. The table supports real-time query and incremental view query.
- When the Hudi source table is a Merge-On-Read (MOR) table, it can be mapped to two Hive external tables (RO table and RT table). The RO table supports read-optimized view query, and the RT table supports real-time view query and incremental view query.
- Hive external tables cannot be added, deleted, or modified (including insert, update, delete, load, merge, alter and msck). Only the query operation (select) is supported.
- Granting table permissions: The update, alter, write, and all permissions cannot be modified.
- Backup and restoration: The RO and RT tables are mapped from the same Hudi source table. When one table is backed up, the other table is also backed up. The same applies to restoration. Therefore, only one table needs to be backed up.
- Component versions:
- Hive: FusionInsight_HD_xxx; Hive kernel version 3.1.0.
- Spark2x: FusionInsight_Spark2x_xxx; Hudi kernel version 0.11.0
Creating Hive External Tables Corresponding to Hudi Tables
Generally, Hudi table data is synchronized to Hive external tables when the data is imported to the lake. In this case, you can directly query the corresponding Hive external tables in Beeline. If the data is not synchronized to the Hive external tables, you can use the Hudi client tool run_hive_sync_tool.sh to synchronize data manually.
Querying Hive External Tables Corresponding to Hudi Tables
Prerequisites
Before using Hive to perform incremental query on Hudi tables, you need to set another three parameters in Table 1. The three parameters are table-level parameters. Each Hudi source table corresponds to three parameters, where hudisourcetablename indicates the name of the Hudi source table (not the name of the Hive external table).
Parameter |
Default Value |
Description |
---|---|---|
hoodie. hudisourcetablename.consume.mode |
None |
Query mode of the Hudi table.
|
hoodie. hudisourcetablename.consume.start.timestamp |
None |
Start time of the incremental query on the Hudi table.
|
hoodie. hudisourcetablename.consume.max.commits |
None |
The incremental query on the Hudi table is based on the number of commits after hoodie.hudisourcetablename.consume.start.timestamp.
|
Querying a Hudi COW Table
For example, the name of a Hudi source table of the COW type is hudicow, and the name of the mapped Hive external table is hudicow.
- Real-time view query on the COW table:
- Incremental query on the COW table: Set three incremental query parameters based on the name of the Hudi source table. The where clause of the incremental query statements must contain `_hoodie_commit_time`>'xxx', where xxx indicates the value of hoodie.hudisourcetablename.consume.start.timestamp.
set hoodie.hudicow.consume.mode= INCREMENTAL;
set hoodie.hudicow.consume.max.commits=3;
set hoodie.hudicow.consume.start.timestamp= 20200427114546;
select count(*) from hudicow where `_hoodie_commit_time`>'20200427114546';
Querying a Hudi MOR Table
For example, the name of a Hudi source table of the MOR type is hudimor, and the two mapped Hive external tables are hudimor_ro (RO table) and hudimor_rt (RT table).
- Read-optimized view query on the RO table:
- Real-time view query on the RT table:
- Incremental query on the RT table: Set three incremental query parameters based on the name of the Hudi source table. The where clause of the incremental query statements must contain `_hoodie_commit_time`>'xxx', where xxx indicates the value of hoodie.hudisourcetablename.consume.start.timestamp.
set hoodie.hudimor.consume.mode=INCREMENTAL;
set hoodie.hudimor.consume.max.commits=-1;
set hoodie.hudimor.consume.start.timestamp=20210207144611;
select * from hudimor_rt where `_hoodie_commit_time`>'20210207144611';
set hoodie.hudisourcetablename.consume.mode=INCREMENTAL; is used only for the incremental query on the table. To switch to another query mode, run set hoodie.hudisourcetablename.consume.mode=SNAPSHOT;.
Querying Hive External Tables Corresponding to Hudi Schema Evolution Tables
If the Hudi table is a schema evolution table (some fields in the table have been modified), you need to set set hive.exec.schema.evolution to true when Hive queries the table.
The following uses the query of the real-time view of a COW table as an example. To query other views, you need to add this parameter.
Feedback
Was this page helpful?
Provide feedbackThank you very much for your feedback. We will continue working to improve the documentation.See the reply and handling status in My Cloud VOC.
For any further questions, feel free to contact us through the chatbot.
Chatbot