Configuring Hive to Read a Hudi Table

Hive External Tables Corresponding to Hudi Tables

A Hudi source table corresponds to a copy of HDFS data. The Hudi table data can be mapped to a Hive external table through the Spark component, Flink component, or Hudi client. Based on the external table, Hive can perform real-time view query, read-optimized view query, and incremental view query.

Different view queries are provided for different types of Hudi source tables:

When the Hudi source table is a Copy-On-Write (COW) table, it can be mapped to a Hive external table. The table supports real-time query and incremental view query.
When the Hudi source table is a Merge-On-Read (MOR) table, it can be mapped to two Hive external tables (RO table and RT table). The RO table supports read-optimized view query, and the RT table supports real-time view query and incremental view query.

Notes and Constraints

You cannot perform insert, update, delete, load, merge, alter, or msck operations on the Hive external table. Only the select operation is supported.
Granting Table Permissions
You cannot perform update, alter, write, or all operations.
Backup and Restoration
You only need to back up one of the RT and RO tables if necessary. This is because the RO and RT tables are mapped by the same Hudi source table. If one table is backed up, the other table is also backed up.
Component Version
- Hive: FusionInsight_HD_xxx; Hive kernel version 3.1.0.
- Spark2x: FusionInsight_Spark2x_xxx; Hudi kernel version 0.11.0.

Creating Hive External Tables Corresponding to Hudi Tables

You can create a Hudi table in spark-shell or spark-sql and insert data into the table.

Generally, Hudi table data is synchronized to Hive external tables when the data is imported to the lake. In this case, you can directly query the corresponding Hive external tables in Beeline. If the data is not synchronized to the Hive external tables, you can manually synchronize the data using the Hudi client tool. For details, see Synchronizing Hudi Table Data to Hive.

Querying Hive External Tables Corresponding to Hudi Tables

Before performing incremental query on Hudi tables using Hive, you need to set three table-level parameters in Table 1. Each Hudi source table corresponds to three parameters. hudisourcetablename indicates the name of the Hudi source table (not the name of the Hive external table). You can change the name as required.

**Table 1** Parameter description
Parameter	Description	Default Value
hoodie.hudisourcetablename.consume.mode	Query mode of the Hudi table. Incremental query: Set it to INCREMENTAL. Non-incremental query: Do not set this parameter or set it to SNAPSHOT.	None
hoodie.hudisourcetablename.consume.start.timestamp	Start time of the incremental query on the Hudi table. Incremental query: start time of the incremental query. Non-incremental query: Do not set this parameter.	None
hoodie.hudisourcetablename.consume.max.commits	The incremental query on the Hudi table is based on the number of commits after hoodie.hudisourcetablename.consume.start.timestamp. Incremental query: number of commits. For example, if this parameter is set to 3, data after three commits from the specified start time is queried. If this parameter is set to -1, all data committed after the specified start time is queried. Non-incremental query: Do not set this parameter.	None

To query a Hudi table, perform the following steps:

Log in to the node where the client is installed as the Hive client installation user.
Go to the client installation directory, configure environment variables, and authenticate the user.

Go to the client installation directory.
```
cd Client installation directory
```
Load the environment variables.
```
source bigdata_env
```
Authenticate the user. If Kerberos authentication is disabled for the cluster (in normal mode), skip this step.
```
kinit Hive service user
```
Log in to the Hive client.
```
beeline
```
Query the Hudi table.
- Querying a Hudi COW Table
  For example, if the name of a Hudi COW source table is hudicow, the name of the mapped Hive external table is hudicow.
  - Real-time view query on the COW table:
```
select * from hudicow;
```
  - Incremental query on the COW table: Set three incremental query parameters based on the name of the Hudi source table. The where clause of the incremental query statements must contain `_hoodie_commit_time`>'xxx', where xxx indicates the value of hoodie.hudisourcetablename.consume.start.timestamp.
    Set the query mode of the Hudi table to INCREMENTAL.
```
set hoodie.hudicow.consume.mode= INCREMENTAL;
```
    Set the number of commits for incremental queries on the Hudi table.
```
set hoodie.hudicow.consume.max.commits=3;
```
    Set the start time of incremental query on the Hudi table.
```
set hoodie.hudicow.consume.start.timestamp= 20200427114546;
```
    Perform an incremental query on the COW table.
```
select count(*) from hudicow where `_hoodie_commit_time`>'20200427114546';
```
- Querying a Hudi MOR Table
  For example, if the name of the Hudi MOR source table is hudimor, the table is mapped to two Hive external tables hudimor_ro (RO table) and hudimor_rt (RT table).
  - Read-optimized view query on the RO table:
```
select * from hudicow_ro;
```
  - Real-time view query on the RT table:
```
select * from hudicow_rt;
```
  - Incremental query on the RT table: Set three incremental query parameters based on the name of the Hudi source table. The where clause of the incremental query statements must contain `_hoodie_commit_time`>'xxx', where xxx indicates the value of hoodie.hudisourcetablename.consume.start.timestamp.
    Set the query mode of the Hudi table.
    - Set incremental query for the Hudi table.
```
set hoodie.hudimor.consume.mode=INCREMENTAL;
```
    - Setting other query modes for the Hudi table.
```
set hoodie.hudisourcetablename.consume.mode=SNAPSHOT;
```
    Set the number of commits for incremental queries on the Hudi table.
```
set hoodie.hudimor.consume.max.commits=-1;
```
    Set the start time of incremental query on the Hudi table.
```
set hoodie.hudimor.consume.start.timestamp=20210207144611;
```
    Query the RT table.
```
select * from hudimor_rt where `_hoodie_commit_time`>'20210207144611';
```

Querying Hive External Tables Corresponding to Hudi Schema Evolution Tables

If the Hudi table is a schema evolution table (the table fields have been modified), you need to set the following parameter when querying the Hudi table on the Hive client.

set hive.exec.schema.evolution=true;

If this parameter is set to true, Hive automatically adapts to schema changes of Hudi tables. This ensures to provide accurate and consistent query results based on the latest table structure.

The following illustrates the parameters for querying views. Take the real-time view of the COW table as an example.

Set hive.exec.schema.evolution to true.

set hive.exec.schema.evolution=true;

Query the Hive external table.

select * from hudicow;

Parent topic: Using Hive

Previous topic: Configuring Hive to Read Data from a Relational Database

Next topic: Enterprise-Class Enhancements of Hive

Feedback

Was this page helpful?

Helpful Not helpful

Provide feedback

Thank you very much for your feedback. We will continue working to improve the documentation.

The system is busy. Please try again later.

Which of the following issues have you encountered?

Content is inconsistent with the product UI

Unclear descriptions

Lack of examples or code

Incorrect steps

Can't find what I need

Lack of best practices

Feedback (optional)

0/500

Select at least one type of issue, and enter your comments or suggestions.

Enter a maximum of 500 characters.

Submit Cancel