Updated on 2022-12-14 GMT+08:00

Read

Hudi read operations target one of three views: the real-time view, the read-optimized view, and the incremental view. Select the view that matches your query requirements.

Hudi supports multiple query engines, including Spark, Hive, and HetuEngine. For details, see Table 1 and Table 2.

Table 1 COW tables

Query Engine                   Real-time View/Read-optimized View   Incremental View
Hive                           Y                                    Y
Spark (SparkSQL)               Y                                    Y
Spark (SparkDataSource API)    Y                                    Y
HetuEngine                     Y                                    N

Table 2 MOR tables

Query Engine                   Real-time View   Incremental View   Read-optimized View
Hive                           Y                Y                  Y
Spark (SparkSQL)               Y                Y                  Y
Spark (SparkDataSource API)    Y                Y                  Y
HetuEngine                     Y                N                  Y
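
For the Spark (SparkDataSource API) rows above, the view is usually chosen through a read option. The following is a minimal sketch, not a definitive implementation. It assumes the option keys of open-source Apache Hudi (hoodie.datasource.query.type and hoodie.datasource.read.begin.instanttime) and assumes that the real-time view corresponds to the snapshot query type. The helper name hudi_read_options is hypothetical. Verify the keys against the Hudi version in your cluster.

```python
from typing import Dict, Optional

# Assumed mapping from the view names in this section to the query types
# used by open-source Apache Hudi.
_QUERY_TYPE_BY_VIEW = {
    "real-time": "snapshot",
    "read-optimized": "read_optimized",
    "incremental": "incremental",
}


def hudi_read_options(view: str, begin_instant: Optional[str] = None) -> Dict[str, str]:
    """Return Spark DataSource read options for the requested view (hypothetical helper)."""
    key = view.strip().lower()
    if key not in _QUERY_TYPE_BY_VIEW:
        raise ValueError(f"unknown view: {view!r}")
    options = {"hoodie.datasource.query.type": _QUERY_TYPE_BY_VIEW[key]}
    if key == "incremental":
        # An incremental read needs the commit time to start reading from.
        if not begin_instant:
            raise ValueError("an incremental read needs a begin instant time")
        options["hoodie.datasource.read.begin.instanttime"] = begin_instant
    return options


# Usage with PySpark (needs a Spark session with the Hudi bundle available;
# base_path and the instant time are placeholders):
#   opts = hudi_read_options("incremental", "20221201000000")
#   df = spark.read.format("hudi").options(**opts).load(base_path)
```

Keeping the mapping in one place makes it easy to reject a view name that an engine does not support before the query is submitted.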

  • Currently, partition inference is not supported when Hudi data is read through the Spark DataSource API. For example, when the DataSource API is used to query a bootstrap table, the partition field may be missing from the result or displayed as null.
  • To query the incremental view, set hoodie.hudicow.consume.mode to INCREMENTAL (hudicow in the parameter name is the name of the table being queried). This setting applies only to incremental view queries. It does not apply to other types of Hudi queries or to queries on other tables. To restore the default behavior, set hoodie.hudicow.consume.mode to SNAPSHOT or any other value.
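
The session settings described in the last note can be generated per table. The following is a minimal sketch under stated assumptions: the hoodie.<table>.consume.start.timestamp and hoodie.<table>.consume.max.commits properties come from open-source Apache Hudi and are not mentioned in this section, and the function names are hypothetical. Only hoodie.<table>.consume.mode is taken from the note above.

```python
import re
from typing import List, Optional


def incremental_settings(table: str, start_commit: str,
                         max_commits: Optional[int] = None) -> List[str]:
    """Build the set statements that switch one table to incremental mode."""
    # Restrict the table name to word characters so it is safe inside a property key.
    if not re.fullmatch(r"\w+", table):
        raise ValueError(f"invalid table name: {table!r}")
    statements = [
        f"set hoodie.{table}.consume.mode=INCREMENTAL;",
        # Assumed property (open-source Hudi): commit time to start consuming from.
        f"set hoodie.{table}.consume.start.timestamp={start_commit};",
    ]
    if max_commits is not None:
        # Assumed property (open-source Hudi): maximum number of commits to consume.
        statements.append(f"set hoodie.{table}.consume.max.commits={max_commits};")
    return statements


def restore_settings(table: str) -> List[str]:
    """Build the set statement that returns the table to the default mode."""
    if not re.fullmatch(r"\w+", table):
        raise ValueError(f"invalid table name: {table!r}")
    return [f"set hoodie.{table}.consume.mode=SNAPSHOT;"]


# Example: statements for the hudicow table used in the note above.
for statement in incremental_settings("hudicow", "20221201000000", 3):
    print(statement)
```

Run the generated statements in the same session as the incremental query, then run the restore statement so that later queries in that session are not affected.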