Updated on 2024-12-11 GMT+08:00

Read Hudi Data

Hudi tables are read through one of three views: the real-time view, the incremental view, and the read-optimized view. Select the view that suits your query.

Hudi supports multiple query engines, including Spark and Hive. For details, see Table 1 and Table 2.

Table 1 COW tables

Query Engine                   Real-time View/Read-optimized View   Incremental View
Hive                           Y                                    Y
Spark (SparkSQL)               Y                                    Y
Spark (SparkDataSource API)    Y                                    Y

Table 2 MOR tables

Query Engine                   Real-time View   Incremental View   Read-optimized View
Hive                           Y                Y                  Y
Spark (SparkSQL)               Y                Y                  Y
Spark (SparkDataSource API)    Y                Y                  Y
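
As a rough illustration of how the views in the tables above might be selected through the Spark DataSource API, the sketch below maps each view name to reader options. It assumes the standard Apache Hudi option keys hoodie.datasource.query.type and hoodie.datasource.read.begin.instanttime, and that the real-time view corresponds to the snapshot query type. The helper hudi_read_options and the instant time used with it are hypothetical names chosen for this example.

```python
# Sketch only: translate the view names used in this document into
# Hudi DataSource reader options. The option keys are assumed to be
# the standard Apache Hudi ones; check them against your Hudi version.

QUERY_TYPE_KEY = "hoodie.datasource.query.type"
BEGIN_INSTANT_KEY = "hoodie.datasource.read.begin.instanttime"

# View name in this document -> assumed Hudi query type value
VIEW_TO_QUERY_TYPE = {
    "real-time": "snapshot",
    "incremental": "incremental",
    "read-optimized": "read_optimized",
}


def hudi_read_options(view, begin_instant=None):
    """Return reader options for the given view (hypothetical helper)."""
    if view not in VIEW_TO_QUERY_TYPE:
        raise ValueError(f"unknown view: {view!r}")
    options = {QUERY_TYPE_KEY: VIEW_TO_QUERY_TYPE[view]}
    if view == "incremental":
        # An incremental read needs a commit time to start from.
        if begin_instant is None:
            raise ValueError("an incremental read needs a begin instant time")
        options[BEGIN_INSTANT_KEY] = begin_instant
    return options
```

With an existing SparkSession named spark and a table path base_path (both placeholders), the options could be passed as spark.read.format("hudi").options(**hudi_read_options("incremental", "20240101000000")).load(base_path).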

  • Partition deduction is currently not supported when Hudi data is read through the Spark DataSource API. For example, when the DataSource API is used to query a bootstrap table, the partition field may be missing or displayed as null.
  • To query the incremental view, set hoodie.hudicow.consume.mode to INCREMENTAL, where hudicow is the name of the Hudi table being queried. This parameter applies only to incremental-view queries on that table; it cannot be used for other types of queries on Hudi tables or for queries on other tables. To restore the configuration, set hoodie.hudicow.consume.mode to SNAPSHOT or any other value.
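
The incremental-view setting in the note above can be sketched as a small helper that builds the session statements for a given table. It assumes the parameter follows the pattern hoodie.<table>.consume.mode, with hudicow as the example table name. The companion properties consume.start.timestamp and consume.max.commits are not mentioned in this document; they are taken from the Apache Hudi Hive integration and should be treated as assumptions. The function names are hypothetical.

```python
# Sketch only: build the session statements that switch a Hudi table
# to the incremental view and back. Only consume.mode comes from this
# document; start.timestamp and max.commits are assumed from Apache Hudi.


def incremental_session_statements(table, start_timestamp, max_commits=None):
    """Return 'set' statements that enable the incremental view for one table."""
    prefix = f"hoodie.{table}.consume"
    statements = [
        f"set {prefix}.mode=INCREMENTAL",
        f"set {prefix}.start.timestamp={start_timestamp}",
    ]
    if max_commits is not None:
        # Optional cap on how many commits after the start time are read.
        statements.append(f"set {prefix}.max.commits={max_commits}")
    return statements


def restore_statement(table):
    """Return the statement that switches the table back to the default view."""
    return f"set hoodie.{table}.consume.mode=SNAPSHOT"


if __name__ == "__main__":
    for stmt in incremental_session_statements("hudicow", "20240101000000", 3):
        print(stmt + ";")
    print(restore_statement("hudicow") + ";")
```

Generating the statements per table keeps the setting scoped to the one table being read incrementally, and pairing it with the restore statement makes it harder to leave a session in incremental mode by accident.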