Read Hudi Data
Read operations on Hudi tables are based on three types of views: the real-time view, the read-optimized view, and the incremental view. Select the view that fits your query.
Hudi supports multiple query engines, including Spark and Hive. For details, see Table 1 and Table 2.
Table 1

Query Engine | Real-time View/Read-optimized View | Incremental View
---|---|---
Hive | Y | Y
Spark (SparkSQL) | Y | Y
Spark (SparkDataSource API) | Y | Y

Table 2

Query Engine | Real-time View | Incremental View | Read-optimized View
---|---|---|---
Hive | Y | Y | Y
Spark (SparkSQL) | Y | Y | Y
Spark (SparkDataSource API) | Y | Y | Y
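
As a rough illustration of how a view might be selected through the Spark DataSource API, the sketch below maps the view names used in the tables above to read options. It assumes the option keys of open-source Apache Hudi (`hoodie.datasource.query.type` and `hoodie.datasource.read.begin.instanttime`), where the real-time view corresponds to a snapshot query; verify the keys against your Hudi version. The helper `hudi_read_options`, the table path, and the instant time are hypothetical.

```python
# Hypothetical helper: translates a view name into Spark DataSource read options.
# Option keys follow open-source Apache Hudi and are an assumption for your version.
VIEW_TO_QUERY_TYPE = {
    "real-time": "snapshot",
    "read-optimized": "read_optimized",
    "incremental": "incremental",
}


def hudi_read_options(view, begin_instant=None):
    """Return the read options for the given view name."""
    if view not in VIEW_TO_QUERY_TYPE:
        raise ValueError(f"unknown view: {view!r}")
    options = {"hoodie.datasource.query.type": VIEW_TO_QUERY_TYPE[view]}
    if view == "incremental":
        # An incremental read needs a commit time to start from.
        if begin_instant is None:
            raise ValueError("incremental view requires begin_instant")
        options["hoodie.datasource.read.begin.instanttime"] = begin_instant
    return options


# Usage with an existing SparkSession named `spark` (not executed here;
# the path and the instant time are placeholders):
# df = (spark.read.format("hudi")
#       .options(**hudi_read_options("incremental", "20240101000000"))
#       .load("/tmp/hudi/hudicow"))
```

Keeping the mapping in one dictionary makes it easy to adjust if your Hudi version names the query types differently.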
- Partition deduction is currently not supported when Hudi data is read through the Spark DataSource API. For example, when the DataSource API is used to query a bootstrap table, the partition field may be missing from the result or displayed as null.
- To query the incremental view, set hoodie.hudicow.consume.mode to INCREMENTAL (hudicow in this example is the name of the Hudi table being queried). This setting applies only to incremental-view queries on that table; it does not apply to queries on other views of Hudi tables or to queries on other tables. To restore the original behavior, set hoodie.hudicow.consume.mode to SNAPSHOT or to any value other than INCREMENTAL.
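
The incremental-view setting described above could be generated per table as in the following sketch. It builds the session statements as plain strings so they can be passed to whatever Hive or SparkSQL client you use. The `hoodie.<table>.consume.mode` key comes from the note above; the `hoodie.<table>.consume.start.timestamp` key is taken from open-source Apache Hudi and is an assumption here, as are the helper names and the sample timestamp.

```python
# Hypothetical helpers: build the session-level statements for an incremental query.
def incremental_mode_statements(table, start_timestamp=None):
    """Return the SET statements that switch `table` to the incremental view."""
    if not table or not table.replace("_", "").isalnum():
        raise ValueError(f"unexpected table name: {table!r}")
    statements = [f"set hoodie.{table}.consume.mode=INCREMENTAL;"]
    if start_timestamp is not None:
        # Assumed key from open-source Hudi: commit time to start reading from.
        statements.append(
            f"set hoodie.{table}.consume.start.timestamp={start_timestamp};"
        )
    return statements


def restore_mode_statement(table):
    """Return the SET statement that switches `table` back to SNAPSHOT."""
    return f"set hoodie.{table}.consume.mode=SNAPSHOT;"


if __name__ == "__main__":
    for stmt in incremental_mode_statements("hudicow", "20240101000000"):
        print(stmt)
    print(restore_mode_statement("hudicow"))
```

Issuing the restore statement after the incremental query keeps the setting from affecting later queries in the same session.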