Optimizing Datasource Tables
Scenario
Partition information about a datasource table can be saved to the Metastore and processed there.
- Datasource tables support partition-based syntax for adding, deleting, and modifying partitions, which improves compatibility with Hive.
- Partition pruning statements are supported and are pushed down to the Metastore, which filters out unmatched partitions.
For example, for a query that filters on partCol=1, where partCol is a partition column, the TableScan operation in the physical plan needs to process only the data corresponding to partCol=1.
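
The scenario above can be sketched as a short sequence of Spark SQL statements. This is a minimal illustration rather than a reference procedure: the table name `sales` and the column `id` are hypothetical, `partCol` is the partition column mentioned above, and the helper assumes the caller passes in an existing SparkSession with Hive support.

```python
# Hypothetical table "sales"; partCol is the partition column from the text.
PARTITION_STATEMENTS = [
    # A datasource table (USING parquet) partitioned by partCol.
    "CREATE TABLE IF NOT EXISTS sales (id INT, partCol INT) "
    "USING parquet PARTITIONED BY (partCol)",
    # Add partitions; their metadata is stored in the Metastore.
    "ALTER TABLE sales ADD IF NOT EXISTS PARTITION (partCol=1)",
    "ALTER TABLE sales ADD IF NOT EXISTS PARTITION (partCol=2)",
    # Modify a partition by renaming it.
    "ALTER TABLE sales PARTITION (partCol=2) RENAME TO PARTITION (partCol=3)",
    # Delete a partition.
    "ALTER TABLE sales DROP IF EXISTS PARTITION (partCol=3)",
]

# The filter on the partition column is what allows partitions to be pruned.
PRUNED_QUERY = "SELECT count(*) FROM sales WHERE partCol = 1"


def run_partition_demo(spark):
    """Run the statements on a SparkSession supplied by the caller."""
    for statement in PARTITION_STATEMENTS:
        spark.sql(statement)
    # Only the partition matching partCol = 1 should need to be scanned.
    return spark.sql(PRUNED_QUERY)
```

With a live session, `run_partition_demo(spark).show()` would display the count, and prefixing the query with `EXPLAIN` is one way to inspect the scan in the physical plan.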
Procedure
| Parameter | Description | Default Value |
|---|---|---|
| spark.sql.hive.manageFilesourcePartitions | Specifies whether to enable Metastore partition management for both datasource tables and converted Hive tables. | true |
| spark.sql.hive.metastorePartitionPruning | Specifies whether to push predicates down to the Hive Metastore so that unmatched partitions are filtered out there. | true |
| spark.sql.hive.filesourcePartitionFileCacheSize | Size, in bytes, of the in-memory cache for partition file metadata. All tables share one cache, which can use up to the specified number of bytes. This parameter is valid only when spark.sql.hive.manageFilesourcePartitions is set to true. | 250 * 1024 * 1024 |
| spark.sql.hive.convertMetastoreOrc | Specifies how ORC tables are processed. If true, Spark SQL uses its built-in ORC support; if false, it uses the Hive SerDe. | true |
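
As a rough sketch of how these values might be written out, the snippet below renders the parameters as `spark-defaults.conf` style lines (key, whitespace, value). It assumes the settings are applied through that file on the Spark client, which this page does not state explicitly. The names `SETTINGS` and `render_spark_defaults` are illustrative only.

```python
# Default cache size from the table above, expressed in bytes.
CACHE_SIZE_BYTES = 250 * 1024 * 1024

# Parameter names and default values as listed in the table above.
SETTINGS = {
    "spark.sql.hive.manageFilesourcePartitions": "true",
    "spark.sql.hive.metastorePartitionPruning": "true",
    "spark.sql.hive.filesourcePartitionFileCacheSize": str(CACHE_SIZE_BYTES),
    "spark.sql.hive.convertMetastoreOrc": "true",
}


def render_spark_defaults(settings):
    """Format settings as spark-defaults.conf lines: key, a space, value."""
    return "\n".join(f"{key} {value}" for key, value in settings.items())


print(render_spark_defaults(SETTINGS))
```

Writing the cache size as a plain byte count avoids relying on the configuration file to evaluate an arithmetic expression.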

