Optimizing Datasource Tables
Scenarios
Save the partition information about the datasource table to the Metastore and process partition information in the Metastore.
- Optimize the datasource tables, support syntax such as adding, deletion, and modification in the table based on partitions, improving compatibility with Hive.
- Support statements of partition tailoring and push down to the Metastore to filter unmatched partitions.
You need only to process data corresponding to partCol=1 when performing the TableScan operation in the physical plan.
Procedure
- Install the Spark client.
- Log in to the Spark client node as the client installation user.
Modify the following parameters in the {Client installation directory}/Spark/spark/conf/spark-defaults.conf file on the Spark client.
Table 1 Parameter description Parameter
Description
Example Value
spark.sql.hive.manageFilesourcePartitions
Specifies whether to enable Metastore partition management (including datasource tables and converted Hive).
- true indicates enabling Metastore partition management. In this case, datasource tables are stored in Hive and Metastore is used to tailor partitions in query statements.
- false indicates disabling Metastore partition management.
true
spark.sql.hive.metastorePartitionPruning
Specifies whether to support pushing down predicate to Hive Metastore.
- true indicates supporting pushing down predicate to Hive Metastore. Only the predicate of Hive tables is supported.
- false indicates not supporting pushing down predicate to Hive Metastore.
true
spark.sql.hive.filesourcePartitionFileCacheSize
The cache size of the partition file metadata in the memory.
All tables share a cache that can use up to specified num bytes for file metadata.
This parameter is valid only when spark.sql.hive.manageFilesourcePartitions is set to true.
250 * 1024 * 1024
spark.sql.hive.convertMetastoreOrc
The processing approach of ORC tables.
- false: Spark SQL uses Hive SerDe to process ORC tables.
- true: Spark SQL uses the Spark built-in mechanism to process ORC tables.
true
Feedback
Was this page helpful?
Provide feedbackThank you very much for your feedback. We will continue working to improve the documentation.