Broaden Support for Hive Partition Pruning Predicate Pushdown
Scenario
Partition pruning is an optimization technique. During query execution, only the partitions that match the query conditions are scanned instead of all partitions of the table, which reduces the amount of data scanned and improves query performance.
In earlier versions, only comparison expressions between column names and integer or string literals could be pushed down. Since version 2.3, the pushdown of NULL, IN, AND, and OR expressions is also supported, as illustrated in the sketch below.
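As a rough illustration only (the table and column names below are hypothetical and not taken from this documentation), the following Spark SQL sketch shows the kind of partition-column predicates (IN, OR, IS NULL) that this enhancement targets: when such predicates can be pushed down, only the matching partitions need to be scanned.

```scala
import org.apache.spark.sql.SparkSession

// Minimal sketch: a Hive-partitioned table and a query whose partition
// predicates (IN, OR, IS NULL) are candidates for pruning pushdown.
object PartitionPruningExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("PartitionPruningExample")
      .enableHiveSupport()
      .getOrCreate()

    // Hypothetical partitioned table for illustration.
    spark.sql(
      """CREATE TABLE IF NOT EXISTS sales (amount DOUBLE)
        |PARTITIONED BY (region STRING, dt STRING)
        |STORED AS PARQUET""".stripMargin)

    // With enhanced predicate pushdown, filters such as IN / OR / IS NULL on
    // partition columns can be evaluated during partition pruning so that
    // only the matching partitions are scanned.
    spark.sql(
      """SELECT region, dt, SUM(amount) AS total
        |FROM sales
        |WHERE region IN ('EU', 'APAC')
        |  AND (dt = '2024-01-01' OR dt IS NULL)
        |GROUP BY region, dt""".stripMargin).show()

    spark.stop()
  }
}
```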
Configuring Parameters
- Log in to FusionInsight Manager. For details, see Accessing FusionInsight Manager.
- Choose Cluster > Services > Spark2x or Spark > Configurations, click All Configurations, search for the following parameter, and adjust its value:
Parameter: spark.sql.hive.advancedPartitionPredicatePushdown.enabled

Description: Specifies whether to broaden support for Hive partition pruning predicate pushdown.
- true: Enables the enhanced partition predicate pushdown function for Hive tables. Spark attempts to push more filter conditions (predicates) down to partition pruning.
- false: Disables the enhanced partition predicate pushdown function for Hive tables.
Enabling the enhanced partition predicate pushdown function can significantly improve the query performance of partitioned tables. However, exercise caution when enabling it because it may cause compatibility issues.

Example Value: true
- After the parameter settings are modified, click Save, perform operations as prompted, and wait until the settings are saved successfully.
- After the Spark server configurations are updated, if Configure Status is Expired, restart the component for the configurations to take effect.
Figure 1 Modifying Spark configurations
On the Spark dashboard page, choose More > Restart Service or Service Rolling Restart, enter the administrator password, and wait until the service restarts.
If you use the Spark client to submit tasks, you need to download the client again for the modification of the spark.sql.hive.advancedPartitionPredicatePushdown.enabled parameter to take effect. For details, see Using an MRS Client.
Components are unavailable during the restart, which affects upper-layer services in the cluster. To minimize the impact, perform this operation during off-peak hours or after confirming that it does not have an adverse impact.
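After the refreshed client is in use, a minimal sketch for reading back the effective value from a Spark application or spark-shell is shown below. spark.conf.get is standard Spark API; whether this vendor-specific parameter can also be overridden per application (for example with --conf on spark-submit) is an assumption not confirmed by this documentation, so treat the Manager-based procedure above as authoritative.

```scala
// Read back the effective value of the setting from a running SparkSession.
// "spark" is the SparkSession provided by spark-shell or built in an application.
val key = "spark.sql.hive.advancedPartitionPredicatePushdown.enabled"
println(s"$key = ${spark.conf.get(key, "<not set>")}")
```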