Broaden Support for Hive Partition Pruning Predicate Pushdown
Scenario
Partition pruning is an optimization technique. During query execution, only the partitions that match the query conditions are scanned instead of all partitions of the table, which reduces the amount of data scanned and improves query performance.
In earlier versions, only comparison expressions between column names and integer or string literals could be pushed down. Since version 2.3, the pushdown of NULL, IN, AND, and OR expressions is also supported, as illustrated in the sketch below.
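As a rough illustration only (the table and column names below are hypothetical and not taken from this documentation), the following Spark SQL sketch shows the kind of partition-column predicates (IN, OR, IS NULL) that this enhancement targets: when such predicates can be pushed down, only the matching partitions need to be scanned.

```scala
import org.apache.spark.sql.SparkSession

// Minimal sketch: a Hive-partitioned table and a query whose partition
// predicates (IN, OR, IS NULL) are candidates for pruning pushdown.
object PartitionPruningExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("PartitionPruningExample")
      .enableHiveSupport()
      .getOrCreate()

    // Hypothetical partitioned table for illustration.
    spark.sql(
      """CREATE TABLE IF NOT EXISTS sales (amount DOUBLE)
        |PARTITIONED BY (region STRING, dt STRING)
        |STORED AS PARQUET""".stripMargin)

    // With enhanced predicate pushdown, filters such as IN / OR / IS NULL on
    // partition columns can be evaluated during partition pruning so that
    // only the matching partitions are scanned.
    spark.sql(
      """SELECT region, dt, SUM(amount) AS total
        |FROM sales
        |WHERE region IN ('EU', 'APAC')
        |  AND (dt = '2024-01-01' OR dt IS NULL)
        |GROUP BY region, dt""".stripMargin).show()

    spark.stop()
  }
}
```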
Configuring Parameters
- Log in to FusionInsight Manager. For details, see Accessing FusionInsight Manager.
- Choose Cluster > Services > Spark2x or Spark > Configurations, click All Configurations, search for the following parameter, and adjust its value:
Parameter: spark.sql.hive.advancedPartitionPredicatePushdown.enabled

Description: Specifies whether to broaden support for Hive partition pruning predicate pushdown.
- true: Enables the enhanced partition predicate pushdown function for Hive tables. Spark attempts to push more filter conditions (predicates) down to partition pruning.
- false: Disables the enhanced partition predicate pushdown function for Hive tables.
Enabling the enhanced partition predicate pushdown function can significantly improve the query performance of partitioned tables. However, exercise caution when enabling it because it may cause compatibility issues.

Example Value: true
- After the parameter settings are modified, click Save, perform operations as prompted, and wait until the settings are saved successfully.
- After the Spark server configurations are updated, if Configure Status is Expired, restart the component for the configurations to take effect.
Figure 1 Modifying Spark configurations
On the Spark dashboard page, choose More > Restart Service or Service Rolling Restart, enter the administrator password, and wait until the service restarts.
If you use the Spark client to submit tasks, you need to download the client again for the modification of the spark.sql.hive.advancedPartitionPredicatePushdown.enabled parameter to take effect. For details, see Using an MRS Client.
Components are unavailable during the restart, which affects upper-layer services in the cluster. To minimize the impact, perform this operation during off-peak hours or after confirming that it does not have an adverse impact.
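After the refreshed client is in use, a minimal sketch for reading back the effective value from a Spark application or spark-shell is shown below. spark.conf.get is standard Spark API; whether this vendor-specific parameter can also be overridden per application (for example with --conf on spark-submit) is an assumption not confirmed by this documentation, so treat the Manager-based procedure above as authoritative.

```scala
// Read back the effective value of the setting from a running SparkSession.
// "spark" is the SparkSession provided by spark-shell or built in an application.
val key = "spark.sql.hive.advancedPartitionPredicatePushdown.enabled"
println(s"$key = ${spark.conf.get(key, "<not set>")}")
```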