Optimizing the Aggregate Algorithms

Updated on 2024-12-11 GMT+08:00

Scenario

Spark SQL supports a hash aggregation algorithm that uses a fast aggregate hashmap as a cache to improve aggregation performance. This hashmap replaces the previous ColumnarBatch-based implementation, which avoids the performance problems that implementation had with wide aggregate tables (tables with many key or value fields).

Procedure

To enable aggregation algorithm optimization, configure the following parameter in the spark-defaults.conf file on the Spark client.

Table 1 Parameter description

| Parameter | Description | Default Value |
|---|---|---|
| spark.sql.codegen.aggregate.map.twolevel.enabled | Specifies whether to enable aggregation algorithm optimization. true: enable; false: disable. | true |
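The setting is a single `key value` line in spark-defaults.conf, and when the line is absent the default from Table 1 (`true`) applies. The hypothetical Python helper below shows the line to add and checks the effective value. Its parser is simplified and assumes whitespace-separated `key value` lines with `#` comments; Spark itself reads the file in Java properties format, which also accepts `=` or `:` separators.

```python
# Hypothetical helper, for illustration only: reads spark-defaults.conf-style
# text and reports whether the two-level aggregate map optimization is enabled.
KEY = "spark.sql.codegen.aggregate.map.twolevel.enabled"
DEFAULT = "true"  # default value listed in Table 1


def parse_spark_defaults(text):
    """Parse 'key value' lines, skipping blank lines and # comments (simplified)."""
    conf = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        parts = line.split(None, 1)  # split on the first run of whitespace
        conf[parts[0]] = parts[1] if len(parts) > 1 else ""
    return conf


def twolevel_enabled(text):
    """Return the effective setting, falling back to the default when absent."""
    value = parse_spark_defaults(text).get(KEY, DEFAULT)
    return value.strip().lower() == "true"


SAMPLE_CONF = """
# Enable aggregation algorithm optimization
spark.sql.codegen.aggregate.map.twolevel.enabled true
"""

print(twolevel_enabled(SAMPLE_CONF))  # True
print(twolevel_enabled(""))           # True: parameter absent, default applies
print(twolevel_enabled(KEY + " false"))  # False
```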
