Optimizing the Aggregate Algorithms
Scenario
Spark SQL supports a hash-based aggregate algorithm that uses a fast aggregate hash map as a cache to improve aggregation performance. This hash map replaces the previous ColumnarBatch-based implementation, avoiding the performance problems caused by wide schemas (multiple key or value fields) in an aggregate table.
Procedure
To enable the aggregate algorithm optimization, configure the following parameter in the spark-defaults.conf file on the Spark client.
| Parameter | Description | Default Value |
|---|---|---|
| spark.sql.codegen.aggregate.map.twolevel.enabled | Specifies whether to enable aggregation algorithm optimization. | true |
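As a minimal sketch, the parameter can be set in spark-defaults.conf like this (the value shown is the default, so an explicit entry is only needed if it was previously disabled):

```properties
# spark-defaults.conf
# Enable the fast aggregate hash map used as a cache for hash aggregation
spark.sql.codegen.aggregate.map.twolevel.enabled  true
```

The same setting can also be passed per job, e.g. `spark-submit --conf spark.sql.codegen.aggregate.map.twolevel.enabled=true ...`, which overrides the value in spark-defaults.conf for that application.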