Help Center/ MapReduce Service/ Component Operation Guide (LTS)/ Using Spark/Spark2x/ Spark SQL Performance Tuning/ Optimizing the Aggregate Algorithms

Updated on 2025-08-22 GMT+08:00

View PDF

Optimizing the Aggregate Algorithms

Scenarios

Spark SQL supports hash aggregate algorithm. Namely, use fast aggregate hashmap as cache to improve aggregate performance. The hashmap replaces the previous ColumnarBatch to avoid performance problems caused by the wide mode (multiple key or value fields) of an aggregate table.

Procedure

Install the Spark client.

For details, see Installing a Client.

Log in to the Spark client node as the client installation user.

Modify the following parameters in the {Client installation directory}/Spark/spark/conf/spark-defaults.conf file on the Spark client.

**Table 1** Parameter description
Parameter	Description	Example Value
spark.sql.codegen.aggregate.map.twolevel.enabled	Specifies whether to enable aggregation algorithm optimization. true: Enable false: Disable	true

Parent topic: Spark SQL Performance Tuning

Previous topic: Optimizing Small Files

Next topic: Optimizing Datasource Tables

Feedback

Was this page helpful?

Helpful Not helpful

Provide feedback

Thank you very much for your feedback. We will continue working to improve the documentation.See the reply and handling status in My Cloud VOC.

The system is busy. Please try again later.

Which of the following issues have you encountered?

Content is inconsistent with the product UI

Unclear descriptions

Lack of examples or code

Incorrect steps

Can't find what I need

Lack of best practices

Feedback (optional)

0/500

Select at least one type of issue, and enter your comments or suggestions.

Enter a maximum of 500 characters.

Submit Cancel

For any further questions, feel free to contact us through the chatbot.