
How Do I Use JDBC to Set the spark.sql.shuffle.partitions Parameter to Improve the Task Concurrency?

Scenario

When Spark jobs execute shuffle operations, such as GROUP BY and JOIN, data skew can occur and slow down job execution.

To solve this problem, you can increase the value of spark.sql.shuffle.partitions to improve the concurrency of shuffle read tasks.

Configuring spark.sql.shuffle.partitions

In JDBC, you can use a SET statement to configure the spark.sql.shuffle.partitions parameter. The statements are as follows:

Statement st = conn.createStatement();
st.execute("set spark.sql.shuffle.partitions=20");