How Do I Use JDBC to Set the spark.sql.shuffle.partitions Parameter to Improve the Task Concurrency?
Scenario
When statements that trigger a shuffle, such as GROUP BY and JOIN, are executed in a Spark job, data skew can occur and slow down job execution.
To solve this problem, you can configure spark.sql.shuffle.partitions to increase the concurrency of shuffle read tasks.
Configuring spark.sql.shuffle.partitions
You can use a set statement in JDBC to configure the spark.sql.shuffle.partitions parameter. The statement is as follows:
Statement st = conn.createStatement();
st.execute("set spark.sql.shuffle.partitions=20");
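For context, the following is a minimal sketch of the same call inside a complete JDBC session. The connection URL format, the AK/SK placeholders, and the employees table are illustrative assumptions, not values confirmed by this page; replace them with your actual DLI endpoint, project ID, queue name, credentials, and table.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class ShufflePartitionsExample {
    public static void main(String[] args) throws Exception {
        // Hypothetical DLI JDBC URL; substitute your real endpoint,
        // project ID, and queue name.
        String url = "jdbc:dli://<endpoint>/<projectId>?queuename=<queueName>";

        try (Connection conn = DriverManager.getConnection(url, "<ak>", "<sk>");
             Statement st = conn.createStatement()) {
            // Raise the number of shuffle partitions before running the query
            // so that shuffle read tasks run with higher concurrency.
            st.execute("set spark.sql.shuffle.partitions=20");

            // The setting applies to subsequent statements on this connection,
            // for example a GROUP BY query that triggers a shuffle.
            // The employees table here is a placeholder.
            try (ResultSet rs = st.executeQuery(
                    "SELECT dept, COUNT(*) FROM employees GROUP BY dept")) {
                while (rs.next()) {
                    System.out.println(rs.getString(1) + ": " + rs.getLong(2));
                }
            }
        }
    }
}

Because the parameter is set at the session level, it affects all shuffle-producing statements executed afterward on the same connection.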