How Do I Eliminate Data Skew by Configuring AE Parameters?
Scenario
If an SQL statement takes a long time to execute, access the Spark UI to check its execution status.
If data skew has occurred, you will typically see a stage that has been running for more than 20 minutes with only one task still active.
![Spark UI showing a stage with a single long-running task, indicating data skew](https://support.huaweicloud.com/eu/dli_faq/en-us_image_0000001200929158.png)
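Before tuning, it can help to confirm which join key is actually skewed. The query below is a minimal sketch with hypothetical table and column names (sales_detail, user_id); if one key accounts for a large share of the rows, the join on that key is the likely cause of the long-running task.

```sql
-- Hypothetical table and join key; replace with your own names.
-- A single key with a count far above the rest indicates data skew.
SELECT user_id, COUNT(*) AS row_cnt
FROM sales_detail
GROUP BY user_id
ORDER BY row_cnt DESC
LIMIT 10;
```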
Procedure
- Log in to the DLI management console. Choose Job Management > SQL Jobs in the navigation pane. On the displayed page, locate the job you want to modify and click Edit in the Operation column to switch to the SQL Editor page.
- On the SQL Editor page, click Set Property and add the following Spark parameters through the Settings pane:
The strings before the colons (:) are the configuration parameters, and the strings after the colons are their values.
spark.sql.enableToString:false
spark.sql.adaptive.join.enabled:true
spark.sql.adaptive.enabled:true
spark.sql.adaptive.skewedJoin.enabled:true
spark.sql.adaptive.enableToString:false
spark.sql.adaptive.skewedPartitionMaxSplits:10
spark.sql.adaptive.skewedPartitionMaxSplits sets the maximum number of tasks used to process a skewed partition. The default value is 5 and the maximum value is 10. This parameter is optional. For a sketch of the kind of join these settings target, see the example after this procedure.
- Click Execute to run the job again.
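No change to the SQL itself is required; the adaptive execution (AE) settings above take effect when the job is rerun. For illustration only, a join of the following shape, where one join-key value dominates the larger table, is what the skewed-join handling targets. Table and column names here are hypothetical.

```sql
-- Hypothetical tables: a large fact table (sales_detail) joined to a
-- dimension table (user_info) on a skewed key (user_id).
-- With the AE parameters above, a skewed partition of user_id is
-- processed by up to spark.sql.adaptive.skewedPartitionMaxSplits tasks
-- instead of a single task.
SELECT s.user_id, u.city, SUM(s.amount) AS total_amount
FROM sales_detail s
JOIN user_info u
  ON s.user_id = u.user_id
GROUP BY s.user_id, u.city;
```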