Configuring the Default Number of Data Blocks Divided by SparkSQL
Scenarios
By default, SparkSQL divides data into 200 blocks during shuffle. In data-intensive scenarios, each block can become excessively large. If a single data block needed by a task exceeds 2 GB, an error similar to the following is reported when Spark attempts to fetch the block:
Adjusted frame length exceeds 2147483647: 2717729270 - discarded
For example, with the default of 200 blocks, SparkSQL fails a 500-GB TPC-DS test: shuffling hundreds of gigabytes across only 200 blocks can easily push individual blocks past the 2 GB limit. To avoid this error, increase the default number of blocks in data-intensive scenarios.
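If the error occurs, the block count can also be raised for the current session before rerunning the query. The following is a minimal sketch, assuming an existing SparkSession named `spark`; the value 1000 is illustrative, not a tuned recommendation:

```scala
// Session-level override, assuming an existing SparkSession `spark`.
// 1000 is an illustrative value; choose it so that each shuffle block
// stays well under the 2 GB limit for your data volume.
spark.sql("SET spark.sql.shuffle.partitions=1000")
```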
Configuration Parameters
Navigation path for setting parameters:
On Manager, choose All Configurations, and enter the parameter name in the search box.
| Parameter | Description | Default Value |
| --- | --- | --- |
| spark.sql.shuffle.partitions | Default number of blocks into which data is divided during shuffle. | 200 |
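Besides setting the parameter on Manager, it can be supplied when an application builds its SparkSession. A minimal sketch, assuming Spark 2.x or later; the application name and the value 1000 are illustrative:

```scala
import org.apache.spark.sql.SparkSession

// Application-level setting: raise the default shuffle block (partition)
// count at startup. 1000 is an illustrative value, not a recommendation.
val spark = SparkSession.builder()
  .appName("ShufflePartitionsExample")
  .config("spark.sql.shuffle.partitions", "1000")
  .getOrCreate()
```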