Configuring the Default Number of Data Blocks Divided by SparkSQL
Scenario
By default, SparkSQL divides shuffled data into 200 data blocks (partitions). In data-intensive scenarios, each block can become excessively large. If a single data block assigned to a task exceeds 2 GB, Spark reports an error similar to the following when it attempts to fetch the block:
Adjusted frame length exceeds 2147483647: 2717729270 - discarded
For example, with the default of 200 data blocks, SparkSQL encounters this error when running a 500 GB TPC-DS test. To avoid the error, increase the default number of data blocks in data-intensive scenarios.
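As a rough check (assuming the shuffled data volume is close to the 500 GB input), 500 GB spread across 200 blocks works out to roughly 2.5 GB per block, which exceeds the 2 GB limit and is consistent with the frame length of about 2.7 GB reported in the error above.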
Configuration Parameters
Navigation path for setting parameters:
On Manager, choose Cluster > Services > Spark, click Configurations and then All Configurations, and enter the parameter name in the search box.
| Parameter | Description | Default Value |
| --- | --- | --- |
| spark.sql.shuffle.partitions | Indicates the default number of blocks divided during shuffle. | 200 |
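Besides setting the parameter cluster-wide on Manager, it can also be configured per application. The following is a minimal sketch in Scala; the value 1000 is only illustrative and should be chosen so that each shuffle block stays well below the 2 GB limit for your data volume.

```scala
import org.apache.spark.sql.SparkSession

// Build (or reuse) a SparkSession with a larger shuffle partition count.
val spark = SparkSession.builder()
  .appName("ShufflePartitionsExample")
  .config("spark.sql.shuffle.partitions", "1000")
  .getOrCreate()

// The setting can also be changed at runtime for subsequent queries:
spark.conf.set("spark.sql.shuffle.partitions", "1000")

// Or directly in SQL:
spark.sql("SET spark.sql.shuffle.partitions=1000")
```

Per-application settings such as these take effect only for that Spark session, whereas the Manager configuration changes the default for the whole service.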