Configuring the Default Number of Data Blocks Divided by SparkSQL
Scenarios
By default, SparkSQL divides data into 200 data blocks during shuffle operations. In data-intensive scenarios, each data block may become excessively large. If a single data block of a task exceeds 2 GB, an error similar to the following is reported when Spark attempts to fetch the block:
Adjusted frame length exceeds 2147483647: 2717729270 - discarded
For example, with the default of 200 data blocks, SparkSQL fails when running a 500 GB TPC-DS test. To avoid this error, increase the default number of data blocks in data-intensive scenarios.
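The 2 GB limit corresponds to the maximum frame length shown in the error message (2147483647 bytes), so the goal is to keep every shuffle block below that boundary. The following sizing sketch is only an illustration; the shuffle volume and safety factor are assumptions, not values from this documentation:

```scala
// Hedged sizing sketch: estimates how many shuffle partitions keep each data
// block under the 2 GB frame limit. All figures are illustrative assumptions.
object ShufflePartitionSizing {
  val maxBlockBytes: Long = 2L * 1024 * 1024 * 1024 // 2 GB frame limit

  /** Smallest partition count that keeps the average block below the limit,
    * with a safety factor to absorb skewed (larger-than-average) partitions. */
  def minPartitions(totalShuffleBytes: Long, safetyFactor: Double = 2.0): Long =
    math.ceil(totalShuffleBytes * safetyFactor / maxBlockBytes.toDouble).toLong

  def main(args: Array[String]): Unit = {
    // Assumed 500 GB of shuffle data, as in the TPC-DS example above.
    val totalShuffleBytes = 500L * 1024 * 1024 * 1024
    println(s"Suggested spark.sql.shuffle.partitions >= ${minPartitions(totalShuffleBytes)}")
    // With 200 partitions, 500 GB / 200 = 2.5 GB per block on average, which
    // already exceeds the 2 GB limit and triggers the error shown above.
  }
}
```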
Configuration parameters
- Log in to FusionInsight Manager.
For details, see Accessing FusionInsight Manager.
- Choose Cluster > Services > Spark2x, and click Configurations and then All Configurations. Enter a parameter name in the search box.
Table 1 Parameter description

| Parameter | Description | Example Value |
|---|---|---|
| spark.sql.shuffle.partitions | Default number of data blocks into which data is divided during shuffle. | 200 |
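Changing the value in FusionInsight Manager updates the service-level default. If a larger value is needed only for a specific job, the same parameter can also be overridden per Spark session. The sketch below assumes a value of 400, which is illustrative rather than a recommendation:

```scala
import org.apache.spark.sql.SparkSession

// Minimal sketch of a per-session override of spark.sql.shuffle.partitions.
// The value 400 is an assumption; size it for your own shuffle volume.
object ShufflePartitionsOverride {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("shuffle-partitions-example")
      .config("spark.sql.shuffle.partitions", "400") // set when the session is built...
      .getOrCreate()

    // ...or adjust it at runtime before the shuffle-heavy query runs.
    spark.conf.set("spark.sql.shuffle.partitions", "400")
    println(spark.conf.get("spark.sql.shuffle.partitions"))

    spark.stop()
  }
}
```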