Using the external shuffle service to improve performance
Scenario
When a Spark application includes a shuffle process, each executor process writes shuffle data and serves it to other executors in addition to running tasks. If an executor is heavily loaded and garbage collection (GC) is triggered, it cannot serve shuffle data to other executors, which affects the tasks running on them.
The external shuffle service runs as an auxiliary service in NodeManager. It takes over serving shuffle data, which reduces the load on executors. If GC occurs on one executor, tasks on other executors are not affected.
Procedure
- Log in to FusionInsight Manager.
- Choose Cluster > Services > Spark, click Configurations, and then click All Configurations.
- Click SparkResource, select Default, and modify the following parameter:
Table 1 Parameter

| Parameter | Default Value | Changed To |
| --- | --- | --- |
| spark.shuffle.service.enabled | false | true |
- Restart the Spark service for the configuration to take effect.
To use the external shuffle service from a Spark client, download and install the Spark client again so that the client picks up the updated configuration.
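After reinstalling the client, it can be useful to confirm that the client configuration actually carries the new value. The following is a minimal sketch, not part of the product tooling: it assumes the client stores its settings in a spark-defaults.conf-style file (one `key value` or `key=value` pair per line, `#` for comments), and the function names `parse_spark_defaults` and `external_shuffle_enabled` are hypothetical helpers introduced only for illustration.

```python
import re


def parse_spark_defaults(text):
    """Parse spark-defaults.conf-style text into a dict of key -> value."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blank lines and comment lines.
        if not line or line.startswith("#"):
            continue
        # Split once on the first "=" or whitespace run, so values that
        # themselves contain "=" (for example JVM options) stay intact.
        parts = re.split(r"\s*=\s*|\s+", line, maxsplit=1)
        key = parts[0]
        value = parts[1].strip() if len(parts) > 1 else ""
        settings[key] = value
    return settings


def external_shuffle_enabled(text):
    """Return True if spark.shuffle.service.enabled is set to true."""
    # The parameter defaults to false when it is not set explicitly.
    value = parse_spark_defaults(text).get("spark.shuffle.service.enabled", "false")
    return value.lower() == "true"


# Hypothetical sample content; in practice, read the text from the
# configuration file of the installed client.
sample = """
# Client configuration (sample)
spark.shuffle.service.enabled   true
spark.executor.memory           4g
"""
print(external_shuffle_enabled(sample))
```

To check a real client, read the contents of its configuration file and pass the text to `external_shuffle_enabled`. The file location depends on where the client is installed, so it is deliberately not hard-coded here.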