Updated on 2025-08-22 GMT+08:00

Using the External Shuffle Service to Improve Spark Core Performance

Scenarios

When Spark runs an application that includes a shuffle process, each executor not only runs tasks but also writes shuffle data and serves that data to other executors. If an executor is heavily loaded and garbage collection (GC) is triggered, it cannot serve shuffle data to other executors during the pause, which affects the tasks that depend on that data.

The external shuffle service is an auxiliary service that runs in NodeManager. It serves shuffle data on behalf of executors, which reduces the load on them. If GC occurs on one executor, tasks on other executors can still fetch its shuffle data and are not affected.
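The difference between the two modes can be illustrated with a toy model. This is only a sketch under simplifying assumptions, not Spark's implementation: the `Node`, `Executor`, `ExternalShuffleService`, and `fetch` names are hypothetical, the node's local disk is modeled as a dictionary, and a GC pause is modeled as a flag that makes the executor unable to answer requests.

```python
class Node:
    """Toy host: shuffle output written on this node lives on its local disk."""

    def __init__(self):
        self.local_disk = {}  # block id -> bytes


class Executor:
    """Toy executor that writes shuffle blocks and, by default, serves them itself."""

    def __init__(self, name, node):
        self.name = name
        self.node = node
        self.in_gc_pause = False  # True while the executor cannot answer requests

    def write_block(self, block_id, data):
        self.node.local_disk[block_id] = data

    def serve_block(self, block_id):
        if self.in_gc_pause:
            raise TimeoutError(f"{self.name} is paused and cannot serve {block_id}")
        return self.node.local_disk[block_id]


class ExternalShuffleService:
    """Toy node-level service that reads blocks from the same local disk."""

    def __init__(self, node):
        self.node = node

    def serve_block(self, block_id):
        # Independent of any executor's state, so a GC pause does not block it.
        return self.node.local_disk[block_id]


def fetch(block_id, executor, external_service=None):
    """Fetch a block from the service if one is given, else from the executor."""
    if external_service is not None:
        return external_service.serve_block(block_id)
    return executor.serve_block(block_id)
```

In this model, a fetch that goes through the service succeeds even while the writing executor has `in_gc_pause` set, whereas a direct fetch from that executor raises `TimeoutError`.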

Procedure

  1. Log in to FusionInsight Manager.

    For details, see Accessing FusionInsight Manager.

  2. Choose Cluster > Services > Spark2x and click Configurations. Select All Configurations.
  3. Choose SparkResource2x > Default and modify the following parameters.

    Table 1 Parameter list

    | Parameter | Description | Example Value |
    | --- | --- | --- |
    | spark.shuffle.service.enabled | Whether the external shuffle service is enabled. When enabled, Spark uses the external shuffle service, a long-running process that runs inside the NodeManager process, to handle shuffle data. This setting helps improve shuffle performance. Values: true (the external shuffle service is enabled) or false (the external shuffle service is disabled). | true |

  4. After the parameter settings are modified, click Save, perform operations as prompted, and wait until the settings are saved successfully.
  5. After the Spark server configurations are updated, if Configure Status is Expired, restart the component for the configurations to take effect.

    Figure 1 Modifying Spark configurations

    On the Spark dashboard page, choose More > Restart Service or Service Rolling Restart, enter the administrator password, and wait until the service restarts.

    To use the external shuffle service on the Spark client, you need to download and install the Spark client again. For details, see Using an MRS Client.

    Components are unavailable during the restart, which affects upper-layer services in the cluster. To minimize the impact, perform this operation during off-peak hours or after confirming that it has no adverse impact.
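After the client is reinstalled, one rough way to confirm that the setting reached the client is to inspect the client's Spark properties file. The sketch below assumes a `spark-defaults.conf`-style text in which each line holds a key and a value separated by whitespace or `=`. The function names and the sample content are hypothetical, and the location of the file in your client installation is not covered here.

```python
import re


def parse_spark_defaults(text):
    """Parse spark-defaults.conf-style text into a dict of property -> value."""
    props = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        # Split once on the first '=' or run of whitespace after the key.
        parts = re.split(r"\s*=\s*|\s+", line, maxsplit=1)
        if len(parts) == 2:
            props[parts[0]] = parts[1].strip()
    return props


def is_external_shuffle_enabled(props):
    """Return True if spark.shuffle.service.enabled is set to true.

    A missing key is treated as false, which matches Spark's default.
    """
    value = props.get("spark.shuffle.service.enabled", "false")
    return value.strip().lower() == "true"


# Made-up sample content, for illustration only.
SAMPLE_CONF = """
# client-side Spark settings
spark.shuffle.service.enabled true
spark.executor.extraJavaOptions -Dexample.flag=on
"""

if __name__ == "__main__":
    print(is_external_shuffle_enabled(parse_spark_defaults(SAMPLE_CONF)))
```

To check a real file, read its contents into a string and pass that string to `parse_spark_defaults` in place of `SAMPLE_CONF`.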