
Configuring Spark Dynamic Resource Scheduling in YARN Mode

Scenarios

Resources are a key factor affecting Spark execution efficiency. If a long-running service, such as JDBCServer, is allocated multiple executors but has no tasks to run, those executors sit idle and waste resources, and other applications may fail to be scheduled properly when cluster resources are insufficient.

Dynamic resource scheduling adds or removes executors for an application in real time based on its task load, so that cluster resources are allocated to applications on demand.

Procedure

  1. Configure the external shuffle service.

    The external shuffle service must be configured before you can use the dynamic resource scheduling function; the sketch below shows the application-side properties involved.
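
    As a reference, the application-side settings for the external shuffle service can be expressed as follows. This is a minimal Scala sketch using standard Spark property names; the application name is a placeholder, and the spark_shuffle auxiliary service itself must already be running on the YARN NodeManagers (handled through FusionInsight Manager in this product).

    import org.apache.spark.sql.SparkSession

    // Minimal sketch: the application only needs to point at the external
    // shuffle service; the spark_shuffle aux-service must already be
    // registered with the YARN NodeManagers.
    val spark = SparkSession.builder()
      .appName("ExternalShuffleServiceExample")        // hypothetical name
      .config("spark.shuffle.service.enabled", "true") // use the external shuffle service
      .getOrCreate()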

  2. Log in to FusionInsight Manager, choose Cluster > Services > Spark2x, click Configurations, and then click All Configurations. Enter spark.dynamicAllocation.enabled in the search box and set it to true to enable dynamic resource scheduling. A consolidated configuration sketch follows Table 1.
Table 1 lists some optional configuration items.
Table 1 Parameters for dynamic resource scheduling

  Configuration Item: spark.dynamicAllocation.minExecutors
  Description: Minimum number of executors to run when dynamic allocation is enabled. Spark adjusts the number of executors automatically based on the workload, but never scales below this value, so a baseline level of compute resources is always available to the application.
  Example Value: 0

  Configuration Item: spark.dynamicAllocation.initialExecutors
  Description: Initial number of executors to run when dynamic allocation is enabled.
  Example Value: 0

  Configuration Item: spark.dynamicAllocation.maxExecutors
  Description: Maximum number of executors that dynamic allocation can allocate to an application. Spark scales the number of executors up with the workload, but never beyond this upper limit.
  Example Value: 2048

  Configuration Item: spark.dynamicAllocation.schedulerBacklogTimeout
  Description: How long pending tasks must remain backlogged before the scheduler requests new executors. If tasks have been pending for longer than this duration, Spark requests additional executors to accelerate processing.
  Example Value: 1s

  Configuration Item: spark.dynamicAllocation.sustainedSchedulerBacklogTimeout
  Description: Interval at which Spark continues to request additional executors while a backlog of pending tasks persists. After the initial request triggered by spark.dynamicAllocation.schedulerBacklogTimeout, Spark keeps requesting executors at this interval until the backlog clears or the configured maximum number of executors is reached.
  Example Value: 1s

  Configuration Item: spark.dynamicAllocation.executorIdleTimeout
  Description: Maximum amount of time an executor may remain idle before it is removed. Releasing idle executors returns resources to the cluster and avoids waste.
  Example Value: 60s

  Configuration Item: spark.dynamicAllocation.cachedExecutorIdleTimeout
  Description: Maximum amount of time an executor that holds cached data blocks may remain idle before it is removed. Unlike ordinary executors, executors with cached blocks are kept longer to preserve the cached data and avoid the cost of recomputation.
  Example Value:
  • JDBCServer: 2147483647s
  • IndexServer: 2147483647s
  • SparkResource: 120
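
For reference, the sketch below shows how these parameters could be set programmatically, using the example values from Table 1. This is illustrative only: in this product the values are normally configured cluster-wide through FusionInsight Manager, and dynamic allocation properties must be in place before the SparkContext starts, which is why they are passed to the session builder here. The application name is a placeholder.

import org.apache.spark.sql.SparkSession

// Sketch: enable dynamic allocation and apply the example values from Table 1.
val spark = SparkSession.builder()
  .appName("DynamicAllocationExample")                // hypothetical name
  .config("spark.shuffle.service.enabled", "true")    // prerequisite from step 1
  .config("spark.dynamicAllocation.enabled", "true")  // switch from step 2
  .config("spark.dynamicAllocation.minExecutors", "0")
  .config("spark.dynamicAllocation.initialExecutors", "0")
  .config("spark.dynamicAllocation.maxExecutors", "2048")
  .config("spark.dynamicAllocation.schedulerBacklogTimeout", "1s")
  .config("spark.dynamicAllocation.sustainedSchedulerBacklogTimeout", "1s")
  .config("spark.dynamicAllocation.executorIdleTimeout", "60s")
  .config("spark.dynamicAllocation.cachedExecutorIdleTimeout", "120s") // SparkResource example value
  .getOrCreate()

Note that the JDBCServer and IndexServer example value of 2147483647s is the maximum 32-bit integer expressed in seconds, which effectively prevents executors holding cached data from ever being reclaimed for those long-running services.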