
Configuring Dynamic Resource Scheduling in Yarn Mode

Scenario

Resources are a key factor that affects Spark execution efficiency. If a long-running service (for example, JDBCServer) holds multiple executors while it has no tasks to run, and other applications are short of resources, those resources are wasted and improperly scheduled.

Dynamic resource scheduling adds or removes executors for an application in real time based on its task load, so that resources are dynamically allocated among applications.

Procedure

  1. Configure the external shuffle service first. For details, see Using the External Shuffle Service to Improve Performance.
  2. In the spark-defaults.conf file, add the spark.dynamicAllocation.enabled configuration item and set it to true to enable dynamic resource scheduling. This function is disabled by default. An example configuration is provided after the notes below.
  3. Table 1 describes the optional configuration items.
    Table 1 Parameters for dynamic resource scheduling

    Configuration Item | Description | Default Value
    spark.dynamicAllocation.minExecutors | Minimum number of executors | 0
    spark.dynamicAllocation.initialExecutors | Initial number of executors | spark.dynamicAllocation.minExecutors
    spark.dynamicAllocation.maxExecutors | Maximum number of executors | Integer.MAX_VALUE
    spark.dynamicAllocation.schedulerBacklogTimeout | Timeout before the first request for new executors when tasks are backlogged | 1s
    spark.dynamicAllocation.sustainedSchedulerBacklogTimeout | Timeout for subsequent executor requests while tasks remain backlogged | spark.dynamicAllocation.schedulerBacklogTimeout
    spark.dynamicAllocation.executorIdleTimeout | Idle timeout for ordinary executors | 60s
    spark.dynamicAllocation.cachedExecutorIdleTimeout | Idle timeout for executors holding cached blocks | Integer.MAX_VALUE

    • The external shuffle service must be configured before dynamic resource scheduling is enabled. If it is not, shuffle files are lost when an executor is killed.
    • If the number of executors is specified by spark.executor.instances or --num-executors, dynamic resource allocation does not take effect even if it is enabled.
    • After dynamic resource scheduling is enabled, a task may be assigned to an executor that is about to be removed, causing the task to fail. If the same task fails four times (configurable through the spark.task.maxFailures parameter), the job fails. In practice, it is unlikely that a task is repeatedly assigned to executors that are about to be removed, and the probability of job failure can be further reduced by increasing the value of spark.task.maxFailures.
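
The following spark-defaults.conf snippet is a minimal sketch of the settings described above. The property names come from step 1, step 2, and Table 1; the executor counts and the cached-executor timeout shown are illustrative assumptions, not recommended values, and should be tuned for your workload.

    # Prerequisite (step 1): the external shuffle service must be enabled,
    # otherwise shuffle files are lost when an executor is removed.
    spark.shuffle.service.enabled                              true

    # Step 2: enable dynamic resource scheduling (disabled by default).
    spark.dynamicAllocation.enabled                            true

    # Optional tuning parameters from Table 1 (values below are examples only).
    spark.dynamicAllocation.minExecutors                       0
    spark.dynamicAllocation.initialExecutors                   2
    spark.dynamicAllocation.maxExecutors                       50
    spark.dynamicAllocation.schedulerBacklogTimeout            1s
    spark.dynamicAllocation.sustainedSchedulerBacklogTimeout   1s
    spark.dynamicAllocation.executorIdleTimeout                60s
    spark.dynamicAllocation.cachedExecutorIdleTimeout          120s

When submitting the application, do not set spark.executor.instances or pass --num-executors; otherwise, as noted above, dynamic resource allocation does not take effect.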