Updated on 2022-09-14 GMT+08:00

Configuring Dynamic Resource Scheduling in Yarn Mode

Scenario

Resources are a key factor that affects Spark execution efficiency. If a long-running service (for example, JDBCServer) holds multiple executors while it has no tasks to run, and other applications are short of resources, those resources are wasted and improperly scheduled.

Dynamic resource scheduling adds or removes an application's executors in real time based on its task load. In this way, resources are dynamically allocated to applications on demand.

Procedure

  1. Configure the external shuffle service first. For details, see Using the External Shuffle Service to Improve Performance.
  2. In the spark-defaults.conf file, add the spark.dynamicAllocation.enabled configuration item and set it to true to enable dynamic resource scheduling. This function is disabled by default. A minimal example configuration is provided after the notes below.
  3. Table 1 lists the optional configuration items.
    Table 1 Parameters for dynamic resource scheduling

    Configuration Item                                        | Description                                                                   | Default Value
    spark.dynamicAllocation.minExecutors                      | Minimum number of executors                                                   | 0
    spark.dynamicAllocation.initialExecutors                  | Initial number of executors                                                   | spark.dynamicAllocation.minExecutors
    spark.dynamicAllocation.maxExecutors                      | Maximum number of executors                                                   | Integer.MAX_VALUE
    spark.dynamicAllocation.schedulerBacklogTimeout           | Timeout before the first request for new executors when tasks are backlogged | 1s
    spark.dynamicAllocation.sustainedSchedulerBacklogTimeout  | Timeout for each subsequent executor request while tasks remain backlogged   | spark.dynamicAllocation.schedulerBacklogTimeout
    spark.dynamicAllocation.executorIdleTimeout               | Idle timeout after which an ordinary executor is removed                     | 60s
    spark.dynamicAllocation.cachedExecutorIdleTimeout         | Idle timeout after which an executor with cached blocks is removed           | Integer.MAX_VALUE

    • The external shuffle service must be configured before dynamic resource scheduling is enabled. Otherwise, shuffle files are lost when an executor is removed.
    • If the number of executors is explicitly specified by spark.executor.instances or --num-executors, dynamic resource allocation does not take effect even if it is enabled.
    • After dynamic resource scheduling is enabled, a task may be assigned to an executor that is about to be removed, causing the task to fail. If the same task fails four times (configurable with the spark.task.maxFailures parameter), the job fails. In practice, it is unlikely that a task is repeatedly assigned to executors being removed, and the probability of job failure can be further reduced by increasing the value of spark.task.maxFailures.
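
    The following is a minimal sketch of the corresponding spark-defaults.conf entries. The executor limits and idle timeout shown here are illustrative values, not recommendations; tune them to the cluster's capacity and workload.

        # Enable the external shuffle service (prerequisite for dynamic allocation on YARN)
        spark.shuffle.service.enabled                  true
        # Enable dynamic resource scheduling (disabled by default)
        spark.dynamicAllocation.enabled                true
        # Illustrative executor limits
        spark.dynamicAllocation.minExecutors           0
        spark.dynamicAllocation.initialExecutors       2
        spark.dynamicAllocation.maxExecutors           20
        # Remove an ordinary executor after 60 seconds of inactivity
        spark.dynamicAllocation.executorIdleTimeout    60s

    Do not set spark.executor.instances (or --num-executors) together with these options; as noted above, a fixed executor count disables dynamic allocation.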