Updated on 2024-10-09 GMT+08:00

Configuring Spark Dynamic Resource Scheduling in YARN Mode

Scenario

Resources are a key factor in Spark execution efficiency. If a long-running service such as JDBCServer holds multiple executors while it has no tasks to run, those executors sit idle even when other applications are short of resources. The result is wasted resources and inefficient scheduling.

Dynamic resource scheduling adds executors to an application or removes them in real time based on the task load, so that resources go to the applications that currently need them.
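
To make the idea of sizing executors to the task load concrete, the following is a minimal sketch of a simplified model, not Spark's actual implementation. The function name `target_executors` and the `tasks_per_executor` argument are hypothetical; the default bounds are the Table 1 defaults for spark.dynamicAllocation.minExecutors and spark.dynamicAllocation.maxExecutors.

```python
import math


def target_executors(pending_tasks: int, running_tasks: int,
                     tasks_per_executor: int,
                     min_executors: int = 0,
                     max_executors: int = 2048) -> int:
    """Estimate how many executors the current task load justifies.

    Simplified model: enough executors to run every pending and running
    task at once, clamped to the configured minimum and maximum.
    """
    if tasks_per_executor <= 0:
        raise ValueError("tasks_per_executor must be positive")
    needed = math.ceil((pending_tasks + running_tasks) / tasks_per_executor)
    return max(min_executors, min(needed, max_executors))


# An idle service with no tasks falls back to the minimum (0 by default),
# while a burst of tasks raises the target up to the maximum.
print(target_executors(0, 0, 4))        # idle
print(target_executors(10, 0, 4))       # small burst
print(target_executors(100000, 0, 4))   # capped by max_executors
```

Under this model, an idle JDBCServer with the default minimum of 0 would be entitled to no executors, which is the resource saving the scenario describes.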

Procedure

  1. Configure the external shuffle service.
  2. Log in to FusionInsight Manager, choose Cluster > Services > Spark2x, click Configurations, and then click All Configurations. Enter spark.dynamicAllocation.enabled in the search box and set the parameter to true to enable dynamic resource scheduling.
Table 1 lists some optional configuration items.
Table 1 Parameters for dynamic resource scheduling

| Configuration Item | Description | Default Value |
| --- | --- | --- |
| spark.dynamicAllocation.minExecutors | Minimum number of executors. | 0 |
| spark.dynamicAllocation.initialExecutors | Initial number of executors. | 0 |
| spark.dynamicAllocation.maxExecutors | Maximum number of executors. | 2048 |
| spark.dynamicAllocation.schedulerBacklogTimeout | Timeout for the first scheduling request. If tasks have been backlogged for longer than this period, new executors are requested. | 1s |
| spark.dynamicAllocation.sustainedSchedulerBacklogTimeout | Timeout interval for the second and subsequent scheduling requests while the backlog persists. | 1s |
| spark.dynamicAllocation.executorIdleTimeout | Idle timeout for ordinary executors. An executor that has been idle for longer than this period is removed. | 60s |
| spark.dynamicAllocation.cachedExecutorIdleTimeout | Idle timeout for executors that hold cached blocks. | JDBCServer2x: 2147483647s; IndexServer2x: 2147483647s; SparkResource2x: 120 |
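
The interplay of the two backlog timeouts in Table 1 can be sketched as follows. This is an illustrative simulation, not Spark code. It assumes the task backlog persists for the whole ramp-up and that each request round asks for twice as many additional executors as the previous one (1, 2, 4, 8, ...), as the open-source Spark documentation describes. The function name `ramp_up_schedule` and the `needed` argument are hypothetical.

```python
def ramp_up_schedule(needed: int,
                     backlog_timeout: float = 1.0,
                     sustained_timeout: float = 1.0,
                     initial: int = 0,
                     max_executors: int = 2048) -> list:
    """Return (seconds since the backlog began, total executors requested) pairs.

    The first request fires after backlog_timeout; later requests fire every
    sustained_timeout. The number added doubles each round and the total
    never exceeds min(needed, max_executors).
    """
    cap = min(needed, max_executors)
    total = initial
    to_add = 1
    now = backlog_timeout
    schedule = []
    while total < cap:
        total = min(total + to_add, cap)
        schedule.append((now, total))
        to_add *= 2
        now += sustained_timeout
    return schedule


# With the default 1s timeouts, a load that needs 10 executors is covered
# in four request rounds.
for seconds, executors in ramp_up_schedule(10):
    print(f"t={seconds:.0f}s total requested={executors}")
```

Raising spark.dynamicAllocation.schedulerBacklogTimeout delays the first request, and raising spark.dynamicAllocation.sustainedSchedulerBacklogTimeout spaces out the later ones, so both trade responsiveness against the risk of requesting executors for a short-lived burst.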

The external shuffle service must be configured before using the dynamic resource scheduling function.
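
As an alternative illustration for a single application, the same settings can be expressed as spark-submit `--conf` options. The sketch below only builds the argument list and does not launch anything. It assumes the cluster permits per-application overrides and that the external shuffle service is switched on with the open-source Spark property spark.shuffle.service.enabled. The helper names `dynamic_allocation_conf` and `to_submit_args` are hypothetical, and the ordering check on the executor counts is a sanity check added by this sketch.

```python
def dynamic_allocation_conf(min_executors: int = 0,
                            initial_executors: int = 0,
                            max_executors: int = 2048,
                            executor_idle_timeout: str = "60s") -> dict:
    """Build a Spark property map that enables dynamic resource scheduling."""
    # Sanity check (assumption of this sketch): min <= initial <= max.
    if not 0 <= min_executors <= initial_executors <= max_executors:
        raise ValueError("expected 0 <= minExecutors <= initialExecutors <= maxExecutors")
    return {
        # The external shuffle service is a prerequisite for dynamic allocation.
        "spark.shuffle.service.enabled": "true",
        "spark.dynamicAllocation.enabled": "true",
        "spark.dynamicAllocation.minExecutors": str(min_executors),
        "spark.dynamicAllocation.initialExecutors": str(initial_executors),
        "spark.dynamicAllocation.maxExecutors": str(max_executors),
        "spark.dynamicAllocation.executorIdleTimeout": executor_idle_timeout,
    }


def to_submit_args(conf: dict) -> list:
    """Turn a property map into a flat list of spark-submit --conf arguments."""
    args = []
    for key, value in conf.items():
        args += ["--conf", f"{key}={value}"]
    return args


conf = dynamic_allocation_conf(min_executors=1, initial_executors=2, max_executors=10)
print(" ".join(to_submit_args(conf)))
```

The printed options can be placed between `spark-submit` and the application file. Values set in FusionInsight Manager remain the cluster-wide defaults.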