
Resource Topology

In Slurm, the topology.conf file describes the physical topology of a cluster (such as its rack and switch hierarchy) so that the scheduler can optimize job placement, for example by preferentially allocating adjacent nodes.
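
The file takes effect only when a topology plugin is enabled in slurm.conf; a minimal setting for the tree plugin (which is what reads topology.conf) looks like this:

# slurm.conf: select the tree topology plugin so that slurmctld reads topology.conf
TopologyPlugin=topology/tree

Changing the plugin generally requires restarting slurmctld rather than a simple reconfigure.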

The following is a detailed description and example of the configuration file.

Basic Syntax

SwitchName=<layer-name> [Nodes=<node-list>] [Switches=<sub-layer-list>] [Children=<number-of-sub-layers>]
  • SwitchName: defines the name of a topology layer (for example, a rack or switch).
  • Nodes: lists the nodes (for example, node[1-10]) attached to the topology layer.
  • Switches: lists the sub-layers contained in the topology layer (used for nested structures).
  • Children: indicates the number of sub-layers. This field is optional.

Configuration Examples

  • Single-layer topology (racks and nodes)
    # Define two racks (rack1 and rack2). Each rack contains 10 nodes.
    SwitchName=rack1 Nodes=node[1-10]
    SwitchName=rack2 Nodes=node[11-20]
  • Multi-layer topology (cluster, racks, switches, and nodes)
    # Cluster (cluster1 is deployed across two equipment rooms.)
    SwitchName=cluster1 Switches=room1,room2
     
    # Equipment room (There are two racks in room1.)
    SwitchName=room1 Switches=rack1,rack2
     
    # Rack (There are two switches on rack1.)
    SwitchName=rack1 Switches=switch1,switch2
     
    # Switch (switch1 connects to 10 nodes.)
    SwitchName=switch1 Nodes=node[1-10]
    SwitchName=switch2 Nodes=node[11-20]
  • Hybrid layer (switches and nodes)
    # The rack houses switches and nodes.
    SwitchName=rack1 Nodes=node1,node2 Switches=switch1
    SwitchName=switch1 Nodes=node3,node4

Key Parameters

Parameter   Description
Nodes       List of contiguous or discrete nodes, in the format node[1-5,10].
Switches    List of sub-layer names (used to define a nested structure, such as a rack containing switches).
Children    Number of sub-layers. For example, Children=2 indicates two sub-layers. This parameter is usually used together with Switches.
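
For illustration, the Nodes format above can mix a contiguous range with individual hosts in a single expression (rack3 here is a hypothetical layer name):

# Nodes accepts combined ranges and single hosts, e.g., node21-node25 plus node30.
SwitchName=rack3 Nodes=node[21-25,30]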

Verification

  1. Check the topology structure.
    scontrol show topology

    Example output:

    SwitchName=rack1 Nodes=node[1-10]
    SwitchName=rack2 Nodes=node[11-20]
  2. View the topology associated with a node.
    scontrol show node node01

    The SwitchName field in the output indicates the topology layer.

    NodeName=node01 ... SwitchName=rack1
  3. Submit a test job; a combined check is sketched after this list.
    # Request that both nodes sit under a single leaf switch (for example, within rack1).
    sbatch --nodes=2 --switches=1 --wrap="hostname"
    squeue -o "%N" # Check whether the allocated nodes are on rack1.
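
The steps above can be combined into one quick placement check; this is a minimal sketch that assumes rack1 contains node[1-10] as in the earlier examples:

# Submit a two-node test job that should stay under a single leaf switch.
JOBID=$(sbatch --parsable --nodes=2 --switches=1 --wrap="hostname")
# List the nodes allocated to this job.
squeue -j "$JOBID" -o "%N"
# Expand rack1's node range for comparison with the allocation.
scontrol show hostnames "node[1-10]"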

FAQ

  • A node is not associated with any layer.
    • Symptom: SwitchName in the scontrol show node command output is empty.
    • Solution: Check whether the Nodes parameter contains the node.
  • There are syntax errors.
    • Symptom: slurmctld fails to start, and an error such as "invalid topology.conf" appears in the log.
    • Solution: Run the scontrol reconfigure command to reload the configuration and then view the log.
      tail -f /var/log/slurm/slurmctld.log
  • Failed to allocate nodes for a job.
    • Reason: The requested node count exceeds the capacity of the layers allowed by the --switches parameter.
    • Example: If rack1 connects only 10 nodes, --nodes=12 --switches=1 cannot be satisfied by a single leaf switch. A quick capacity check is sketched after this list.
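
For the last case, the capacity of a layer can be checked before submitting; this sketch assumes rack1 is defined as node[1-10]:

# Count the nodes attached to rack1; a --nodes request larger than this
# cannot be satisfied by that single switch.
scontrol show hostnames "node[1-10]" | wc -l   # prints 10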

Best Practices

  • Align with the actual hardware.

    Ensure that the hierarchy (such as racks and switches) in topology.conf is consistent with the physical environment configuration.

  • Reduce the number of layers.

    Do not nest too many layers (for example, clusters, equipment rooms, racks, switches, and nodes). Generally, two or three layers are enough; a flattened version of the earlier multi-layer example is sketched after this list.

  • Limit the use of topology resources based on QoS.
    Use QoS to limit the resources that can be used by users at a specific topology layer.
    sacctmgr add qos high_priority --max-switches-per-job=2
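
To keep the hierarchy shallow, as recommended above, the earlier cluster/room/rack/switch example can be flattened to two levels; this is one possible layout rather than the only valid one:

# Two-level layout: one top-level switch above the leaf switches.
SwitchName=cluster1 Switches=switch1,switch2
SwitchName=switch1 Nodes=node[1-10]
SwitchName=switch2 Nodes=node[11-20]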

Configuration Example (Dynamic Resource Partitioning)

# topology.conf: define an independent layer for the GPU nodes.
SwitchName=gpu_rack Nodes=gpu[1-4] # A dedicated rack for GPU nodes
# At submission time, keep the allocation within a single leaf switch (here, gpu_rack).
sbatch --gres=gpu:2 --switches=1 job.sh
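
Whether the GPU layer is visible to the scheduler can then be confirmed with scontrol; passing a switch name to show topology is supported in recent Slurm releases, though the exact output format varies by version:

# Show only the gpu_rack layer from the scheduler's topology view.
scontrol show topology gpu_rack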