
Resource Topology

In Slurm, the topology.conf file describes the physical topology of a cluster (such as its rack and switch hierarchy) so that the scheduler can optimize job placement, for example by preferentially allocating adjacent nodes.
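
The file takes effect only when a topology plugin is enabled in slurm.conf; a minimal setting for the tree plugin (which is what reads topology.conf) looks like this:

# slurm.conf: select the tree topology plugin so that slurmctld reads topology.conf
TopologyPlugin=topology/tree

Changing the plugin generally requires restarting slurmctld rather than a simple reconfigure.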

The following is a detailed description and example of the configuration file.

Basic Syntax

SwitchName=<layer-name> [Nodes=<node-list>] [Switches=<sub-layer-list>] [Children=<number-of-sub-layers>]
  • SwitchName: defines the name of a topology layer (for example, a rack or switch).
  • Nodes: lists the nodes (for example, node[1-10]) attached to the topology layer.
  • Switches: lists the sub-layers contained in the topology layer (used for nested structures).
  • Children: indicates the number of sub-layers. This field is optional.

Configuration Examples

  • Single-layer topology (racks and nodes)
    # Define two racks (rack1 and rack2). Each rack contains 10 nodes.
    SwitchName=rack1 Nodes=node[1-10]
    SwitchName=rack2 Nodes=node[11-20]
  • Multi-layer topology (cluster, racks, switches, and nodes)
    # Cluster (cluster1 is deployed across two equipment rooms.)
    SwitchName=cluster1 Switches=room1,room2
     
    # Equipment room (There are two racks in room1.)
    SwitchName=room1 Switches=rack1,rack2
     
    # Rack (There are two switches on rack1.)
    SwitchName=rack1 Switches=switch1,switch2
     
    # Switch (switch1 connects to 10 nodes.)
    SwitchName=switch1 Nodes=node[1-10]
    SwitchName=switch2 Nodes=node[11-20]
  • Hybrid layer (switches and nodes)
    # The rack houses switches and nodes.
    SwitchName=rack1 Nodes=node1,node2 Switches=switch1
    SwitchName=switch1 Nodes=node3,node4

Key Parameters

Parameter   Description
Nodes       List of contiguous or discrete nodes, in the format node[1-5,10].
Switches    List of sub-layer names (used to define a nested structure, such as a rack containing switches).
Children    Number of sub-layers. For example, Children=2 indicates two sub-layers. This parameter is usually used together with Switches.
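
For illustration, the Nodes format above can mix a contiguous range with individual hosts in a single expression (rack3 here is a hypothetical layer name):

# Nodes accepts combined ranges and single hosts, e.g., node21-node25 plus node30.
SwitchName=rack3 Nodes=node[21-25,30]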

Verification

  1. Check the topology structure.
    scontrol show topology

    Example output:

    SwitchName=rack1 Nodes=node[1-10]
    SwitchName=rack2 Nodes=node[11-20]
  2. View the topology associated with a node.
    scontrol show node node01

    The SwitchName field in the output indicates the topology layer.

    NodeName=node01 ... SwitchName=rack1
  3. Submit a test job; a combined check is sketched after this list.
    # Request that both nodes sit under a single leaf switch (for example, within rack1).
    sbatch --nodes=2 --switches=1 --wrap="hostname"
    squeue -o "%N" # Check whether the allocated nodes are on rack1.
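
The steps above can be combined into one quick placement check; this is a minimal sketch that assumes rack1 contains node[1-10] as in the earlier examples:

# Submit a two-node test job that should stay under a single leaf switch.
JOBID=$(sbatch --parsable --nodes=2 --switches=1 --wrap="hostname")
# List the nodes allocated to this job.
squeue -j "$JOBID" -o "%N"
# Expand rack1's node range for comparison with the allocation.
scontrol show hostnames "node[1-10]"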

FAQ

  • A node is not associated with any layer.
    • Symptom: SwitchName in the scontrol show node command output is empty.
    • Solution: Check whether the Nodes parameter contains the node.
  • There are syntax errors.
    • Symptom: slurmctld fails to start, and an error such as "invalid topology.conf" appears in the log.
    • Solution: Run the scontrol reconfigure command to reload the configuration and then view the log.
      tail -f /var/log/slurm/slurmctld.log
  • Failed to allocate nodes for a job.
    • Reason: The requested node count exceeds the capacity of the layers allowed by the --switches parameter.
    • Example: If rack1 connects only 10 nodes, --nodes=12 --switches=1 cannot be satisfied by a single leaf switch. A quick capacity check is sketched after this list.
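
For the last case, the capacity of a layer can be checked before submitting; this sketch assumes rack1 is defined as node[1-10]:

# Count the nodes attached to rack1; a --nodes request larger than this
# cannot be satisfied by that single switch.
scontrol show hostnames "node[1-10]" | wc -l   # prints 10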

Best Practices

  • Align with the actual hardware.

    Ensure that the hierarchy (such as racks and switches) in topology.conf is consistent with the physical environment configuration.

  • Reduce the number of layers.

    Do not nest too many layers (for example, clusters, equipment rooms, racks, switches, and nodes). Generally, two or three layers are enough; a flattened version of the earlier multi-layer example is sketched after this list.

  • Limit the use of topology resources based on QoS.
    Use QoS to limit the resources that can be used by users at a specific topology layer.
    sacctmgr add qos high_priority --max-switches-per-job=2
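
To keep the hierarchy shallow, as recommended above, the earlier cluster/room/rack/switch example can be flattened to two levels; this is one possible layout rather than the only valid one:

# Two-level layout: one top-level switch above the leaf switches.
SwitchName=cluster1 Switches=switch1,switch2
SwitchName=switch1 Nodes=node[1-10]
SwitchName=switch2 Nodes=node[11-20]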

Configuration Example (Dynamic Resource Partitioning)

# topology.conf: define an independent layer for the GPU nodes.
SwitchName=gpu_rack Nodes=gpu[1-4] # A dedicated rack for GPU nodes
# At submission time, keep the allocation within a single leaf switch (here, gpu_rack).
sbatch --gres=gpu:2 --switches=1 job.sh
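
Whether the GPU layer is visible to the scheduler can then be confirmed with scontrol; passing a switch name to show topology is supported in recent Slurm releases, though the exact output format varies by version:

# Show only the gpu_rack layer from the scheduler's topology view.
scontrol show topology gpu_rack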