Updated on 2024-08-16 GMT+08:00

Configuring Workload Affinity or Anti-affinity Scheduling

Kubernetes supports workload affinity and anti-affinity scheduling, which places the pods of a new workload either together with or apart from the pods of specified existing workloads. Used appropriately, this can improve cluster performance and resource utilization.

For example, frontend pods and backend pods that frequently communicate with each other can be preferentially scheduled to the same node or AZ to minimize network latency. Workload affinity/anti-affinity scheduling works as follows:

  1. Nodes are grouped by the node label specified in topologyKey. Nodes that have the same value for this label form one topology domain, which this section refers to as a topology key.
  2. The target workloads for affinity/anti-affinity are identified based on the workload labels and operators.
  3. For affinity scheduling, the scheduler selects the topology key where the target workload is located. For anti-affinity scheduling, it selects a topology key where the target workload is not present.
Figure 1 Workload affinity or anti-affinity scheduling

Configuring Load Affinity/Anti-affinity on the Console

  1. When creating a workload, click Scheduling in the Advanced Settings area. For details about how to create a workload, see Creating a Workload.
  2. Select a load affinity scheduling policy.

    • Not configured: No load affinity policy is configured.
    • Multi-AZ deployment preferred: Workload pods are preferentially scheduled to nodes in different AZs through pod anti-affinity. The AZs serve as topology keys in this process.
    • Forcible multi-AZ deployment: Workload pods are forcibly scheduled to nodes in different AZs through pod anti-affinity. The AZs serve as topology keys in this process. When this scheduling policy is used, if there are fewer nodes than pods or node resources are insufficient, the extra pods will fail to run.
    • Custom policies: Workload pods are flexibly scheduled based on pod labels. For details about the supported scheduling policies, see Table 1. Select a policy type and click to add a policy. For details about the parameters, see Table 2.
      Table 1 Load affinity policies

      • Policy: Workload affinity

        • Rule type: Required

          Hard constraint, which corresponds to requiredDuringSchedulingIgnoredDuringExecution in YAML and specifies the conditions that must be met.

          Select pods that require affinity by label. If such pods already run on a node in a topology key, the scheduler will forcibly schedule the created pods to that topology key.

          NOTE:

          If multiple affinity rules are configured, multiple labels are used to filter the pods that require affinity, and the newly created pods must have affinity with all pods that meet the label filtering conditions. Scheduling therefore succeeds only if all pods that meet the label filtering conditions are located in the same topology key.

        • Rule type: Preferred

          Soft constraint, which corresponds to preferredDuringSchedulingIgnoredDuringExecution in YAML and specifies the conditions that should be met as much as possible.

          Select pods that require affinity by label. If such pods already run on a node in a topology key, the scheduler will preferentially schedule the created pods to that topology key.

          NOTE:

          If multiple affinity rules are configured, multiple labels are used to filter the pods that require affinity, and the scheduler preferentially places the newly created pods together with the pods that meet the label filtering conditions. However, even if no pod meets the label filtering conditions, a topology key will still be selected for scheduling.

      • Policy: Workload anti-affinity

        • Rule type: Required

          Hard constraint, which corresponds to requiredDuringSchedulingIgnoredDuringExecution in YAML and specifies the conditions that must be met.

          Select one or more pods that require anti-affinity by label. If such pods already run on a node in a topology key, the scheduler will not schedule the created pods to that topology key.

          NOTE:

          If multiple anti-affinity rules are configured, multiple labels are used to filter the pods that require anti-affinity, and the newly created pods must have anti-affinity with all pods that meet the label filtering conditions. As a result, the new pods are not scheduled to any topology key that contains a pod meeting the label filtering conditions.

        • Rule type: Preferred

          Soft constraint, which corresponds to preferredDuringSchedulingIgnoredDuringExecution in YAML and specifies the conditions that should be met as much as possible.

          Select one or more pods that require anti-affinity by label. If such pods already run on a node in a topology key, the scheduler will preferentially schedule the created pods to other topology keys.

          NOTE:

          If multiple anti-affinity rules are configured, multiple labels are used to filter the pods that require anti-affinity, and the scheduler preferentially places the newly created pods apart from the pods that meet the label filtering conditions. However, even if every topology key contains pods that require anti-affinity, a topology key will still be selected for scheduling.

      Table 2 Parameters for configuring load affinity/anti-affinity scheduling policies

      • Weight: This parameter is available only in a preferred scheduling policy. Weights range from 1 to 100 and are taken into account as an extra scoring factor during scheduling. The scheduler combines the weight with the scores from the other priority functions of each node to determine the final score, and then assigns pods to the node with the highest total score.

      • Namespace: Namespace in which the scheduler looks for pods that match the label rule of the scheduling policy.

      • Topology Key: A topology key (topologyKey) is the key of a node label. Nodes that have the same value for this label are grouped into the same topology key. The scheduler identifies the affinity/anti-affinity objects based on the labels and operators, and then schedules pods based on the topology key where the target objects are located.

        • For example, if the node label is kubernetes.io/hostname, the label value will be a node name. Nodes with different names are assigned to different topology keys. This allows for workload affinity scheduling on a single node, as each topology key contains only one node.
        • If the specified label is kubernetes.io/os, the label value will be a node OS. Nodes running different OSs are assigned to different topology keys. This allows for workload affinity scheduling on multiple nodes, as each topology key contains multiple nodes.

          For example, if pods that meet the load affinity rule are running on a node in a topology key, the new pods can be scheduled to any node in that topology key.

      • Label Key: When configuring a workload affinity or anti-affinity policy, enter the key of the workload label to be matched. Both default labels and custom labels are supported.

      • Operator: The following operators are supported:

        • In: The label value of the affinity or anti-affinity object is in the label value list (values field).
        • NotIn: The label value of the affinity or anti-affinity object is not in the label value list (values field).
        • Exists: The affinity or anti-affinity object has the specified label key.
        • DoesNotExist: The affinity or anti-affinity object does not have the specified label key.

      • Label Value: When configuring a workload affinity or anti-affinity policy, enter the value of the workload label to be matched.

  3. After the scheduling policy is added, click Create Workload.
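
The parameters in Table 2 correspond to fields of a pod affinity term in YAML. The fragment below is a minimal sketch of that mapping, not the exact YAML generated by the console. It belongs under the pod template spec of a workload, and the namespace default, the weight 50, and the label keys tier, release, and canary are hypothetical values chosen only for illustration.

```yaml
affinity:
  podAffinity:
    preferredDuringSchedulingIgnoredDuringExecution:
    - weight: 50                             # Weight: 1 to 100, preferred rules only
      podAffinityTerm:
        namespaces:                          # Namespace: where matching pods are searched for
        - default                            # hypothetical namespace
        topologyKey: kubernetes.io/hostname  # Topology Key: one topology key per node
        labelSelector:
          matchExpressions:                  # All expressions must match the same pod.
          - key: app                         # Label Key
            operator: In                     # Operator: value must be in the list
            values:                          # Label Value
            - backend
          - key: tier                        # hypothetical label key
            operator: NotIn                  # value must not be in the list
            values:
            - cache
          - key: release                     # hypothetical label key
            operator: Exists                 # key must be present, no values field
          - key: canary                      # hypothetical label key
            operator: DoesNotExist           # key must be absent, no values field
```

In practice a rule rarely needs all four operators at once. They are combined here only to show the syntax of each.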

Configuring Load Affinity/Anti-affinity Using YAML

  • Workload affinity

    Kubernetes supports affinity between pods, which allows the frontend and backend pods of an application to be deployed together to minimize access latency.

    Assume that the backend pods of an application have been created with label app=backend. You can use .spec.affinity.podAffinity to configure workload affinity so that the frontend pods (labeled app=frontend) and backend pods (labeled app=backend) can be deployed together.

    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: frontend
      labels:
        app: frontend
    spec:
      selector:
        matchLabels:
          app: frontend
      replicas: 3
      template:
        metadata:
          labels:
            app: frontend
        spec:
          containers:
          - image: nginx:alpine
            name: frontend
            resources:
              requests:
                cpu: 100m
                memory: 200Mi
              limits:
                cpu: 100m
                memory: 200Mi
          imagePullSecrets:
          - name: default-secret
          affinity:  # Configure a scheduling policy.
            podAffinity:  # Workload affinity scheduling rule
              requiredDuringSchedulingIgnoredDuringExecution:   # Scheduling policy that must be met
              - topologyKey: prefer    # Topology keys are divided based on node labels, among which prefer is a custom label.
                labelSelector:  # Select workloads that meet the requirements based on workload labels.
                  matchExpressions: # Workload label matching rule
                  - key: app # The key of the workload label is app.
                    operator: In # The rule is met if a value exists in the value list.
                    values: # Workload label values
                    - backend
              preferredDuringSchedulingIgnoredDuringExecution:    # Scheduling policy that is met as much as possible
              - weight: 100  # Priority that can be configured when the best-effort policy is used. The value ranges from 1 to 100. A larger value indicates a higher priority.
                podAffinityTerm:  # Affinity configuration when the best-effort policy is used
                  topologyKey: topology.kubernetes.io/zone   # Topology keys are divided based on node labels by node AZ.
                  labelSelector:
                    matchExpressions:
                    - key: app
                      operator: In
                      values:
                      - backend

    In the preceding example, the rule that must be met divides nodes into topology keys based on the prefer label. If a backend pod (labeled app=backend) is running on any node in a topology key, the frontend pods (labeled app=frontend) are deployed in that topology key, even if not every node in the topology key runs a backend pod. The best-effort rule divides nodes into topology keys by AZ based on topology.kubernetes.io/zone, so the frontend and backend pods are deployed on nodes in the same AZ whenever possible.

    For workload affinity, topologyKey cannot be left blank when requiredDuringSchedulingIgnoredDuringExecution or preferredDuringSchedulingIgnoredDuringExecution is used.

    topologyKey specifies the node label that is used to divide nodes into topology keys. Nodes that have the same value for this label are grouped into the same topology key. The scheduler then selects the topology key to schedule pods to based on the workload labels. A topology key can consist of multiple nodes. If a workload that meets a label selection rule runs on a node in a topology key, the new pods can be scheduled to any node in that topology key.

    For example, if the topologyKey label is set to topology.kubernetes.io/zone, nodes' AZs will be used as the topology keys, and workloads will be scheduled by AZ during deployment.
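
    The required rule in the preceding example groups nodes by the custom label prefer, so it can only match nodes that carry this label. The fragment below is a minimal sketch of the labels on such a node. The node name node-1, the label value group-a, and the AZ name az1 are hypothetical and serve only to illustrate how the label values define the topology keys.

    ```yaml
    apiVersion: v1
    kind: Node
    metadata:
      name: node-1                          # hypothetical node name
      labels:
        kubernetes.io/hostname: node-1      # one topology key per node
        topology.kubernetes.io/zone: az1    # hypothetical AZ, one topology key per AZ
        prefer: group-a                     # hypothetical value of the custom label
    ```

    Under these assumptions, all nodes labeled prefer=group-a form one topology key for the required rule, and all nodes in az1 form one topology key for the preferred rule.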

  • Workload anti-affinity

    In some cases, pods need to be deployed separately because deploying them together can negatively impact performance.

    Assume that the frontend pods of an application have been created with label app=frontend. To ensure that pods are deployed on different nodes and multiple AZs are preferred, you can use .spec.affinity.podAntiAffinity to configure workload anti-affinity.

    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: frontend
      labels:
        app: frontend
    spec:
      selector:
        matchLabels:
          app: frontend
      replicas: 5
      template:
        metadata:
          labels:
            app: frontend
        spec:
          containers:
          - image: nginx:alpine
            name: frontend
            resources:
              requests:
                cpu: 100m
                memory: 200Mi
              limits:
                cpu: 100m
                memory: 200Mi
          imagePullSecrets:
          - name: default-secret
          affinity:
            podAntiAffinity:  # Workload anti-affinity scheduling rule
              requiredDuringSchedulingIgnoredDuringExecution:   # Scheduling policy that must be met
              - topologyKey: kubernetes.io/hostname    # Topology keys are divided based on node labels.
                labelSelector:    # Select workloads that meet the requirements based on workload labels.
                  matchExpressions:  # Workload label matching rule
                  - key: app  # The key of the workload label is app.
                    operator: In  # The rule is met if a value exists in the value list.
                    values:  # Workload label values
                    - frontend
              preferredDuringSchedulingIgnoredDuringExecution:    # Scheduling policy that is met as much as possible
              - weight: 100  # Priority that can be configured when the best-effort policy is used. The value ranges from 1 to 100. A larger value indicates a higher priority.
                podAffinityTerm:  # Anti-affinity configuration when the best-effort policy is used
                  topologyKey: topology.kubernetes.io/zone   # Topology keys are divided based on node labels.
                  labelSelector:
                    matchExpressions:
                    - key: app
                      operator: In
                      values:
                      - frontend

    In the preceding example, anti-affinity rules are configured. The rule that must be met divides nodes into topology keys based on kubernetes.io/hostname. Because each node has a different value for the kubernetes.io/hostname label, each topology key contains only one node. If a frontend pod already runs on that node, no other pod with the same label is scheduled to it. The best-effort rule divides nodes into topology keys by AZ based on topology.kubernetes.io/zone, so the pods are deployed on nodes in different AZs whenever possible.

    For workload anti-affinity, when requiredDuringSchedulingIgnoredDuringExecution is used, the Kubernetes admission controller LimitPodHardAntiAffinityTopology requires that topologyKey be set to kubernetes.io/hostname. To use other custom topology logic, modify or disable the admission controller.
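
    This restriction applies to required anti-affinity rules. Preferred rules should not be affected by it, so a soft spread across AZs can be expressed without a required rule. The fragment below is a minimal sketch under that assumption. It belongs under .spec.template.spec of the Deployment, and the weight of 50 is an arbitrary illustrative value.

    ```yaml
    affinity:
      podAntiAffinity:
        preferredDuringSchedulingIgnoredDuringExecution:
        - weight: 50                                    # illustrative weight, 1 to 100
          podAffinityTerm:
            topologyKey: topology.kubernetes.io/zone    # one topology key per AZ
            labelSelector:
              matchExpressions:
              - key: app
                operator: In
                values:
                - frontend                              # keep away from other frontend pods
    ```

    Because the rule is only preferred, pods are still scheduled when every AZ already runs a frontend pod.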