Updated on 2025-01-08 GMT+08:00

Recommended Configurations for Workloads

When deploying a workload in a CCE cluster, you need to configure the workload based on the actual service scenarios and environments to ensure that the workload can run stably and reliably. This section provides some recommended configurations and suggestions for workload deployment.

Specifying Pod Resources (Requests and Limits)

Configure requests and limits based on your actual service scenarios. The scheduler uses requests to check whether a node has enough resources for a pod and to record the resources allocated on each node. The allocated resources on a node are the sum of the requests of all containers in all pods on that node. The available resources on a node are calculated as follows: Available resources on a node = Total resources on the node – Allocated resources on the node. If the available resources on a node cannot accommodate a pod's requests, the pod will not be scheduled to that node.
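As an illustration of the formula above, the following Python sketch models the request-based check. It is a simplified model, not the kube-scheduler implementation: the `Node` class and its fields are hypothetical, CPU is counted in millicores and memory in MiB, and only requests are compared (actual usage, limits, and other scheduling rules are ignored).

```python
# Simplified model of the request-based fit check (not kube-scheduler code).
# Hypothetical units: CPU in millicores, memory in MiB.
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    total_cpu_m: int   # total CPU on the node, in millicores
    total_mem_mi: int  # total memory on the node, in MiB
    # (cpu_m, mem_mi) requests of each pod already placed on the node
    pod_requests: list = field(default_factory=list)

    def allocated(self) -> tuple[int, int]:
        # Allocated resources = sum of the requests of all pods on the node.
        cpu = sum(r[0] for r in self.pod_requests)
        mem = sum(r[1] for r in self.pod_requests)
        return cpu, mem

    def available(self) -> tuple[int, int]:
        # Available = total - allocated; real usage is not considered.
        cpu, mem = self.allocated()
        return self.total_cpu_m - cpu, self.total_mem_mi - mem

    def fits(self, cpu_m: int, mem_mi: int) -> bool:
        # A pod fits only if every request fits in the available resources.
        avail_cpu, avail_mem = self.available()
        return cpu_m <= avail_cpu and mem_mi <= avail_mem


node = Node("node-1", total_cpu_m=2000, total_mem_mi=4096,
            pod_requests=[(500, 1024), (1000, 2048)])
print(node.available())     # (500, 1024)
print(node.fits(500, 128))  # True
print(node.fits(600, 128))  # False: only 500m CPU is left
```

Because the check uses requests rather than measured usage, a pod without requests adds nothing to the allocated total, which is why the scheduler cannot account for it.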

If requests are not configured, the scheduler cannot account for the resources a pod needs and may place it on an unsuitable node. As a result, too many pods can be packed onto one node, overloading the node and affecting your services. Configure requests for all containers so that the scheduler can accurately track the resources allocated on each node and make appropriate scheduling decisions.

The following example configures requests and limits for an Nginx pod. The container requests 0.5 CPU cores (500m) and 128 MiB of memory. At runtime, it can use more resources than it requests, but not more than its limits of 1 CPU core (1000m) and 256 MiB of memory: CPU usage above the limit is throttled, and a container that exceeds its memory limit is terminated (OOM killed).

apiVersion: v1
kind: Pod
metadata:
  name: nginx-test
spec:
  containers:
  - name: container-1
    image: nginx
    resources: # Resource declaration
      limits:
        cpu: 1000m
        memory: 256Mi
      requests:
        cpu: 500m
        memory: 128Mi
  imagePullSecrets:
    - name: default-secret
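To make the units in this example concrete, the sketch below converts the CPU quantities (`m` means millicores) and memory quantities (`Mi` means mebibytes) into plain numbers and checks that each request does not exceed its limit, which Kubernetes also enforces when it validates a pod. The helper functions are hypothetical and handle only a few suffixes; they are not a complete parser for Kubernetes quantity strings.

```python
# Hypothetical helpers: convert a few Kubernetes quantity formats to numbers.
# Only the suffixes below are handled; this is not a full quantity parser.
BINARY_SUFFIXES = {"Ki": 2**10, "Mi": 2**20, "Gi": 2**30}


def parse_cpu(q: str) -> float:
    """Return CPU cores, e.g. '500m' -> 0.5, '1' -> 1.0."""
    if q.endswith("m"):
        return int(q[:-1]) / 1000  # millicores to cores
    return float(q)


def parse_memory(q: str) -> int:
    """Return bytes, e.g. '128Mi' -> 134217728."""
    for suffix, factor in BINARY_SUFFIXES.items():
        if q.endswith(suffix):
            return int(q[: -len(suffix)]) * factor
    return int(q)  # no suffix: plain bytes


# The resources section of the example pod, as a dictionary.
resources = {
    "limits": {"cpu": "1000m", "memory": "256Mi"},
    "requests": {"cpu": "500m", "memory": "128Mi"},
}

req_cpu = parse_cpu(resources["requests"]["cpu"])
lim_cpu = parse_cpu(resources["limits"]["cpu"])
req_mem = parse_memory(resources["requests"]["memory"])
lim_mem = parse_memory(resources["limits"]["memory"])

# A request must not exceed its limit.
assert req_cpu <= lim_cpu and req_mem <= lim_mem
print(req_cpu, lim_cpu)                    # 0.5 1.0
print(req_mem // 2**20, lim_mem // 2**20)  # 128 256
```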

Configuring a Graceful Exit Period

The graceful exit period (terminationGracePeriodSeconds) is the time between the start of a pod's termination and the moment its containers are forcibly stopped. When termination begins, the container processes receive a SIGTERM signal, and any process still running when the grace period expires is killed with SIGKILL. If this parameter is not specified, the grace period defaults to 30 seconds, with a minimum value of 1 second. During the grace period, the pod can shut down gracefully, for example by saving its status, completing ongoing tasks, and closing network connections. Configure terminationGracePeriodSeconds properly so that your application terminates smoothly and in order.

To give a pod up to 60 seconds to clean up before it is forcibly stopped, set terminationGracePeriodSeconds in the pod template as follows:

kind: Deployment
apiVersion: apps/v1
metadata:
  name: nginx
spec:
  replicas: 1
  selector:
    matchLabels:
      app: nginx
      version: v1
  template:
    metadata:
      labels:
        app: nginx
        version: v1
    spec:
      containers:
        - name: container-1
          image: nginx
      imagePullSecrets:
        - name: default-secret
      terminationGracePeriodSeconds: 60
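A grace period only helps if the application reacts to the termination signal. When a pod is deleted, the main process of each container normally receives SIGTERM, and SIGKILL follows if it is still running when the grace period ends. The Python sketch below shows one common pattern, assuming a hypothetical worker that processes a list of jobs: the signal handler only sets a flag, and the main loop finishes the job in progress, stops taking new work, and runs its cleanup before exiting.

```python
# Minimal sketch of an application that uses its grace period.
# The job list and the cleanup step are hypothetical placeholders.
import signal
import time


class GracefulWorker:
    def __init__(self) -> None:
        self.stopping = False
        self.completed: list[str] = []
        # SIGTERM arrives when the pod starts terminating; SIGKILL follows
        # only if the process is still running after the grace period.
        signal.signal(signal.SIGTERM, self.handle_sigterm)

    def handle_sigterm(self, signum, frame) -> None:
        # Keep the handler small: just record that shutdown was requested.
        self.stopping = True

    def run(self, jobs: list[str]) -> None:
        for job in jobs:
            if self.stopping:
                break  # stop taking new work once SIGTERM is received
            time.sleep(0.01)  # stand-in for real work on this job
            self.completed.append(job)
        self.shutdown()

    def shutdown(self) -> None:
        # Hypothetical cleanup: save state, close connections, flush logs.
        print(f"shutting down after {len(self.completed)} job(s)")


if __name__ == "__main__":
    GracefulWorker().run(["job-1", "job-2", "job-3"])
```

The whole shutdown path, including cleanup, must finish within terminationGracePeriodSeconds; otherwise the process is killed before cleanup completes.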

Configuring Tolerations

Tolerations allow pods to be scheduled on, or keep running on, nodes that have matching taints. For example, if an application heavily relies on the local state of a node, you may want it to stay on that node for an extended period during a network partition, waiting for the network to recover instead of being evicted.

In some situations, the Kubernetes node controller automatically adds taints to nodes. It is recommended that you add tolerations for the built-in taints node.kubernetes.io/not-ready (the node is not ready) and node.kubernetes.io/unreachable (the node controller cannot reach the node). In the following example, the pod tolerates both taints with the NoExecute effect, so if either taint is added to its node, the pod keeps running on that node for 300 seconds before being evicted.

apiVersion: v1
kind: Pod
metadata:
  name: nginx-test
spec:
  containers:
    - name: container-1
      image: nginx
  imagePullSecrets:
    - name: default-secret
  tolerations:
    - key: node.kubernetes.io/not-ready
      operator: Exists
      effect: NoExecute
      tolerationSeconds: 300
    - key: node.kubernetes.io/unreachable
      operator: Exists
      effect: NoExecute
      tolerationSeconds: 300
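The effect of these tolerations can be sketched as follows. This is a simplified Python model of the Kubernetes toleration-matching rules for a single NoExecute taint, not the node controller's code; the dictionaries mirror the YAML fields above, and the `example.com/maintenance` taint key is hypothetical. In this model, a pod with no matching toleration is evicted immediately, matching tolerations with tolerationSeconds delay eviction by the smallest of those values, and if no matching toleration sets tolerationSeconds, the pod stays on the node indefinitely.

```python
# Simplified model of toleration matching for one NoExecute taint.
# Not the Kubernetes node controller implementation.
from typing import Optional


def tolerates(tol: dict, taint: dict) -> bool:
    # An empty effect in the toleration matches every effect.
    if tol.get("effect") and tol["effect"] != taint["effect"]:
        return False
    # An empty key (used with operator Exists) matches every key.
    if tol.get("key") and tol["key"] != taint["key"]:
        return False
    # Exists ignores the value; Equal (the default) compares values.
    if tol.get("operator", "Equal") == "Exists":
        return True
    return tol.get("value", "") == taint.get("value", "")


def eviction_delay(taint: dict, tolerations: list[dict]) -> Optional[int]:
    """Seconds until eviction: 0 = immediately, None = never."""
    matching = [t for t in tolerations if tolerates(t, taint)]
    if not matching:
        return 0
    seconds = [t["tolerationSeconds"] for t in matching
               if "tolerationSeconds" in t]
    return max(0, min(seconds)) if seconds else None


# The tolerations from the example pod above.
pod_tolerations = [
    {"key": "node.kubernetes.io/not-ready", "operator": "Exists",
     "effect": "NoExecute", "tolerationSeconds": 300},
    {"key": "node.kubernetes.io/unreachable", "operator": "Exists",
     "effect": "NoExecute", "tolerationSeconds": 300},
]

unreachable = {"key": "node.kubernetes.io/unreachable", "effect": "NoExecute"}
print(eviction_delay(unreachable, pod_tolerations))  # 300

# Hypothetical taint that the pod does not tolerate.
maintenance = {"key": "example.com/maintenance", "effect": "NoExecute"}
print(eviction_delay(maintenance, pod_tolerations))  # 0
```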

Configuring a Rolling Update

In Kubernetes, the update strategy of a workload (the strategy field of a Deployment, or the updateStrategy field of a StatefulSet or DaemonSet) determines how its pods are replaced during an update. To maintain service continuity during a workload upgrade, you can use rolling updates, which replace pods gradually and control the number of available pods to minimize downtime. For example, in a Deployment with multiple pods, you can specify the maximum number of pods that can be unavailable and the maximum number of extra pods that can be created above the desired count during the update. Rolling updates keep services stable and available while smoothly transitioning applications to new versions.

In the following example, a rolling update policy is configured with both maxUnavailable and maxSurge set to 25%. This means that during the update, at most 25% of the desired pods can be unavailable, and the total number of pods can exceed the desired count by at most 25%.

kind: Deployment
apiVersion: apps/v1
metadata:
  name: nginx
spec:
  replicas: 10
  selector:
    matchLabels:
      app: nginx
      version: v1
  template:
    metadata:
      labels:
        app: nginx
        version: v1
    spec:
      containers:
        - name: container-1
          image: nginx
      imagePullSecrets:
        - name: default-secret
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 25%
      maxSurge: 25%
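The percentages are converted to pod counts relative to replicas: maxSurge is rounded up and maxUnavailable is rounded down. The Python sketch below works through that arithmetic for the example (10 replicas, 25% each). The function names are hypothetical, and the code only models the resulting bounds, not the Deployment controller itself.

```python
# Worked example: resolve maxSurge / maxUnavailable to pod counts.
# Models the bounds only; not the Deployment controller implementation.
import math


def resolve(value, replicas: int, round_up: bool) -> int:
    # Percentages are scaled by the desired replica count.
    if isinstance(value, str) and value.endswith("%"):
        scaled = int(value[:-1]) * replicas / 100
        return math.ceil(scaled) if round_up else math.floor(scaled)
    return int(value)  # absolute number of pods


def rolling_update_bounds(replicas: int, max_surge, max_unavailable):
    """Return (max total pods, min available pods) during the update."""
    surge = resolve(max_surge, replicas, round_up=True)
    unavailable = resolve(max_unavailable, replicas, round_up=False)
    if surge == 0 and unavailable == 0:
        # If both resolve to 0, Kubernetes allows 1 unavailable pod
        # so that the update can make progress.
        unavailable = 1
    return replicas + surge, replicas - unavailable


print(rolling_update_bounds(10, "25%", "25%"))  # (13, 8)
print(rolling_update_bounds(1, "25%", "25%"))   # (2, 1)
```

For 10 replicas, this gives at most 13 pods in total and at least 8 available pods during the update. With a single replica, 25% resolves to one surge pod and zero unavailable pods, so the new pod must become available before the old one is removed.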

Configuring a Restart Policy

The restartPolicy parameter defines what happens when a container in the pod exits. It is set at the pod level and applies to all containers in the pod. Choose a policy based on your services so that containers are restarted automatically when needed.

The following is an example of configuring an Nginx pod to always restart automatically:

apiVersion: v1
kind: Pod
metadata:
  name: nginx-test
spec:
  containers:
  - name: nginx
    image: nginx
  restartPolicy: Always # Pod-level field, not a container field
  imagePullSecrets:
    - name: default-secret

The options of restartPolicy are:
  • Always (default): The container is restarted after it exits, regardless of its exit status.
  • OnFailure: The container is restarted only if it exits with an error (a non-zero exit status).
  • Never: The container is never restarted.
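The three policies reduce to a simple decision each time a container exits. The sketch below is a hypothetical Python helper that mirrors the list above. It is not kubelet code, and it ignores the back-off delay that the kubelet applies between repeated restarts.

```python
# Hypothetical helper mirroring the restartPolicy options (not kubelet code).
def should_restart(policy: str, exit_code: int) -> bool:
    if policy == "Always":
        return True  # restart regardless of the exit status
    if policy == "OnFailure":
        return exit_code != 0  # restart only after a non-zero exit
    if policy == "Never":
        return False
    raise ValueError(f"unknown restartPolicy: {policy}")


for policy in ("Always", "OnFailure", "Never"):
    print(policy, should_restart(policy, 0), should_restart(policy, 1))
# Always True True
# OnFailure False True
# Never False False
```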

Configuring a Liveness Probe and a Readiness Probe

A liveness probe checks whether a container is running properly. In Kubernetes, a pod in the Running state is not necessarily able to provide services: its processes may be stuck, for example, in a deadlock. A liveness probe detects such problems so that the container is restarted in time to restore your service.

A readiness probe checks whether a pod is ready to receive requests from a Service. If the probe fails, the pod is removed from the Service's endpoints, so no new traffic is forwarded to it until the probe succeeds again.

apiVersion: v1
kind: Pod
metadata:
  name: tomcat
spec:
  containers:
  - name: tomcat
    image: tomcat
    livenessProbe:
      httpGet:
        path: /index.jsp
        port: 8080
      initialDelaySeconds: 3
      periodSeconds: 3
    readinessProbe:
      httpGet:
        path: /index.jsp
        port: 8080
  imagePullSecrets:
    - name: default-secret
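The probes above act on consecutive results rather than on a single check. The Python sketch below simulates that behavior under the Kubernetes defaults that apply because the example does not set them (a failureThreshold of 3 and a successThreshold of 1). The function names are hypothetical and the code is a simplified model, not kubelet logic. With initialDelaySeconds of 3 and periodSeconds of 3, the i-th liveness probe (counting from 0) runs about 3 + 3 * i seconds after the container starts.

```python
# Simplified simulation of probe thresholds; not the kubelet implementation.
# Assumed defaults: failureThreshold = 3, successThreshold = 1.
from typing import Optional


def liveness_restart_index(results: list[bool],
                           failure_threshold: int = 3) -> Optional[int]:
    """Return the index of the probe that triggers a restart, or None."""
    consecutive_failures = 0
    for i, ok in enumerate(results):
        consecutive_failures = 0 if ok else consecutive_failures + 1
        if consecutive_failures >= failure_threshold:
            return i
    return None


def readiness_states(results: list[bool], failure_threshold: int = 3,
                     success_threshold: int = 1) -> list[bool]:
    """Return the Ready state after each probe; a container starts not ready."""
    ready, failures, successes, states = False, 0, 0, []
    for ok in results:
        if ok:
            successes, failures = successes + 1, 0
            if successes >= success_threshold:
                ready = True
        else:
            failures, successes = failures + 1, 0
            if failures >= failure_threshold:
                ready = False
        states.append(ready)
    return states


# Liveness: initialDelaySeconds=3, periodSeconds=3 as in the example above.
idx = liveness_restart_index([False, False, False])
print(idx, 3 + 3 * idx)  # 2 9

# Readiness: one failure does not remove a ready pod; three in a row do.
print(readiness_states([False, True, False, False, False, True]))
# [False, True, True, True, False, True]
```

In this model, a Tomcat container that is not yet serving /index.jsp about 9 seconds after it starts is restarted. If the application needs longer to start, a larger initialDelaySeconds (or a startup probe) can help avoid restart loops.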