
Cluster Configuration Management

Scenario

CCE allows you to manage cluster parameters to tune how core components, such as kube-apiserver, the scheduler, and kube-controller-manager, behave to meet your service requirements.

Constraints

This function is supported only in clusters of v1.15 and later. It is not displayed for versions earlier than v1.15.

Procedure

  1. Log in to the CCE console. In the navigation pane, choose Clusters.
  2. Locate the target cluster, click ... to view more operations on the cluster, and choose Manage.
  3. On the Manage Components page on the right, change the values of the Kubernetes parameters listed in the following tables.

    Table 1 kube-apiserver configuration

    Item: Tolerance time of pods for an unavailable node
    Parameter: default-not-ready-toleration-seconds
    Description: Specifies the default tolerance time. The configuration takes effect for all pods by default. You can configure a different tolerance time for a specific pod, in which case the tolerance time configured for that pod is used. For details, see Taints and Tolerations. (A per-pod override is sketched after this table.)
    If the tolerance time is too short, pods may be migrated frequently in scenarios such as network jitter. If it is too long, services may be interrupted for the whole period after a node becomes faulty.
    Value: Default: 300s

    Item: Tolerance time of pods for an inaccessible node
    Parameter: default-unreachable-toleration-seconds
    Description: Specifies the default tolerance time. The configuration takes effect for all pods by default. You can configure a different tolerance time for a specific pod, in which case the tolerance time configured for that pod is used. For details, see Taints and Tolerations.
    If the tolerance time is too short, pods may be migrated frequently in scenarios such as network jitter. If it is too long, services may be interrupted for the whole period after a node becomes faulty.
    Value: Default: 300s

    Item: Maximum number of concurrent modification API requests
    Parameter: max-mutating-requests-inflight
    Description: Maximum number of concurrent mutating requests. When this limit is exceeded, the server rejects requests.
    The value 0 indicates no limit on the number of concurrent mutating requests. This parameter is related to the cluster scale. You are advised not to change the value.
    Manual configuration is no longer supported since cluster v1.21. The value is automatically specified based on the cluster scale.
    Value:
    • 200 for clusters with 50 or 200 nodes
    • 500 for clusters with 1000 nodes
    • 1000 for clusters with 2000 nodes

    Item: Maximum number of concurrent non-modification API requests
    Parameter: max-requests-inflight
    Description: Maximum number of concurrent non-mutating requests. When this limit is exceeded, the server rejects requests.
    The value 0 indicates no limit on the number of concurrent non-mutating requests. This parameter is related to the cluster scale. You are advised not to change the value.
    Manual configuration is no longer supported since cluster v1.21. The value is automatically specified based on the cluster scale.
    Value:
    • 400 for clusters with 50 or 200 nodes
    • 1000 for clusters with 1000 nodes
    • 2000 for clusters with 2000 nodes

    Item: Ports used by NodePort services
    Parameter: service-node-port-range
    Description: NodePort port range. After changing the value, go to the security group page and update the allowed TCP/UDP port range (30000 to 32767 by default) of the node security groups accordingly. Otherwise, ports outside the default range cannot be accessed externally.
    If the minimum port number is less than 20106, the range may conflict with the CCE health check ports, which may make the cluster unavailable. If the maximum port number is greater than 32767, the range may conflict with the ports in net.ipv4.ip_local_port_range, which may affect network performance.
    Value: Default: 30000 to 32767
    Value range: Min > 20105, Max < 32768

    Item: Request timeout
    Parameter: request-timeout
    Description: Default request timeout interval of kube-apiserver. Exercise caution when changing this value, and ensure that the new value is appropriate to prevent frequent API timeouts or other errors.
    This parameter is supported only by clusters of v1.19.16-r30, v1.21.10-r10, v1.23.8-r10, v1.25.3-r10, and later versions.
    Value: Default: 1m0s
    Value range: Min ≥ 1s, Max ≤ 1 hour

    Item: Overload control
    Parameter: support-overload
    Description: Cluster overload control. If enabled, the number of concurrent requests is dynamically adjusted based on the resource pressure of master nodes to keep them and the cluster available.
    This parameter is supported only by clusters of v1.23 or later.
    Value:
    • false: Overload control is disabled.
    • true: Overload control is enabled.
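
    A per-pod toleration overrides the cluster-wide defaults above. The following is a minimal sketch that assumes the official kubernetes Python client and a kubeconfig with access to the cluster; the pod and container names are hypothetical.

      # Create a pod whose tolerationSeconds (60s) overrides the cluster-wide
      # defaults of default-not-ready-toleration-seconds and
      # default-unreachable-toleration-seconds (300s) for this pod only.
      from kubernetes import client, config

      config.load_kube_config()

      pod = client.V1Pod(
          metadata=client.V1ObjectMeta(name="demo"),  # hypothetical name
          spec=client.V1PodSpec(
              containers=[client.V1Container(name="app", image="nginx:alpine")],
              tolerations=[
                  client.V1Toleration(key="node.kubernetes.io/not-ready",
                                      operator="Exists", effect="NoExecute",
                                      toleration_seconds=60),
                  client.V1Toleration(key="node.kubernetes.io/unreachable",
                                      operator="Exists", effect="NoExecute",
                                      toleration_seconds=60),
              ],
          ),
      )
      client.CoreV1Api().create_namespaced_pod(namespace="default", body=pod)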

    Table 2 Scheduler configurations

    Item: Query per second (QPS) for the scheduler to access kube-apiserver
    Parameter: kube-api-qps
    Description: QPS for communicating with kube-apiserver.
    Value:
    • If the number of nodes in a cluster is less than 1000, the default value is 100.
    • If a cluster contains 1000 or more nodes, the default value is 200.

    Item: Burst for the scheduler to access kube-apiserver
    Parameter: kube-api-burst
    Description: Burst to use while communicating with kube-apiserver. (How QPS and burst interact is sketched after this table.)
    Value:
    • If the number of nodes in a cluster is less than 1000, the default value is 100.
    • If a cluster contains 1000 or more nodes, the default value is 200.

    Item: GPU sharing
    Parameter: enable-gpu-share
    Description: Whether to enable GPU sharing. This parameter is supported only by clusters of v1.23.7-r10, v1.25.3-r0, and later.
    • When disabled, ensure that no pod in the cluster uses a shared GPU (that is, the cce.io/gpu-decision annotation does not exist in any pod).
    • When enabled, ensure that the cce.io/gpu-decision annotation exists in every pod that uses GPU resources in the cluster.
    Value: Default: true
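
    QPS and burst together form a client-side token bucket: short spikes of up to the burst size pass immediately, while the sustained request rate converges to the QPS value. The following is a minimal illustrative sketch of that behavior, not CCE or Kubernetes source code.

      import time

      class TokenBucket:
          """Allows `burst` requests at once and refills at `qps` tokens/second."""
          def __init__(self, qps: float, burst: int):
              self.qps, self.burst = qps, burst
              self.tokens = float(burst)
              self.last = time.monotonic()

          def allow(self) -> bool:
              now = time.monotonic()
              # Refill in proportion to elapsed time, capped at the burst size.
              self.tokens = min(self.burst, self.tokens + (now - self.last) * self.qps)
              self.last = now
              if self.tokens >= 1:
                  self.tokens -= 1
                  return True
              return False

      limiter = TokenBucket(qps=100, burst=100)  # defaults for clusters under 1000 nodes
      print(sum(limiter.allow() for _ in range(150)))  # roughly 100 pass instantly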

    Table 3 kube-controller-manager configurations

    Item: Deployment
    Parameter: concurrent-deployment-syncs
    Description: Number of Deployment objects that are allowed to sync concurrently.
    Value: Default: 5

    Item: Endpoint
    Parameter: concurrent-endpoint-syncs
    Description: Number of endpoint syncing operations that are performed concurrently.
    Value: Default: 5

    Item: Garbage collector
    Parameter: concurrent-gc-syncs
    Description: Number of garbage collector workers that are allowed to sync concurrently.
    Value: Default: 20

    Item: Job
    Parameter: concurrent-job-syncs
    Description: Number of Job objects that are allowed to sync concurrently.
    Value: Default: 5

    Item: Namespace
    Parameter: concurrent-namespace-syncs
    Description: Number of Namespace objects that are allowed to sync concurrently.
    Value: Default: 10

    Item: ReplicaSet
    Parameter: concurrent-replicaset-syncs
    Description: Number of ReplicaSets that are allowed to sync concurrently.
    Value: Default: 5

    Item: ResourceQuota
    Parameter: concurrent-resource-quota-syncs
    Description: Number of resource quotas that are allowed to sync concurrently.
    Value: Default: 5

    Item: Service
    Parameter: concurrent-service-syncs
    Description: Number of Services that are allowed to sync concurrently.
    Value: Default: 10

    Item: ServiceAccountToken
    Parameter: concurrent-serviceaccount-token-syncs
    Description: Number of service account token objects that are allowed to sync concurrently.
    Value: Default: 5

    Item: TTLAfterFinished
    Parameter: concurrent-ttl-after-finished-syncs
    Description: Number of ttl-after-finished-controller workers that are allowed to sync concurrently.
    Value: Default: 5

    Item: RC
    Parameter: concurrent-rc-syncs
    Description: Number of replication controllers that are allowed to sync concurrently.
    NOTE: This parameter is used only in clusters of v1.21 to v1.23. In clusters of v1.25 and later, this parameter is deprecated (officially deprecated from v1.25.3-r0 on).
    Value: Default: 5

    Item: Period for syncing the number of pods in horizontal pod autoscaler
    Parameter: horizontal-pod-autoscaler-sync-period
    Description: How often the HPA controller audits metrics in a cluster.
    Value: Default: 15 seconds

    Item: QPS for the controller to access kube-apiserver
    Parameter: kube-api-qps
    Description: QPS to use while communicating with kube-apiserver.
    Value:
    • If the number of nodes in a cluster is less than 1000, the default value is 100.
    • If a cluster contains 1000 or more nodes, the default value is 200.

    Item: Burst for the controller to communicate with kube-apiserver
    Parameter: kube-api-burst
    Description: Burst to use while communicating with kube-apiserver.
    Value:
    • If the number of nodes in a cluster is less than 1000, the default value is 100.
    • If a cluster contains 1000 or more nodes, the default value is 200.

    Item: Threshold for triggering garbage collection of terminated pods
    Parameter: terminated-pod-gc-threshold
    Description: Number of terminated pods that can exist in a cluster. If there are more terminated pods than this threshold, the excess terminated pods are deleted. (A sketch for checking the current count is shown after this table.)
    Value: Default: 1000
    Value range: 10 to 12500

    Item: HPA
    Parameter: concurrent-horizontal-pod-autoscaler-syncs
    Description: Number of HPA auto scaling requests that can be processed concurrently. This parameter is available only in clusters of v1.27 or later.
    Value: Default: 5
    Value range: 1 to 50
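
    You can check how close a cluster is to terminated-pod-gc-threshold by counting pods whose phase is Succeeded or Failed. A minimal sketch, assuming the official kubernetes Python client and a reachable cluster:

      from kubernetes import client, config

      config.load_kube_config()
      v1 = client.CoreV1Api()

      # Terminated pods are those in the Succeeded or Failed phase.
      terminated = 0
      for phase in ("Succeeded", "Failed"):
          pods = v1.list_pod_for_all_namespaces(field_selector=f"status.phase={phase}")
          terminated += len(pods.items)
      print(f"{terminated} terminated pods (default GC threshold: 1000)")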

    Table 4 Network component configurations (supported only by CCE Turbo clusters)

    Item: Minimum number of ENIs bound to a node at the cluster level
    Parameter: nic-minimum-target
    Description: Minimum number of container ENIs bound to a node.
    The value must be a positive integer. For example, the value 10 indicates that at least 10 container ENIs are bound to a node. If the number you enter exceeds the container ENI quota of the node, the ENI quota is used instead.
    Value: Default: 10

    Item: Maximum number of ENIs pre-bound to a node at the cluster level
    Parameter: nic-maximum-target
    Description: If the number of ENIs bound to a node exceeds the value of nic-maximum-target, the system does not proactively pre-bind ENIs.
    The check on the upper limit of pre-bound container ENIs takes effect only when the value of this parameter is greater than or equal to the value of nic-minimum-target.
    The value must be a positive integer. The value 0 indicates that the check on the upper limit of pre-bound container ENIs is disabled. If the number you enter exceeds the container ENI quota of the node, the ENI quota is used instead.
    Value: Default: 0

    Item: Number of ENIs pre-bound to a node at the cluster level
    Parameter: nic-warm-target
    Description: Number of extra ENIs to pre-bind once the ENIs covered by nic-minimum-target have been used up by pods. The value can only be a number. (The pre-binding and reclaiming rules are sketched in code after this table.)
    When the value of nic-warm-target plus the number of bound ENIs is greater than the value of nic-maximum-target, the system pre-binds only the difference between nic-maximum-target and the number of bound ENIs.
    Value: Default: 2

    Item: Reclaim number of ENIs pre-bound to a node at the cluster level
    Parameter: nic-max-above-warm-target
    Description: Pre-bound ENIs are unbound and reclaimed only when the number of idle ENIs on a node minus the value of nic-warm-target exceeds this threshold. Only numbers are allowed.
    • A large value accelerates pod startup but slows down the unbinding of idle container ENIs and lowers IP address utilization.
    • A small value speeds up the unbinding of idle container ENIs and raises IP address utilization but slows down pod startup, especially when a large number of pods are created at once.
    Value: Default: 2

    Item: Low threshold of the number of container ENIs bound to a node : high threshold of the number of bound ENIs
    Parameter: prebound-subeni-percentage
    Description: NOTE: This parameter is deprecated. Use the four dynamic ENI pre-binding parameters above instead.
    Value: Default: 0:0
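
    The four dynamic pre-binding parameters interact as described above. The following is a minimal illustrative sketch of those rules, not the actual CCE network component code; the function name and the assumption that the pre-bind/reclaim decision is computed per node are hypothetical.

      def eni_prebind_delta(bound: int, idle: int, minimum_target: int = 10,
                            warm_target: int = 2, maximum_target: int = 0,
                            max_above_warm: int = 2) -> int:
          """Return how many ENIs to pre-bind (>0) or reclaim (<0) on one node."""
          # Reclaim idle ENIs only when idle exceeds nic-warm-target by more
          # than nic-max-above-warm-target.
          if idle - warm_target > max_above_warm:
              return -(idle - warm_target - max_above_warm)
          # Keep at least nic-minimum-target ENIs bound, plus a warm pool of
          # nic-warm-target idle ENIs once the minimum is used up.
          want = max(minimum_target, bound + max(0, warm_target - idle))
          # A nic-maximum-target of at least nic-minimum-target caps pre-binding;
          # the default 0 disables the check.
          if minimum_target <= maximum_target:
              want = min(want, maximum_target)
          return max(0, want - bound)

      print(eni_prebind_delta(bound=10, idle=0))  # 2: restore the warm pool
      print(eni_prebind_delta(bound=12, idle=6))  # -2: reclaim excess idle ENIs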

    Table 5 Extended controller configurations (supported only by clusters of v1.21 and later)

    Item: Resource quota management
    Parameter: enable-resource-quota
    Description: Whether to automatically create a ResourceQuota object when creating a namespace. With quota management, you can control the number of workloads of each type and the upper limits of resources in a namespace or related dimensions.
    • false: no automatic creation
    • true: automatic creation enabled. For details about the resource quota defaults, see Configuring Resource Quotas.
    NOTE: In high-concurrency scenarios (for example, creating pods in batches), resource quota management may cause some requests to fail due to conflicts. Do not enable this function unless necessary. If you do enable it, ensure that the request client has a retry mechanism (a sketch follows this table).
    Value: Default: false
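
    The retry mechanism the note above calls for can be as simple as retrying on HTTP 409 (Conflict) with backoff. A minimal sketch, assuming the official kubernetes Python client; the pod manifest and names are hypothetical.

      import time
      from kubernetes import client, config
      from kubernetes.client.rest import ApiException

      def create_pod_with_retry(v1: client.CoreV1Api, namespace: str, body: dict,
                                retries: int = 5) -> None:
          """Retry pod creation when quota bookkeeping returns 409 Conflict."""
          for attempt in range(retries):
              try:
                  v1.create_namespaced_pod(namespace=namespace, body=body)
                  return
              except ApiException as e:
                  if e.status != 409 or attempt == retries - 1:
                      raise
                  time.sleep(0.2 * 2 ** attempt)  # exponential backoff

      config.load_kube_config()
      pod = {"apiVersion": "v1", "kind": "Pod",
             "metadata": {"name": "demo"},
             "spec": {"containers": [{"name": "app", "image": "nginx:alpine"}]}}
      create_pod_with_retry(client.CoreV1Api(), "default", pod)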

  4. Click OK.