Updated on 2024-08-17 GMT+08:00

Kubernetes

Typical native configuration items are provided so that you can configure community-native management components such as kube-apiserver and kube-controller-manager for the best cloud native experience.

API Server Configuration (kube-apiserver)

Container eviction configuration

By default, the cluster-wide tolerance time applies to all pods in a cluster. You can also configure a different tolerance time for individual pods, in which case the pod-level settings take precedence.

It is recommended that you properly configure the tolerance times for pods; otherwise, the following problems may occur:

  • If the parameter is set to a low value, pods may be evicted frequently during transient faults such as network jitter, which can affect services.
  • If the parameter is set to a high value, pods may not be evicted for a long time after a node fails, which can affect services.
Table 1 Parameters

Item

Parameter

Description

Value

Toleration time for nodes in NotReady state

default-not-ready-toleration-seconds

Tolerance time when a node is not ready. If a node becomes unavailable, pods running on the node are evicted automatically after the tolerance time elapses. The default value is 300s.

Default: 300s

Toleration time for nodes in unreachable state

default-unreachable-toleration-seconds

Tolerance time when a node is unreachable. If the environment is abnormal, for example, a node cannot be accessed (due to reasons such as abnormal node network), pods running on the node are evicted automatically after the tolerance time elapses. The default value is 300s.

Default: 300s
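The default-vs-pod-level override described above can be sketched in a few lines (an illustrative model only, not CCE or Kubernetes source code; the function name and dictionary shapes are hypothetical):

```python
NOT_READY = "node.kubernetes.io/not-ready"
UNREACHABLE = "node.kubernetes.io/unreachable"

def effective_toleration_seconds(pod_tolerations, default_seconds=300):
    """Return the eviction wait (in seconds) per taint key.

    A pod-level toleration for a taint overrides the cluster-wide
    default-not-ready/unreachable-toleration-seconds value.
    """
    waits = {}
    for key in (NOT_READY, UNREACHABLE):
        custom = next((t for t in pod_tolerations if t.get("key") == key), None)
        waits[key] = custom["tolerationSeconds"] if custom else default_seconds
    return waits

# A pod that tolerates not-ready nodes for only 60s is evicted sooner than
# the 300s cluster default; the unreachable case keeps the default.
pod_tolerations = [{"key": NOT_READY, "operator": "Exists",
                    "effect": "NoExecute", "tolerationSeconds": 60}]
waits = effective_toleration_seconds(pod_tolerations)
```

A pod without any matching toleration simply inherits the cluster-wide defaults for both taints.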

Admission Controller Add-on Configurations

With Kubernetes, you can enable admission plugins to validate and restrict Kubernetes API objects (such as pods, Services, and Deployments) before they are created or modified in a cluster.

This parameter is available only in clusters of v1.23.14-r0, v1.25.9-r0, v1.27.6-r0, v1.28.4-r0, or later versions.

Table 2 Parameters

Item

Parameter

Description

Value

Node Restriction Add-on

enable-admission-plugin-node-restriction

This add-on restricts the kubelet on a node to operating only on objects that belong to that node, enhancing isolation in multi-tenant scenarios or scenarios with high security requirements. For details, see the official documentation.

Enable/Disable

Pod Node Selector Add-on

enable-admission-plugin-pod-node-selector

This add-on allows cluster administrators to configure the default node selector through namespace annotations. In this way, pods run only on specific nodes and configurations are simplified.

Enable/Disable

Pod Toleration Limit Add-on

enable-admission-plugin-pod-toleration-restriction

This add-on allows cluster administrators to configure the default value and limits of pod tolerations through namespaces for fine-grained control over pod scheduling and key resource protection.

Enable/Disable
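As an example of what these plugins do, the pod node selector behavior can be modeled as follows (a simplified sketch of the PodNodeSelector plugin's merge-and-validate logic, not the actual implementation; the annotation key is the upstream Kubernetes one):

```python
ANNOTATION = "scheduler.alpha.kubernetes.io/node-selector"

def admit_pod_node_selector(pod_selector, namespace_annotations):
    """Merge the namespace's default node selector into a pod's selector.

    A pod whose own selector conflicts with the namespace default is rejected,
    mirroring the admission-time check described above.
    """
    # Parse "key1=value1,key2=value2" from the namespace annotation.
    ns_selector = {}
    for pair in filter(None, namespace_annotations.get(ANNOTATION, "").split(",")):
        key, _, value = pair.partition("=")
        ns_selector[key.strip()] = value.strip()
    merged = dict(pod_selector)
    for key, value in ns_selector.items():
        if merged.get(key, value) != value:
            raise ValueError(f"pod node selector conflicts with namespace default on {key!r}")
        merged[key] = value
    return merged

# A pod with no selector of its own inherits the namespace default.
merged = admit_pod_node_selector({}, {ANNOTATION: "env=prod"})
```

Pod-level selector keys that do not clash with the namespace default are kept alongside the merged-in ones.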

Service Account Token Volume Projection

Kubelet can project a service account token into a pod. You can specify the desired properties of the token, such as the API audiences. The token becomes invalid against the API when either the pod or the service account is deleted. For details, see the official documentation.

This parameter is available only in clusters of v1.23.16-r0, v1.25.11-r0, v1.27.8-r0, v1.28.6-r0, v1.29.2-r0, or later versions.

Table 3 Parameters

Item

Parameter

Description

Value

API Audience Settings

api-audiences

Audiences for a service account token. The Kubernetes component for authenticating service account tokens checks whether the token used in an API request specifies authorized audiences.

Configuration suggestion: Accurately configure audiences according to the communication needs among cluster services. By doing so, the service account token is used for authentication only between authorized services, which enhances security.

NOTE:

An incorrect configuration may lead to an authentication communication failure between services or an error during token verification.

Default value: "https://kubernetes.default.svc.cluster.local"

Multiple values can be configured, separated by commas (,).

Service Account Token Issuer Identity

service-account-issuer

Identifier of the entity that issues service account tokens. This value appears as the iss claim in the payload of each service account token.

Configuration suggestion: Ensure the configured issuer URL can be accessed in the cluster and trusted by the authentication system in the cluster.

NOTE:

If your specified issuer URL is untrusted or inaccessible, the authentication process based on the service account may fail.

Default value: "https://kubernetes.default.svc.cluster.local"

Multiple values can be configured, separated by commas (,).
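The audience check can be illustrated as follows (a minimal sketch that only inspects the aud claim of a JWT; the real API server also verifies the token signature, issuer, and expiry):

```python
import base64
import json

def _b64url_decode(segment):
    # JWT segments use base64url without padding; restore it before decoding.
    return base64.urlsafe_b64decode(segment + "=" * (-len(segment) % 4))

def token_audience_ok(token, api_audiences):
    """Return True if the token's aud claim names at least one configured audience."""
    claims = json.loads(_b64url_decode(token.split(".")[1]))
    audiences = claims.get("aud", [])
    if isinstance(audiences, str):
        audiences = [audiences]
    return any(a in api_audiences for a in audiences)

# Build an unsigned demo token with the default issuer/audience from the
# table above (signature checks are deliberately omitted in this sketch).
issuer = "https://kubernetes.default.svc.cluster.local"
segments = [base64.urlsafe_b64encode(json.dumps(part).encode()).decode().rstrip("=")
            for part in ({"alg": "none"}, {"iss": issuer, "aud": [issuer]})]
token = ".".join(segments) + "."
```

A token whose aud claim matches none of the configured api-audiences is rejected, which is the failure mode the NOTE above warns about.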

Controller Configuration (kube-controller-manager)

Common Configurations of the Controller

  • Controller performance configuration: used to configure performance parameters for the controller to access kube-apiserver.

It is recommended that you properly configure the controller performance settings; otherwise, the following problems may occur:

    • If a parameter is set to a small value, client traffic limiting may be triggered, affecting controller performance.
    • If a parameter is set to a large value, kube-apiserver may be overloaded.
    Table 4 Parameters

    Item

    Parameter

    Description

    Value

    QPS for communicating with kube-apiserver

    kube-api-qps

    QPS for communication with kube-apiserver

    • If the number of nodes in a cluster is less than 1,000, the default value is 100.
    • If the number of nodes in a cluster is 1,000 or more, the default value is 200.

    Burst for communicating with kube-apiserver

    kube-api-burst

    Burst for communication with kube-apiserver

    • If the number of nodes in a cluster is less than 1,000, the default value is 100.
    • If the number of nodes in a cluster is 1,000 or more, the default value is 200.
  • Cluster controller concurrent configuration: specifies the number of resource objects that are allowed to synchronize simultaneously. A larger value indicates a quicker response and higher CPU (and network) load.

It is recommended that you properly configure the controller concurrency; otherwise, the following problems may occur:

    • If a parameter is set to a small value, the controller may respond slowly.
    • If a parameter is set to a large value, the cluster management plane will be overloaded.
    Table 5 Parameters

    Item

    Parameter

    Description

    Value

Concurrent processing of Deployments

    concurrent-deployment-syncs

    Number of deployment objects that can be synchronized concurrently. A larger value indicates a quicker response to Deployments and higher CPU (and network bandwidth) pressure.

    Default: 5

Concurrent processing of endpoints

    concurrent-endpoint-syncs

    Number of endpoints that can be concurrently synchronized. A larger value indicates faster update of endpoints and higher CPU (and network) pressure.

    Default: 5

Concurrent garbage collector workers

    concurrent-gc-syncs

    Number of garbage collector workers that are allowed to synchronize concurrently.

    Default: 20

    Number of job objects allowed to sync simultaneously

    concurrent-job-syncs

    Number of job objects that can be synchronized concurrently. A larger value indicates a quicker response to jobs and higher CPU (and network) usage.

    Default: 5

    Number of CronJob objects allowed to sync simultaneously

    concurrent-cron-job-syncs

    Number of CronJob objects that can be synchronized concurrently. A larger value indicates a quicker response to CronJobs and higher CPU (and network) usage.

    Default: 5

Concurrent processing of namespaces

    concurrent-namespace-syncs

    Number of namespace objects that can be synchronized concurrently. A larger value indicates a quicker response to namespaces and higher CPU (and network) usage.

    Default: 10

Concurrent processing of ReplicaSets

    concurrent-replicaset-syncs

    Number of ReplicaSet objects that can be synchronized concurrently. A larger value indicates a quicker response to ReplicaSet management and higher CPU (and network) usage.

    Default: 5

Concurrent processing of resource quotas

    concurrent-resource-quota-syncs

    Number of ResourceQuota objects that can be synchronized concurrently. A larger value indicates a faster response to quota management and higher CPU (and network) usage.

    Default: 5

Concurrent processing of Services

    concurrent-service-syncs

    Number of Service objects that can be synchronized concurrently. A larger value indicates a faster response to Service management and higher CPU (and network) usage.

    Default: 10

Concurrent processing of service account tokens

    concurrent-serviceaccount-token-syncs

    Number of service account token objects that can be synchronized concurrently. A larger value indicates faster token generation and higher CPU (and network) usage.

    Default: 5

    Concurrent processing of ttl-after-finished

    concurrent-ttl-after-finished-syncs

    Number of ttl-after-finished-controller workers that can be synchronized concurrently.

    Default: 5

    RC

    concurrent_rc_syncs

    Number of replication controllers that can be synchronized concurrently. A larger value indicates faster replica management operations and higher CPU (and network) usage.

    NOTE:

    This parameter is used only in clusters of v1.19 or earlier.

    Default: 5

    RC

    concurrent-rc-syncs

    Number of replication controllers that can be synchronized concurrently. A larger value indicates faster replica management operations and higher CPU (and network) usage.

    NOTE:

    This parameter is used only in clusters of v1.21 to v1.23. In clusters of v1.25 and later, this parameter is deprecated (officially deprecated from v1.25.3-r0 on).

    Default: 5

    HPA

    concurrent-horizontal-pod-autoscaler-syncs

    Maximum number of HPA auto scaling requests that can be processed concurrently. A larger value indicates a faster HPA auto scaling and higher CPU (and network) usage.

    This parameter is available only in clusters of v1.27 or later.

    Default: 5

    Value range: 1 to 50
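The kube-api-qps and kube-api-burst settings above behave like a token-bucket rate limiter on the client side (client-go's flowcontrol package works this way); here is a minimal sketch under that assumption:

```python
class TokenBucket:
    """Minimal model of the client-side rate limiting controlled by
    kube-api-qps and kube-api-burst (an assumed simplification, not
    the actual client-go implementation)."""

    def __init__(self, qps, burst):
        self.qps = qps
        self.burst = burst
        self.tokens = float(burst)  # the bucket starts full: an initial burst is allowed
        self.last = 0.0

    def allow(self, now):
        # Refill at `qps` tokens per second, capped at `burst`.
        self.tokens = min(self.burst, self.tokens + (now - self.last) * self.qps)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False  # the request would be throttled

# With qps=100 and burst=100, only 100 of 150 simultaneous requests pass;
# the remainder must wait for tokens to be refilled over time.
bucket = TokenBucket(qps=100, burst=100)
passed = sum(bucket.allow(now=0.0) for _ in range(150))
```

This is why a small value triggers client-side throttling (the bucket empties quickly) while a large value lets bursts of controller traffic reach kube-apiserver unthrottled.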

Node lifecycle controller (node-lifecycle-controller) configuration

This parameter is available only in clusters of v1.23.14-r0, v1.25.9-r0, v1.27.6-r0, v1.28.4-r0, or later versions.

Table 6 Parameters

Item

Parameter

Description

Value

Unhealthy AZ Threshold

unhealthy-zone-threshold

When the proportion of NotReady nodes in an AZ exceeds this threshold, the AZ itself is considered unhealthy, and pod evictions from nodes in that AZ are slowed down to limit the impact of the unhealthy AZ.

NOTE:

If the parameter is set to a large value, pods in unhealthy AZs will be migrated in a large scale, which may lead to risks such as overloaded clusters.

Default: 0.55

Node Eviction Rate

node-eviction-rate

This parameter specifies the number of nodes that pods are deleted from per second in a cluster when the AZ is healthy. The default value is 0.1, indicating that pods can be evicted from at most one node every 10 seconds.

NOTE:

Configure this parameter based on the size of the cluster. The number of pods to be evicted in each batch should not exceed 300.

If the parameter is set to a large value, the cluster may be overloaded. Additionally, if too many pods are evicted, they cannot be rescheduled, which will slow down fault recovery.

Default: 0.1

Secondary Node Eviction Rate

secondary-node-eviction-rate

This parameter specifies the number of nodes that pods are deleted from per second in a cluster when the AZ is unhealthy. The default value is 0.01, indicating that pods can be evicted from at most one node every 100 seconds.

NOTE:

Configure this parameter together with node-eviction-rate; a common practice is to set it to one-tenth of node-eviction-rate.

There is no need to set this parameter to a large value for nodes in an unhealthy AZ; doing so may overload the cluster.

Default: 0.01

Large Cluster Threshold

large-cluster-size-threshold

If the number of nodes in a cluster is greater than the value of this parameter, the cluster is considered a large cluster.

Configuration suggestion: For the clusters with a large number of nodes, configure a relatively larger value than the default one for higher performance and faster responses of controllers. Retain the default value for small clusters. Before adjusting the value of this parameter in a production environment, check the impact of the change on cluster performance in a test environment.

NOTE:

kube-controller-manager automatically adjusts configurations for large clusters to optimize the cluster performance. Therefore, an excessively small threshold for small clusters will deteriorate the cluster performance.

Default: 50
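The interplay of the four parameters above can be summarized with a small model (an assumed simplification of the node lifecycle controller's behavior, not its actual code):

```python
def zone_eviction_rate(nodes_in_cluster, zone_healthy,
                       node_eviction_rate=0.1,
                       secondary_node_eviction_rate=0.01,
                       large_cluster_size_threshold=50):
    """Effective node eviction rate (nodes/second) for an AZ.

    A healthy AZ uses node-eviction-rate. For an unhealthy AZ, a large
    cluster falls back to the slower secondary-node-eviction-rate, while
    a small cluster stops evicting entirely.
    """
    if zone_healthy:
        return node_eviction_rate
    if nodes_in_cluster <= large_cluster_size_threshold:
        return 0.0
    return secondary_node_eviction_rate

def seconds_between_evictions(rate):
    # rate 0.1 -> one node every 10s; rate 0.01 -> one node every 100s
    return float("inf") if rate == 0 else 1.0 / rate
```

This makes the documented defaults concrete: 0.1 corresponds to evicting pods from at most one node every 10 seconds, and 0.01 to one node every 100 seconds.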

Workload auto scaling synchronization configuration

Table 7 Parameters

Item

Parameter

Description

Value

HPA synchronization period

horizontal-pod-autoscaler-sync-period

Period for the horizontal pod autoscaler to perform elastic scaling on pods. A smaller value will result in a faster auto scaling response and higher CPU load.

NOTE:

Make sure to configure this parameter properly as a lengthy period can cause the controller to respond slowly, while a short period may overload the cluster control plane.

Default: 15s

Horizontal Pod Scaling Tolerance

horizontal-pod-autoscaler-tolerance

This parameter determines how sensitively the horizontal pod autoscaler reacts to metric changes: scaling is triggered only when the ratio of the current metric value to the target deviates from 1.0 by more than the tolerance. If the parameter is set to 0, auto scaling is triggered as soon as the related metric thresholds are crossed.

Configuration suggestion: If the service resource usage rises sharply over time, retain a certain tolerance to prevent unexpected auto scaling in high resource usage scenarios.

Default: 0.1

HPA CPU Initialization Period

horizontal-pod-autoscaler-cpu-initialization-period

During the period specified by this parameter, the CPU usage data used in HPA calculation is limited to pods that are both ready and have recently had their metrics collected. You can use this parameter to filter out unstable CPU usage data during the early stage of pod startup. This helps prevent incorrect scaling decisions based on momentary peak values.

Configuration suggestion: If you find that HPA is making incorrect scaling decisions due to CPU usage fluctuations during pod startup, increase the value of this parameter to allow for a buffer period of stable CPU usage.

NOTE:

Make sure to configure this parameter properly as a small value may trigger unnecessary scaling based on peak CPU usage, while a large value may cause scaling to be delayed.

This parameter is available only in clusters of v1.23.16-r0, v1.25.11-r0, v1.27.8-r0, v1.28.6-r0, v1.29.2-r0, or later versions.

Default: 5 minutes

HPA Initial Readiness Delay

horizontal-pod-autoscaler-initial-readiness-delay

After CPU initialization, this period allows HPA to use a less strict criterion for getting CPU metrics. During this period, HPA will gather data on the CPU usage of the pod for scaling, regardless of any changes in the pod's readiness status. This parameter ensures continuous tracking of CPU usage, even when the pod status changes frequently.

Configuration suggestion: If the readiness status of pods fluctuates after startup and you want to prevent HPA misjudgment caused by the fluctuation, increase the value of this parameter to allow HPA to gather more comprehensive CPU usage data.

NOTE:

Configure this parameter properly. If it is set to a small value, an unnecessary scale-out may occur due to CPU data fluctuations when the pod is just ready. If it is set to a large value, HPA may not be able to make a quick decision when a rapid response is needed.

This parameter is available only in clusters of v1.23.16-r0, v1.25.11-r0, v1.27.8-r0, v1.28.6-r0, v1.29.2-r0, or later versions.

Default: 30s
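The tolerance-based scaling rule can be made concrete with the core HPA formula (simplified; the real controller also applies readiness filtering, as described above, and stabilization windows):

```python
import math

def hpa_desired_replicas(current_replicas, current_metric, target_metric,
                         tolerance=0.1):
    """Core HPA scaling rule (simplified): scale only when the ratio of
    the current metric to the target deviates from 1.0 by more than the
    tolerance."""
    ratio = current_metric / target_metric
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas  # within tolerance: no scaling
    return math.ceil(current_replicas * ratio)

# With the default tolerance of 0.1, being 4% over target does not scale,
# while being 60% over target scales 4 replicas up to ceil(4 * 1.6) = 7.
no_change = hpa_desired_replicas(4, current_metric=52, target_metric=50)
scaled_up = hpa_desired_replicas(4, current_metric=80, target_metric=50)
```

With tolerance set to 0, even the 4% deviation would trigger a scale-out, which is the jittery behavior the configuration suggestion above warns against.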

Threshold configuration for garbage collection of terminated pods

Table 8 Parameters

Item

Parameter

Description

Value

Maximum number of terminated pods that can be retained before the pod garbage collector starts deleting them

terminated-pod-gc-threshold

Number of terminated pods that can exist before the terminated pod garbage collector starts deleting terminated pods

NOTE:

It is recommended that you properly configure this parameter. If the value is too large, there may be a large number of terminated pods in the cluster, which will further affect the performance of list queries and result in an overloaded cluster.

Default: 1000

Value range: 10 to 12500
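A sketch of the threshold behavior (assuming, for illustration, that the oldest terminated pods are collected first; the names and tuple shape are hypothetical):

```python
def pods_to_garbage_collect(terminated_pods, threshold=1000):
    """Return the names of terminated pods the GC would delete.

    terminated_pods: list of (pod_name, finish_time) tuples. Once the
    count exceeds the threshold, the excess oldest pods are deleted.
    """
    excess = len(terminated_pods) - threshold
    if excess <= 0:
        return []
    oldest_first = sorted(terminated_pods, key=lambda pod: pod[1])
    return [name for name, _ in oldest_first[:excess]]

# 1005 terminated pods against a threshold of 1000: the 5 oldest are collected.
pods = [(f"job-{i}", i) for i in range(1005)]
collected = pods_to_garbage_collect(pods, threshold=1000)
```

This illustrates why an excessively large threshold lets terminated pods pile up: nothing is deleted until the count crosses it, and list queries must page through all retained objects.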

Resource quota controller (resource-quota-controller) configuration

In high-concurrency scenarios (for example, creating pods in batches), the resource quota management may cause some requests to fail due to conflicts. Do not enable this function unless necessary. To enable this function, ensure that there is a retry mechanism in the request client.

Table 9 Parameters

Item

Parameter

Description

Value

Enable resource quota management

enable-resource-quota

With resource quota management, you are allowed to control the number of workloads (such as Deployments and pods) and the upper limits of resources (such as CPUs and memory) in namespaces or related dimensions. Namespaces control quotas through the ResourceQuota objects.

  • false: ResourceQuota objects are not automatically created for namespaces.
  • true: ResourceQuota objects are automatically created for namespaces. For details about the resource quota defaults, see Configuring Resource Quotas.

Default: false
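The client-side retry mechanism recommended above can be sketched as follows (generic retry logic; ConflictError stands in for an HTTP 409 Conflict returned by the API server when concurrent quota updates collide):

```python
class ConflictError(Exception):
    """Stands in for an HTTP 409 Conflict from the API server."""

def create_with_retry(create_fn, max_attempts=5):
    """Retry a create call on quota conflicts.

    A real client would also add (exponential) backoff between attempts.
    """
    last_error = None
    for _ in range(max_attempts):
        try:
            return create_fn()
        except ConflictError as err:
            last_error = err
    raise last_error

# Simulate a pod creation that hits quota-update conflicts twice before
# succeeding, as can happen when pods are created in batches.
attempts = {"count": 0}
def flaky_create():
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise ConflictError("resource quota update conflict")
    return "created"

result = create_with_retry(flaky_create)
```

Without such a retry, batch creations in a quota-managed namespace can fail intermittently, which is why the text advises keeping resource quota management off unless it is needed.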