Updated on 2024-11-12 GMT+08:00

OpenKruise

Introduction

OpenKruise is an extended component suite for Kubernetes. It uses CRDs to provide advanced workload and application management features, including automated deployment, release, O&M, and availability protection for cloud native applications. These features simplify application management and make it more efficient.

OpenKruise has the following core capabilities:

  • Advanced workloads: It contains a set of advanced workloads, such as CloneSets and Advanced StatefulSets.
  • Application sidecar management: It provides SidecarSets to simplify sidecar injection and supports other capabilities such as in-place sidecar upgrade.
  • Application security protection: It protects your Kubernetes resources from being affected by the cascading deletion mechanism.
  • Efficient application O&M: It provides advanced O&M capabilities to help you better manage applications. For example, you can use the ImagePullJob CRD to pre-pull images on any nodes, or you can restart containers in a running pod.
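
As an illustration of the image pre-pull capability, the following minimal sketch builds an ImagePullJob manifest as a Python dictionary and prints it as JSON, which kubectl can apply. The field names follow the upstream OpenKruise apps.kruise.io/v1alpha1 API as understood at the time of writing. The helper name, the job name, the image, and the numeric values are placeholders chosen for this example, not recommended settings.

```python
import json


def build_image_pull_job(name: str, image: str, parallelism: int = 10) -> dict:
    """Return an ImagePullJob manifest as a plain dict (hypothetical helper)."""
    return {
        "apiVersion": "apps.kruise.io/v1alpha1",
        "kind": "ImagePullJob",
        "metadata": {"name": name},
        "spec": {
            "image": image,
            # Maximum number of nodes that pull the image at the same time.
            "parallelism": parallelism,
            "completionPolicy": {
                # "Always" runs the job once until it completes.
                "type": "Always",
                # Illustrative values only (assumptions, not defaults).
                "activeDeadlineSeconds": 1200,
                "ttlSecondsAfterFinished": 300,
            },
        },
    }


# Placeholder job name and image for the example.
manifest = build_image_pull_job("warm-up-nginx", "nginx:1.25")
print(json.dumps(manifest, indent=2))
```

Image pre-pull relies on kruise-daemon, so a manifest like this is only useful when that component is enabled.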

Open source community: https://github.com/openkruise/kruise

Notes and Constraints

If you have deployed the community OpenKruise in your cluster, uninstall it and then install the CCE OpenKruise add-on. Otherwise, the add-on may fail to be installed.

Precautions

OpenKruise has added webhooks to its open-source components, and the community has set the default failure policy of the pod webhook to Fail. As a result, if kruise-controller-manager becomes unavailable, operations such as pod creation and deletion are blocked. Before using this add-on, carefully assess the risks and configure HA for kruise-controller-manager so that the webhook server can handle requests properly.
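
The effect of the failure policy can be sketched with a deliberately simplified model. The function below is hypothetical and is not part of Kubernetes or OpenKruise. It only mirrors the documented behavior of the two failurePolicy values, Fail and Ignore, when a webhook call does or does not succeed, and it assumes that a reachable webhook admits the request.

```python
def admit_pod_request(webhook_reachable: bool, failure_policy: str = "Fail") -> bool:
    """Simplified model: return True if the API server lets the request through."""
    if failure_policy not in ("Fail", "Ignore"):
        raise ValueError(f"unknown failurePolicy: {failure_policy}")
    if webhook_reachable:
        # The webhook answers; this sketch assumes it admits the request.
        return True
    # The webhook call failed: Fail rejects the request, Ignore lets it continue.
    return failure_policy == "Ignore"


for reachable in (True, False):
    for policy in ("Fail", "Ignore"):
        print(f"reachable={reachable} policy={policy} admitted={admit_pod_request(reachable, policy)}")
```

With the community default of Fail, the only way to keep pod operations flowing during an outage is to keep at least one kruise-controller-manager pod serving the webhook, which is why HA is recommended.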

OpenKruise is an open-source add-on that CCE has selected, adapted, and integrated into its services. CCE offers comprehensive technical support, but is not responsible for any service disruptions caused by defects in the open-source software, nor does it provide compensation or additional services for such disruptions. It is highly recommended that users regularly upgrade their software to address any potential issues.

Installing the Add-on

  1. Log in to the CCE console and click the cluster name to access the cluster console. In the navigation pane, choose Add-ons, locate OpenKruise on the right, and click Install.
  2. On the Install Add-on page, configure the specifications as needed.

    • If you selected Preset, choose Small or Large based on the cluster scale. The system automatically sets the number of add-on pods and the resource quotas according to the preset specifications. You can view the configurations on the console.

      The Small specification runs the add-on in one pod and is suitable for clusters with fewer than 50 nodes. The Large specification runs the add-on in two pods and is suitable for clusters with more than 50 nodes.

    • If you selected Custom, adjust the number of pods and the resource quotas as needed. A single pod cannot provide high availability: if an error occurs on the node where that pod runs, the add-on will fail.

  3. Decide whether to enable kruise-daemon.

    kruise-daemon is a DaemonSet component that has been added to OpenKruise. It provides image warm-up and container restart.

    If you install OpenKruise v1.0.3 in a cluster of v1.25 or later, kruise-daemon cannot run on a Docker node. In this case, use a containerd node. For details, see Components.

  4. Configure deployment policies for the add-on pods.

    • Scheduling policies do not take effect on add-on instances of the DaemonSet type.
    • When configuring multi-AZ deployment or node affinity, ensure that there are nodes meeting the scheduling policy and that resources are sufficient in the cluster. Otherwise, the add-on cannot run.
    Table 1 Configurations for add-on scheduling

    Parameter

    Description

    Multi AZ

    • Preferred: Deployment pods of the add-on will be preferentially scheduled to nodes in different AZs. If all the nodes in the cluster are deployed in the same AZ, the pods will be scheduled to different nodes in that AZ.
    • Equivalent mode: Deployment pods of the add-on are evenly scheduled to the nodes in the cluster in each AZ. If a new AZ is added, you are advised to increase add-on pods for cross-AZ HA deployment. With the Equivalent multi-AZ deployment, the difference between the number of add-on pods in different AZs will be less than or equal to 1. If resources in one of the AZs are insufficient, pods cannot be scheduled to that AZ.
    • Required: Deployment pods of the add-on are forcibly scheduled to nodes in different AZs. There can be at most one pod in each AZ. If nodes in a cluster are not in different AZs, some add-on pods cannot run properly. If a node is faulty, add-on pods on it may fail to be migrated.

    Node Affinity

    • Not configured: Node affinity is disabled for the add-on.
    • Node Affinity: Specify the nodes where the add-on is deployed. If you do not specify the nodes, the add-on will be randomly scheduled based on the default cluster scheduling policy.
    • Specified Node Pool Scheduling: Specify the node pool where the add-on is deployed. If you do not specify the node pool, the add-on will be randomly scheduled based on the default cluster scheduling policy.
    • Custom Policies: Enter the labels of the nodes where the add-on is to be deployed for more flexible scheduling policies. If you do not specify node labels, the add-on will be randomly scheduled based on the default cluster scheduling policy.

      If multiple custom affinity policies are configured, ensure that there are nodes that meet all the affinity policies in the cluster. Otherwise, the add-on cannot run.

    Toleration

    Tolerations work together with taints. They allow (but do not force) the add-on Deployment to be scheduled to nodes with matching taints, and they control how the Deployment is evicted after the node where it runs is tainted.

    By default, the add-on adds a toleration policy for each of the node.kubernetes.io/not-ready and node.kubernetes.io/unreachable taints. The toleration time window is 60s.

    For details, see Configuring Tolerance Policies.

  5. Click Install.
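
Two of the scheduling settings in Table 1 can be sketched in code. The first helper lists the default tolerations in the shape of a Kubernetes pod spec, and the second checks the Equivalent mode rule that pod counts in different AZs differ by at most 1. Both helper names are hypothetical, and the NoExecute effect is an assumption based on how Kubernetes usually applies these two taints, because the table does not state the effect.

```python
from __future__ import annotations


def default_tolerations(seconds: int = 60) -> list[dict]:
    """Tolerations matching the defaults described in Table 1."""
    keys = ("node.kubernetes.io/not-ready", "node.kubernetes.io/unreachable")
    return [
        {
            "key": key,
            "operator": "Exists",
            "effect": "NoExecute",  # assumption: the effect is not stated in the table
            "tolerationSeconds": seconds,
        }
        for key in keys
    ]


def satisfies_equivalent_mode(pods_per_az: dict[str, int]) -> bool:
    """Check that add-on pod counts across AZs differ by at most 1."""
    counts = list(pods_per_az.values())
    return not counts or max(counts) - min(counts) <= 1


print(default_tolerations())
print(satisfies_equivalent_mode({"az1": 2, "az2": 1, "az3": 1}))
print(satisfies_equivalent_mode({"az1": 3, "az2": 1}))
```

The AZ names in the usage lines are placeholders. A distribution of 2, 1, and 1 pods passes the check, while 3 and 1 does not.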

Components

Table 2 Add-on components

| Component | Description | Resource Type |
|---|---|---|
| kruise-controller-manager | Core component of OpenKruise controller, which includes admission webhooks for Kruise CRDs and pods. kruise-controller-manager creates webhook configurations to configure which resources need to be processed and provides Services that can be called by kube-apiserver. | Deployment |
| kruise-daemon | Deployed on each node through DaemonSets to provide functions such as image warm-up and container restart. | DaemonSet |

Since version 1.24, the Kubernetes community no longer supports Dockershim. In clusters of v1.25 or later, CCE uses cri-dockerd as an alternative to Dockershim so that users can keep working with Docker. However, the OpenKruise community does not support cri-dockerd. For details, see issue. This issue will be solved in later versions.

Therefore, if you install OpenKruise v1.0.3 in a cluster of v1.25 or later, kruise-daemon cannot run on a Docker node. In this case, use a containerd node.
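
The constraint above can be expressed as a small check. This is a minimal sketch that covers only the OpenKruise v1.0.3 case described here. The function name is hypothetical, and the runtime string is assumed to have the form reported in a node's status.nodeInfo.containerRuntimeVersion field, for example containerd://1.6.14. The version numbers in the usage lines are placeholders.

```python
def kruise_daemon_can_run(cluster_version: str, container_runtime_version: str) -> bool:
    """Apply the rule stated above for add-on v1.0.3.

    cluster_version: for example "v1.25"
    container_runtime_version: for example "containerd://1.6.14" or "docker://24.0.9"
    """
    major, minor = cluster_version.lstrip("v").split(".")[:2]
    # The part before "://" names the runtime, such as "docker" or "containerd".
    runtime = container_runtime_version.split("://", 1)[0]
    on_affected_cluster = (int(major), int(minor)) >= (1, 25)
    return not (on_affected_cluster and runtime == "docker")


print(kruise_daemon_can_run("v1.25", "docker://24.0.9"))
print(kruise_daemon_can_run("v1.23", "docker://24.0.9"))
print(kruise_daemon_can_run("v1.28", "containerd://1.6.14"))
```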

Troubleshooting

When a workload is being created, the following error occurs:

Error creating: Internal error occurred: failed calling webhook "mpod.kb.io": failed to call webhook: Post "https://kruise-webhook-service.kube-system.svc:443/mutate-pod?timeout=10s": dial tcp 10.247.10.181:443: connect: connection refused

The issue occurs because the kruise-controller-manager component is unavailable. As a result, pod creation, update, and deletion operations are blocked in namespaces that do not have the control-plane: openkruise label. The kube-system namespace is not affected.

Solution

Restore kruise-controller-manager. The causes and solutions are as follows:

  • There are not enough resources for kruise-controller-manager to be scheduled properly. You are advised to configure more resources for the add-on.
  • A scheduling or affinity policy configured for kruise-controller-manager prevents the pod from being scheduled. You are advised to check the scheduling policy and adjust it so that kruise-controller-manager can be scheduled.
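
When triaging this error, it can help to extract the webhook name and the backing Service from the message before checking the corresponding pods. The following sketch parses the error string shown above with regular expressions. The function name and the returned keys are hypothetical, and the host is assumed to follow the in-cluster <service>.<namespace>.svc pattern.

```python
import re
from urllib.parse import urlsplit

ERROR = (
    'Error creating: Internal error occurred: failed calling webhook "mpod.kb.io": '
    'failed to call webhook: Post '
    '"https://kruise-webhook-service.kube-system.svc:443/mutate-pod?timeout=10s": '
    "dial tcp 10.247.10.181:443: connect: connection refused"
)


def parse_webhook_error(message: str) -> dict:
    """Extract the webhook name, backing Service, and failure hint from an error string."""
    name = re.search(r'failed calling webhook "([^"]+)"', message)
    url = re.search(r'Post "([^"]+)"', message)
    result = {
        "webhook": name.group(1) if name else None,
        "service": None,
        "namespace": None,
        "connection_refused": "connection refused" in message,
    }
    if url:
        host = urlsplit(url.group(1)).hostname or ""
        parts = host.split(".")
        # In-cluster Service hosts look like <service>.<namespace>.svc
        if len(parts) >= 3 and parts[2] == "svc":
            result["service"], result["namespace"] = parts[0], parts[1]
    return result


print(parse_webhook_error(ERROR))
```

A connection refused result for the Service in kube-system points to kruise-controller-manager pods that are not running or not ready, which matches the causes listed above.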

Change History

Table 3 Release history

| Add-on Version | Supported Cluster Version | New Feature | Community Version |
|---|---|---|---|
| 1.0.12 | v1.23, v1.25, v1.27, v1.28, v1.29, v1.30 | CCE clusters 1.30 are supported. | 1.5.4 |
| 1.0.3 | v1.23, v1.25, v1.27, v1.28, v1.29 | The OpenKruise add-on is now available. | 1.5.4 |