Updated on 2024-07-04 GMT+08:00

What Can I Do If cgroup kmem Leakage Occasionally Occurs When an Application Is Repeatedly Created or Deleted on a Node Running CentOS with an Earlier Kernel Version?

Symptom

When an application is repeatedly created and deleted on a node that runs CentOS 7.6 with a kernel version earlier than 3.10.0-1062.12.1.el7.x86_64 (such nodes mainly run in clusters of v1.17.9), cgroup kmem leakage occurs. As a result, new pods cannot be created on the node even though it still has available memory, and the error message "Cannot allocate memory" is displayed.

Possible Causes

A temporary memory cgroup is created along with the application. When the application is deleted, its cgroup is removed (the corresponding directory under /sys/fs/cgroup/memory disappears), but the kernel does not release the cgroup's cssid. As a result, the number of cgroups that the kernel considers to exist differs from the actual number. When the residual cgroups exhaust the cgroup limit on the node, new pods cannot be created on it.
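The mismatch described above can be observed by comparing the kernel's count of memory cgroups (the `num_cgroups` column of the `memory` line in /proc/cgroups) with the number of cgroup directories actually present under /sys/fs/cgroup/memory. The sketch below is a heuristic under stated assumptions: it assumes a cgroup v1 node where the kernel count still includes the leaked entries, and the function names are hypothetical. A small difference can also be caused by pods being created or deleted while the script runs.

```python
import os
from typing import Optional


def kernel_memory_cgroup_count(proc_cgroups_text: str) -> Optional[int]:
    """Return `num_cgroups` for the memory controller from /proc/cgroups text.

    /proc/cgroups columns: subsys_name, hierarchy, num_cgroups, enabled.
    Returns None if no memory line is present.
    """
    for line in proc_cgroups_text.splitlines():
        if line.startswith("#"):  # header line
            continue
        fields = line.split()
        if len(fields) >= 4 and fields[0] == "memory":
            return int(fields[2])
    return None


def visible_cgroup_dirs(root: str = "/sys/fs/cgroup/memory") -> int:
    """Count cgroup directories under `root`, including `root` itself."""
    return sum(1 for _ in os.walk(root))


if __name__ == "__main__" and os.path.exists("/proc/cgroups") \
        and os.path.isdir("/sys/fs/cgroup/memory"):
    with open("/proc/cgroups") as f:
        kernel_count = kernel_memory_cgroup_count(f.read())
    on_disk = visible_cgroup_dirs()
    print(f"kernel count: {kernel_count}, directories: {on_disk}")
    if kernel_count is not None and kernel_count > on_disk:
        print(f"about {kernel_count - on_disk} memory cgroups may be leaked")
```

A kernel count that keeps growing while the directory count stays roughly constant is consistent with the leakage described in this section.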

Solution

  • Add the cgroup.memory=nokmem kernel boot parameter to disable kmem accounting globally and prevent the leakage.
  • Clusters of v1.17 are no longer maintained. To resolve this problem, upgrade the cluster to v1.19 or later and reset the node's OS to the latest version. Ensure that the node's kernel version is later than 3.10.0-1062.12.1.el7.x86_64.
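A kernel boot parameter such as cgroup.memory=nokmem takes effect only after it has been added to the boot loader's kernel command line (for example, with `grubby --update-kernel=ALL --args="cgroup.memory=nokmem"` on CentOS 7) and the node has been rebooted. After the reboot, you can confirm that the parameter is active by inspecting /proc/cmdline. The following is a minimal sketch; `kmem_disabled` is a hypothetical helper, and it assumes that the parameter's options are comma-separated, as in the upstream kernel's `cgroup.memory=` syntax.

```python
def kmem_disabled(cmdline: str) -> bool:
    """Return True if the kernel command line contains cgroup.memory=...nokmem...

    cgroup.memory takes comma-separated options (for example, nosocket,nokmem),
    so every option of every cgroup.memory= token is checked.
    """
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        if key == "cgroup.memory" and sep and "nokmem" in value.split(","):
            return True
    return False


if __name__ == "__main__":
    try:
        with open("/proc/cmdline") as f:
            print("cgroup.memory=nokmem active:", kmem_disabled(f.read()))
    except FileNotFoundError:
        print("/proc/cmdline is not available on this host")
```

Reading /proc/cmdline shows the parameters of the currently running kernel, so this check reflects what is actually in effect rather than what is configured for the next boot.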