What Can I Do If cgroup kmem Leakage Occasionally Occurs When an Application Is Repeatedly Created or Deleted on a Node Running CentOS with an Earlier Kernel Version?
Symptom
When an application is repeatedly created on a node running CentOS 7.6 with a kernel version earlier than 3.10.0-1062.12.1.el7.x86_64 (such nodes mainly run in clusters of v1.17.9), cgroup kmem leakage occurs. As a result, although memory is still available on the node, new pods cannot be added to it, and the error message "Cannot allocate memory" is displayed.
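To judge whether a node falls into the affected range, its kernel release can be compared with 3.10.0-1062.12.1.el7.x86_64. The following is a minimal sketch, not an official tool. The helper names kernel_key and is_affected are hypothetical, and comparing the numeric fields before the ".el" tag is a simplification of real RPM version ordering that assumes a CentOS-style release string.

```python
import platform
import re

# First kernel release that this page treats as unaffected.
FIXED_KERNEL = "3.10.0-1062.12.1.el7.x86_64"


def kernel_key(release: str) -> tuple:
    """Return the numeric fields that precede the '.el' tag as a tuple of ints.

    Assumption: the string looks like '3.10.0-957.el7.x86_64'.
    """
    head = release.split(".el")[0]  # for example '3.10.0-1062.12.1'
    return tuple(int(n) for n in re.findall(r"\d+", head))


def is_affected(release: str) -> bool:
    """True if the release sorts earlier than FIXED_KERNEL."""
    return kernel_key(release) < kernel_key(FIXED_KERNEL)


def running_kernel_is_affected() -> bool:
    """Check the kernel of the machine this script runs on."""
    return is_affected(platform.release())
```

Calling running_kernel_is_affected() on the node itself returns True when the running kernel is earlier than the release named above.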
Possible Causes
A temporary memory cgroup is created when the application is created. When the application is deleted, the cgroup (the corresponding cgroup directory in /sys/fs/cgroup/memory) is deleted, but the kernel does not release the corresponding cssid. As a result, the number of cgroups counted by the kernel differs from the actual number. When the residual cgroups exhaust the cgroup limit on the node, pods can no longer be added to the node.
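The mismatch between the kernel's count and the actual number of cgroups can be observed on a node. The sketch below assumes cgroup v1 with the memory controller mounted at /sys/fs/cgroup/memory. It reads the num_cgroups column for the memory subsystem from /proc/cgroups and compares it with the number of directories that are actually present. The function names are hypothetical, and the size of the gap that indicates a problem is not specified here.

```python
import os


def kernel_memcg_count(proc_cgroups_text: str) -> int:
    """Return num_cgroups for the memory subsystem from /proc/cgroups content.

    Columns in /proc/cgroups: subsys_name, hierarchy, num_cgroups, enabled.
    """
    for line in proc_cgroups_text.splitlines():
        fields = line.split()
        if len(fields) >= 3 and fields[0] == "memory":
            return int(fields[2])
    raise ValueError("memory subsystem not found in /proc/cgroups content")


def visible_memcg_count(root: str = "/sys/fs/cgroup/memory") -> int:
    """Count cgroup directories under root, including root itself."""
    return sum(1 for _ in os.walk(root))


def leak_gap() -> int:
    """Kernel-tracked memory cgroups minus the directories that still exist."""
    with open("/proc/cgroups") as f:
        tracked = kernel_memcg_count(f.read())
    return tracked - visible_memcg_count()
```

A value of leak_gap() that keeps growing while applications are repeatedly created and deleted is consistent with the residual cgroups described above.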
Solution
- Add the cgroup.memory=nokmem parameter to the kernel boot parameters to disable kmem globally and prevent the leakage.
- Clusters of v1.17 are no longer maintained. To resolve this problem, upgrade the cluster to v1.19 or later and reset the node OS to the latest version. Ensure that the kernel version is 3.10.0-1062.12.1.el7.x86_64 or later.
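After applying the first option and restarting the node, you can confirm that the running kernel was booted with the parameter. The sketch below is illustrative only. It checks the boot command line in /proc/cmdline for a cgroup.memory= argument that contains nokmem. The function names are hypothetical, and how the parameter is added to the boot configuration is not covered here.

```python
def kmem_disabled(cmdline: str) -> bool:
    """True if a cgroup.memory= boot argument lists the nokmem option.

    The argument takes a comma-separated option list, so
    'cgroup.memory=nosocket,nokmem' also counts.
    """
    for arg in cmdline.split():
        if arg.startswith("cgroup.memory="):
            options = arg.split("=", 1)[1].split(",")
            if "nokmem" in options:
                return True
    return False


def node_has_kmem_disabled(path: str = "/proc/cmdline") -> bool:
    """Read the boot command line of the running kernel and check it."""
    with open(path) as f:
        return kmem_disabled(f.read())
```

Because /proc/cmdline reflects the command line the kernel was booted with, node_has_kmem_disabled() returns True only after the node has been restarted with the new parameter.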