Updated on 2024-07-04 GMT+08:00

When Container OOM Occurs on the CentOS Node with an Earlier Kernel Version, the Ext4 File System Is Occasionally Suspended

Symptom

If the kernel version of a CentOS 7.6 node is earlier than 3.10.0-1160.66.1.el7.x86_64 and a container on the node runs out of memory (OOM), all containers on the node may become inaccessible, and processes such as Docker and jbd2 enter the D (uninterruptible sleep) state. The fault is rectified after the node is restarted.
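
To check whether a node falls into the affected range, the kernel release string can be compared with the threshold version named above. The following is a minimal sketch, not an official tool: the names FIXED_KERNEL, kernel_key, and is_affected are hypothetical, and the comparison only looks at the leading numeric fields of the release string (it is not a full RPM version comparison).

```python
import platform
import re

# Threshold taken from the symptom description; kernels earlier than this are affected.
FIXED_KERNEL = "3.10.0-1160.66.1.el7.x86_64"


def kernel_key(release):
    """Return the leading numeric fields of a kernel release as a tuple of ints."""
    key = []
    for field in re.split(r"[.-]", release):
        if not field.isdigit():
            break  # stop at the first non-numeric field, such as "el7"
        key.append(int(field))
    return tuple(key)


def is_affected(release, fixed=FIXED_KERNEL):
    """Return True if the release sorts earlier than the fixed kernel version."""
    return kernel_key(release) < kernel_key(fixed)


if __name__ == "__main__":
    current = platform.release()  # same string that `uname -r` prints
    print(current, "affected" if is_affected(current) else "not affected")
```

Run the script on the node itself. The result is only meaningful on CentOS 7 nodes, because the threshold applies to the el7 kernel series.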

Possible Cause

When the memory usage of a service container exceeds its memory limit, cgroup OOM is triggered and the kernel terminates the container. On CentOS 7, a container cgroup OOM occasionally causes the ext4 file system to hang: ext4/jbd2 deadlocks and remains suspended permanently, so all tasks that perform I/O operations on the file system are affected.
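
Tasks blocked by the hang show state D in /proc/<pid>/stat. The following sketch lists them as a rough way to confirm the symptom. The function names parse_stat and uninterruptible_tasks are hypothetical. It assumes a Linux /proc file system. A short-lived D state is normal during ordinary I/O, so only tasks that stay in D across repeated runs point to this fault.

```python
import os


def parse_stat(stat_line):
    """Return (comm, state) from the contents of /proc/<pid>/stat."""
    # The command name is wrapped in parentheses and may itself contain
    # spaces or ')', so locate the last ')' before reading the state field.
    start = stat_line.index("(")
    end = stat_line.rindex(")")
    comm = stat_line[start + 1:end]
    state = stat_line[end + 1:].split()[0]
    return comm, state


def uninterruptible_tasks(proc_root="/proc"):
    """Return a list of (pid, comm) for processes currently in the D state."""
    tasks = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as "meminfo"
        try:
            with open(os.path.join(proc_root, entry, "stat")) as f:
                comm, state = parse_stat(f.read())
        except (OSError, ValueError, IndexError):
            continue  # the process exited or its stat file is unreadable
        if state == "D":
            tasks.append((int(entry), comm))
    return tasks


if __name__ == "__main__":
    for pid, comm in uninterruptible_tasks():
        print(pid, comm)
```

If the same jbd2 and container runtime processes appear in every run, the node matches the symptom described in this section.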

Solution

  • Temporary solution: Restart the node. This rectifies the fault but does not prevent it from recurring.
  • Long-term solution:
    • If your cluster version is 1.19.16-r0, 1.21.7-r0, 1.23.5-r0, 1.25.1-r0, or later, reset the OS of the node to the latest version.
    • If your cluster version is earlier than the versions listed above, upgrade the cluster to one of those versions or later, and then reset the node OS to the latest version.