Why Does a Pod Fail to Write Data?

Pod Events

If the file system of the node where a pod runs is damaged, a newly created pod cannot write data to /var/lib/kubelet/device-plugins/.xxxxx. Events similar to the following may be reported for the pod:

Message: Pod Update Plugin resources failed due to failed to write checkpoint file "kubelet_internal_checkpoint": open /var/lib/kubelet/device-plugins/.xxxxxx: read-only file system, which is unexpected.
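
To find which pods have reported this event, one option is to filter the cluster events for the error text. The following is a minimal sketch, assuming kubectl is already configured for the cluster; the function name find_readonly_events is hypothetical and not part of any tool.

```shell
# Hypothetical helper: list events whose message mentions a read-only file system.
find_readonly_events() {
    # --all-namespaces covers pods in every namespace; grep keeps only matching lines.
    kubectl get events --all-namespaces | grep -i "read-only file system"
}
```

Calling find_readonly_events prints one line per matching event, including the namespace and the affected pod.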

Such abnormal pods are recorded in error events but do not occupy system resources.
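
The "read-only file system" text in the event suggests that the mount holding the kubelet directory is in read-only mode. As a rough check on the affected node, the sketch below looks up the mount that contains a path and reports whether it is mounted ro or rw. It assumes a Linux node with a /proc/mounts table; the function name mount_mode is hypothetical.

```shell
# Hypothetical helper: print "ro" or "rw" for the mount that contains a path.
# $1: path to check; $2: mounts table in /proc/mounts format (default: /proc/mounts).
mount_mode() {
    awk -v p="$1" '
        # Field 2 is the mount point, field 4 the comma-separated mount options.
        # Keep the longest mount point that is a prefix of the path.
        {
            mp = $2
            prefix = (mp == "/") ? "/" : mp "/"
            if (index(p "/", prefix) == 1 && length(mp) >= best) {
                best = length(mp)
                opts = $4
            }
        }
        END {
            if (!best) print "unknown"
            else if (opts ~ /(^|,)ro(,|$)/) print "ro"
            else print "rw"
        }
    ' "${2:-/proc/mounts}"
}
```

On the node, this could be called as mount_mode /var/lib/kubelet/device-plugins. A result of ro is consistent with the event shown above, but it does not identify the underlying cause of the damage.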

Procedure

File system exceptions have many possible causes, for example, the physical master node being powered on or off unexpectedly. If the file system is not restored and a large number of pods become abnormal (this does not affect services), perform the following steps:

Run the kubectl drain <node-name> command to mark the node as unschedulable, and evict existing pods to other nodes.
```
kubectl drain <node-name>
```
Locate the cause of the file system exception and rectify the fault.
Run the following command to make the node schedulable:
```
kubectl uncordon <node-name>
```
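
In practice, kubectl drain may refuse to proceed when the node runs DaemonSet-managed pods or pods that use emptyDir volumes. The sketch below wraps the drain and uncordon steps above in two functions. The flags --ignore-daemonsets and --delete-emptydir-data are assumptions about what your workloads need (the second one discards emptyDir data, and older kubectl versions name it --delete-local-data); the function names are hypothetical.

```shell
# Hypothetical wrappers around the drain and uncordon steps.
# $1: name of the node with the damaged file system.
drain_node() {
    # --ignore-daemonsets: skip DaemonSet-managed pods, which cannot move to other nodes.
    # --delete-emptydir-data: allow eviction of pods using emptyDir volumes (their data is lost).
    kubectl drain "$1" --ignore-daemonsets --delete-emptydir-data
}

restore_node() {
    # Make the node schedulable again once the file system has been repaired.
    kubectl uncordon "$1"
}
```

A typical sequence would be drain_node <node-name>, then repairing the file system, then restore_node <node-name>.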

Clearing Abnormal Pods

The kubelet garbage collection mechanism is the same as that of community (upstream) Kubernetes. After the owner of the pod (for example, a Deployment) is deleted, the abnormal pod is also cleared.
You can also run a kubectl command to delete a pod that is recorded as abnormal.
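
As an illustration of manual cleanup, the sketch below deletes pods by phase in a single namespace. It assumes that the abnormal pods are in the Failed phase, which you should confirm with kubectl get pods first; it also removes every Failed pod in that namespace, not only those caused by this file system fault. The function name delete_failed_pods is hypothetical.

```shell
# Hypothetical helper: delete pods in the Failed phase in one namespace.
# Assumption: the abnormal pods described above are reported with status.phase=Failed.
# $1: namespace to clean up.
delete_failed_pods() {
    # List first so the pods about to be removed are visible in the output.
    kubectl get pods -n "$1" --field-selector status.phase=Failed
    kubectl delete pods -n "$1" --field-selector status.phase=Failed
}
```

To remove a single pod instead, kubectl delete pod <pod-name> -n <namespace> is the narrower option.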
