Why Does "Dead loop on virtual device gw_11cbf51a, fix it urgently" Intermittently Occur When I Log In to a VM using VNC?
Symptom
In a cluster that uses the VPC network model, the message "Dead loop on virtual device gw_11cbf51a, fix it urgently" is intermittently displayed after you log in to the VM using VNC.
Cause
The VPC network model uses the open-source Linux IPvlan module for container networking. In IPvlan L2E mode, a packet is forwarded at Layer 2 first; only if no Layer 2 destination is found is Layer 3 forwarding attempted.
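The L2-before-L3 ordering is the crux of this problem. The following is a minimal, runnable C model of that decision, not the driver itself; all names are hypothetical, and the real logic lives in drivers/net/ipvlan in the kernel source and is considerably more involved:

```c
#include <stdbool.h>
#include <stdio.h>

/* Toy model of the IPvlan L2E transmit decision. All names here are
 * hypothetical; the real implementation is in drivers/net/ipvlan. */

/* Does a local IPvlan slave NIC own this destination IP? */
static bool local_slave_owns(unsigned int dst_ip)
{
    (void)dst_ip;
    return false;  /* e.g., the pod that owned the IP was just deleted */
}

static void xmit_l2e(unsigned int dst_ip)
{
    if (local_slave_owns(dst_ip)) {
        /* Layer 2 first: deliver directly to the local slave NIC,
         * without consulting the routing table. */
        printf("L2: delivered locally\n");
    } else {
        /* No local owner found, so the packet is treated as external
         * and handed to Layer 3, where a route lookup decides the
         * egress device. */
        printf("L3: route lookup selects the egress device\n");
    }
}

int main(void)
{
    xmit_l2e(0x0a0a0a0aU);  /* 10.10.10.10, a hypothetical pod IP */
    return 0;
}
```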
Scenario Reproduction
Assume that a service pod A provides services externally and is continuously accessed, through the container gateway port, by the node you log in to via a host Kubernetes Service. (The same applies when pods on this node access each other directly.)

When pod A exits due to an upgrade, scale-in, or another reason and its network resources are reclaimed, the node may still send packets to pod A's IP address. The IPvlan module in the kernel first attempts to forward these packets at Layer 2 based on the destination IP address. Because the NIC that owns pod A's IP address can no longer be found, the IPvlan module concludes that the packet may be an external one and attempts to forward it at Layer 3, where it matches the gateway port based on the routing rules. When the gateway port receives the packet, it sends it back through the IPvlan module, and the process repeats.

The dev_queue_xmit function in the kernel detects that the packet has been resent 10 times, discards it, and generates this log.
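The log line itself comes from the kernel's transmit path: __dev_queue_xmit() in net/core/dev.c keeps a per-CPU recursion counter and drops the packet once a virtual device re-enters the stack too many times. The following is a runnable userspace simulation of that guard, not the kernel code itself; the limit of 10 matches the behavior described above, while newer kernels use a limit of 8:

```c
#include <stdio.h>

#define RECURSION_LIMIT 10  /* older kernels; newer ones use XMIT_RECURSION_LIMIT (8) */

static int xmit_recursion;  /* a per-CPU counter in the real kernel */

static void dev_queue_xmit_sim(const char *dev_name)
{
    if (xmit_recursion > RECURSION_LIMIT) {
        /* Mirrors the rate-limited alert in __dev_queue_xmit(). */
        printf("Dead loop on virtual device %s, fix it urgently!\n", dev_name);
        return;  /* the packet is discarded here */
    }

    xmit_recursion++;
    /* The route for the reclaimed pod IP points back at the gateway
     * port, so transmitting re-enters the stack immediately. */
    dev_queue_xmit_sim(dev_name);
    xmit_recursion--;
}

int main(void)
{
    dev_queue_xmit_sim("gw_11cbf51a");
    return 0;
}
```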
After the packet is discarded, the access initiator typically performs several backoff retries. As a result, this log is printed several times until the ARP entry in the initiator's container ages out or the service stops the access.
For communication between containers on different nodes, the destination and source IP addresses do not belong to the same node-level dedicated subnet (note that this subnet is different from the VPC subnet). Therefore, packets are not resent in a loop, and this problem does not occur.
Pods on different nodes in the same cluster can be accessed through a NodePort Service. However, the access address is translated by SNAT into the IP address of the gateway port of the accessed container, which can also produce the logs described above.
Impact
The normal running of the accessed container is not affected. When a container is destroyed, the only impact is that a packet is resent 10 times and then discarded; this happens quickly in the kernel and has negligible performance impact.
When the ARP entry ages out, the service stops retrying, or a new container is started, kube-proxy redirects the service traffic to the new backend.
Handling in the Open-Source Community
This problem still exists in the open-source community when IPvlan L2E mode is used. It has been reported to the community so that a better solution can be found.
Solution
The dead loop itself does not require any action.
However, it is recommended that service pods exit gracefully: before the service is terminated, set the pod to the deleting state, and have the pod exit only after it finishes processing in-flight requests.
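In practice, exiting gracefully means handling SIGTERM: when a pod is deleted, Kubernetes marks it as terminating, removes it from Service endpoints, and sends SIGTERM to the container's main process, which should finish in-flight work before exiting. A minimal sketch in C follows; the work loop is a placeholder for your own service logic:

```c
#include <signal.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Minimal graceful-exit sketch. Exiting only after in-flight work is
 * done shortens the window in which peers still send packets to a
 * reclaimed pod IP, which is what triggers the dead loop above. */

static volatile sig_atomic_t terminating = 0;

static void on_sigterm(int sig)
{
    (void)sig;
    terminating = 1;  /* request a graceful stop */
}

int main(void)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = on_sigterm;
    sigaction(SIGTERM, &sa, NULL);

    while (!terminating) {
        /* Placeholder for real service work (accepting and handling
         * requests). */
        sleep(1);
    }

    /* Drain: finish in-flight requests before the process exits and
     * the pod's network resources are reclaimed. */
    fprintf(stderr, "SIGTERM received, draining and exiting\n");
    return 0;
}
```

Pair this with an adequate terminationGracePeriodSeconds in the pod spec, or a preStop hook if the application cannot trap signals.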