Why Does a Panic Occasionally Occur When I Use Network Policies on a Cluster Node?
Scenario
Cluster version: v1.15.6-r1
Cluster type: CCE cluster
Network model: Container tunnel network
Node operating system: CentOS 7.6
After a network policy is configured for the cluster, the canal-agent network component on the node is incompatible with the CentOS 7.6 kernel. As a result, a kernel panic may occur.
Conditions
This issue occurs only if all of the following conditions are met:
- The cluster version is v1.15.6-r1 and the container tunnel network model is used.
- The CentOS 7.6 node uses the canal-agent component of version 1.0.RC10.1230.B005 or earlier. (CentOS 7.6 nodes created on or before February 23, 2021 use such a version.)
- You plan to use or have used network policies.
Fault Locating
Quick locating (for pay-per-use nodes)
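On the CCE console, check when your CentOS 7.6 node was created. Nodes created on or before February 23, 2021 use the affected canal-agent version. Nodes created on or after February 24, 2021 are not affected.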
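As a rough command-line complement to the console check, the sketch below flags nodes whose Kubernetes creationTimestamp falls on or before the February 23, 2021 cutoff. It rests on assumptions: the function names are hypothetical, the timestamp stored in Kubernetes (UTC) is assumed to be close to the creation time shown on the console, and the loop does not filter by operating system. Treat a flagged node as a candidate for the canal-agent version check, not as a confirmed fault.

```shell
# Cutoff taken from the conditions above: nodes created on or before
# February 23, 2021 may carry the affected canal-agent version.
CUTOFF=20210223

# Succeeds (exit 0) if the given RFC 3339 timestamp is on or before the cutoff date.
created_on_or_before_cutoff() {
  # "2021-02-20T08:00:00Z" -> "2021-02-20" -> "20210220"
  day=$(printf '%s' "${1%%T*}" | sed 's/-//g')
  [ "$day" -le "$CUTOFF" ]
}

# Prints every node whose creationTimestamp is on or before the cutoff.
# Assumption: this timestamp approximates the creation time shown on the console.
flag_old_nodes() {
  kubectl get nodes --no-headers \
    -o custom-columns=NAME:.metadata.name,CREATED:.metadata.creationTimestamp |
  while read -r name created; do
    if created_on_or_before_cutoff "$created"; then
      echo "${name} created ${created}: check canal-agent version"
    fi
  done
}
```

After loading these functions into a shell that has kubectl access, running flag_old_nodes lists the nodes that deserve a closer look.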
Accurate locating (general)
If the cluster version is v1.15.6-r1, the network model is container tunnel network, the node OS is CentOS 7.6, and the canal-agent component version is 1.0.RC10.1230.B005.sp1 or later, the problem will not occur. If an earlier version is used (for example, 1.0.RC10.1230.B002), you are advised to reset or delete the node before configuring network policies.
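The version rule above can be sketched as a small shell helper. This is only an illustration: the function name classify_canal_agent is hypothetical, and it assumes that versions follow the pattern 1.0.RC10.1230.B<build>[.<suffix>] and that a suffix such as .sp1 marks a later release than the same build without it. Any string outside that pattern is reported as unknown rather than guessed.

```shell
# Classify a canal-agent version string against the thresholds stated above:
#   1.0.RC10.1230.B005 or earlier   -> affected
#   1.0.RC10.1230.B005.sp1 or later -> not affected
# Assumption: a suffix after the build number (".sp1") means a later release.
classify_canal_agent() {
  case "$1" in
    1.0.RC10.1230.B[0-9]*) ;;                 # expected series, keep going
    *) echo "unknown"; return 0 ;;
  esac
  ca_rest=${1#1.0.RC10.1230.B}                # e.g. "005" or "005.sp1"
  ca_build=${ca_rest%%.*}                     # build digits, e.g. "005"
  ca_suffix=${ca_rest#"$ca_build"}            # "" or ".sp1"
  case "$ca_build" in
    *[!0-9]*) echo "unknown"; return 0 ;;     # build part is not purely numeric
  esac
  if [ "$ca_build" -lt 5 ] || { [ "$ca_build" -eq 5 ] && [ -z "$ca_suffix" ]; }; then
    echo "affected"
  else
    echo "not affected"
  fi
}
```

For example, classify_canal_agent 1.0.RC10.1230.B002 prints affected, while classify_canal_agent 1.0.RC10.1230.B005.sp1 prints not affected.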
Perform the following steps to query the version of the network component on the node:
- Prepare a node where kubectl can be used.
- Run the following command to query the CentOS node list:
for node_item in $(kubectl get nodes --no-headers | awk '{print $1}'); do if kubectl get node "${node_item}" -o yaml | grep -q CentOS; then echo "${node_item} is CentOS node"; fi; done
The command prints one line for each CentOS node, in the form "<node name> is CentOS node".
- Assume that the IP address of the target CentOS node is 10.0.50.187. Run the following command to check the canal-agent version:
kubectl get packageversions.version.cce.io 10.0.50.187 -o yaml | grep -A 1 canal-agent
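The output shows the line that matches canal-agent and the line immediately after it, where the component version is expected to appear.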
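To run both checks in one pass, the two queries above can be wrapped in a pair of shell functions. This is a minimal sketch and not an official tool: the function names are hypothetical, and it assumes that each packageversions.version.cce.io object is named after its node, as in the 10.0.50.187 example.

```shell
# Prints the name of every node whose YAML description mentions CentOS.
list_centos_nodes() {
  for node_item in $(kubectl get nodes --no-headers | awk '{print $1}'); do
    if kubectl get node "${node_item}" -o yaml | grep -q CentOS; then
      echo "${node_item}"
    fi
  done
}

# For each CentOS node, prints the canal-agent entry and the line after it.
# Assumption: the packageversions object has the same name as the node.
show_canal_agent_versions() {
  for centos_node in $(list_centos_nodes); do
    echo "== ${centos_node} =="
    kubectl get packageversions.version.cce.io "${centos_node}" -o yaml | grep -A 1 canal-agent
  done
}
```

After loading these functions on a node where kubectl can be used, running show_canal_agent_versions prints one block per CentOS node, which can then be compared against the version threshold described earlier.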
Solution
If you still want to use the node, reset the CentOS 7.6 nodes in the cluster to upgrade the networking components to the latest version. For details, see Resetting a Node.
If you want to delete the risky node and purchase a new one, see Deleting a Node and Buying a Node.