Why Does a Panic Occasionally Occur When I Use Network Policies on a Cluster Node?

Scenario

Cluster version: v1.15.6-r1

Cluster type: CCE cluster

Network model: Container tunnel network

Node operating system: CentOS 7.6

The canal-agent network component on the node is incompatible with the CentOS 7.6 kernel. As a result, a kernel panic may occur on the node after a network policy is configured for the cluster.

Conditions

This issue occurs only when all of the following conditions are met. If any of them is not met, the issue will not occur:

The cluster version is v1.15.6-r1 and the container tunnel network model is used.
The CentOS 7.6 node uses canal-agent version 1.0.RC10.1230.B005 or earlier. (CentOS 7.6 nodes created on or before February 23, 2021 use such a version.)
You plan to use or have used network policies.

Fault Locating

Quick locating (for pay-per-use nodes)

On the CCE console, check whether your CentOS 7.6 node was created after February 24, 2021. Nodes created on or before February 23, 2021 use an affected canal-agent version.
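If the console is not at hand, a rough version of the same date check can be sketched with kubectl. This is an assumption-laden sketch, not a documented CCE procedure: it reads the Node object's creationTimestamp (UTC), which may differ from the creation time shown on the CCE console, and the function names are hypothetical. Treat the result as a hint and confirm it with the canal-agent version.

```shell
# Sketch only. Function names are illustrative, not part of CCE or kubectl.

# Succeeds if an ISO 8601 timestamp such as 2021-02-23T08:15:00Z
# falls on or before February 23, 2021.
created_on_or_before_cutoff() {
  local day
  # Keep the date part (before "T") and strip everything but digits: 2021-02-23 -> 20210223
  day=$(printf '%s' "${1%%T*}" | tr -cd '0-9')
  [ -n "${day}" ] && [ "${day}" -le 20210223 ]
}

# Lists nodes whose Node object was created on or before the cutoff date.
# Assumption: creationTimestamp approximates the node creation date on the console.
list_possibly_affected_nodes() {
  kubectl get nodes --no-headers \
    -o custom-columns=NAME:.metadata.name,CREATED:.metadata.creationTimestamp |
  while read -r name created; do
    if created_on_or_before_cutoff "${created}"; then
      echo "${name} (created ${created})"
    fi
  done
}
```

Calling `list_possibly_affected_nodes` prints the candidate nodes. It does not filter by operating system, so nodes that do not run CentOS 7.6 may also appear.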

Accurate locating (general)

If the cluster version is v1.15.6-r1, the network model is container tunnel network, the node OS is CentOS 7.6, and the canal-agent component version is 1.0.RC10.1230.B005.sp1 or later, the problem will not occur. If an earlier version is used (for example, 1.0.RC10.1230.B002), you are advised to reset or delete the node before configuring network policies.
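The version rule above can be expressed as a small shell check. This is a minimal sketch that assumes canal-agent version strings order the same way GNU `sort -V` orders them, which the vendor does not confirm; the function name is hypothetical. It requires GNU coreutils.

```shell
# Sketch only. Assumes GNU sort -V ordering matches the canal-agent version scheme.

# Succeeds if the given canal-agent version is 1.0.RC10.1230.B005 or earlier,
# that is, a version affected by the kernel panic issue.
is_affected_canal_version() {
  local version="$1"
  local last_affected="1.0.RC10.1230.B005"
  local highest
  # Sort the two versions; if the last affected version sorts last (or both are equal),
  # the given version is not newer than it.
  highest=$(printf '%s\n%s\n' "${version}" "${last_affected}" | LC_ALL=C sort -V | tail -n 1)
  [ "${highest}" = "${last_affected}" ]
}
```

For example, `is_affected_canal_version 1.0.RC10.1230.B002` succeeds, while `is_affected_canal_version 1.0.RC10.1230.B005.sp1` fails, which matches the rule stated above.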

Perform the following steps to query the version of the network component on the node:

Prepare a node where kubectl can be used.

Run the following command to query the CentOS node list:

for node_item in $(kubectl get nodes --no-headers | awk '{print $1}'); do if kubectl get node "${node_item}" -o yaml | grep -q CentOS; then echo "${node_item} is CentOS node"; fi; done

The command prints one line for each CentOS node, in the format "<node name> is CentOS node".

Assume that the IP address of the target CentOS node is 10.0.50.187. Run the following command to check the canal-agent version:
```
kubectl get packageversions.version.cce.io 10.0.50.187 -o yaml | grep -A 1 canal-agent
```
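The command output contains the line that matches canal-agent and the line that follows it, from which the component version can be read.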
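When a cluster has many nodes, the two queries above can be combined so that the canal-agent entry is printed for every CentOS node in one pass. The following is a sketch, not an official CCE tool: the function names are hypothetical, and it assumes that each packageversions object is named after its node, as in the 10.0.50.187 example.

```shell
# Sketch only. Function names are illustrative; the kubectl commands are the ones shown above.

# Prints the name of every node whose YAML description mentions CentOS.
list_centos_nodes() {
  local node
  for node in $(kubectl get nodes --no-headers | awk '{print $1}'); do
    if kubectl get node "${node}" -o yaml | grep -q CentOS; then
      echo "${node}"
    fi
  done
}

# For each CentOS node, prints the canal-agent entry of its packageversions object.
# Assumption: the packageversions object has the same name as the node.
show_canal_agent_versions() {
  local node
  for node in $(list_centos_nodes); do
    echo "== ${node} =="
    kubectl get packageversions.version.cce.io "${node}" -o yaml \
      | grep -A 1 canal-agent \
      || echo "canal-agent entry not found"
  done
}
```

Run `show_canal_agent_versions` on a node where kubectl can be used, then compare each reported version with 1.0.RC10.1230.B005.sp1.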

Solution

If you still want to use the node, reset the CentOS 7.6 nodes in the cluster to upgrade the networking components to the latest version. For details, see Resetting a Node.

If you want to delete the risky node and purchase a new one, see Deleting a Node and Buying a Node.
