Troubleshooting

Symptom 1: Elastic scheduling to CCI is unavailable.

Cause: The subnet where the CCE cluster resides overlaps with 10.247.0.0/16, which is the CIDR block reserved for the Service in the CCI namespace.

Solution: Change the CCE cluster to use a subnet that does not overlap with 10.247.0.0/16.

Symptom 2: After the CCE Cloud Bursting Engine for CCI add-on is rolled back from 1.5.18 or later to a version earlier than 1.5.18, pods cannot be accessed through the Service.

Cause: After the add-on is upgraded to 1.5.18 or later, the sidecar in each pod that is newly scaled to CCI is incompatible with add-on versions earlier than 1.5.18. As a result, these pods cannot be accessed normally after the add-on is rolled back. Pods that were scaled to CCI while the add-on version was earlier than 1.5.18 are not affected.

Solutions:

Solution 1: Upgrade the add-on to 1.5.18 or later again.
Solution 2: Delete the pods that cannot be accessed through the Service and create new ones. The new pods scaled to CCI can be accessed normally.
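
Both the cause and Solution 1 hinge on comparing the add-on version against 1.5.18. A plain string comparison gets this wrong (for example, "1.5.9" sorts after "1.5.18" as text), so the sketch below uses version-aware sorting instead. The helper name version_ge and the sample version are assumptions for illustration, and sort -V requires GNU coreutils. How you obtain the installed version is not covered here.

```shell
# Return 0 (true) if version $1 is greater than or equal to version $2.
version_ge() {
  # The smaller of the two versions sorts first; $1 >= $2 when that is $2.
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Hypothetical value; use the add-on version actually installed in your cluster.
installed="1.5.12"
if version_ge "$installed" "1.5.18"; then
  echo "add-on is 1.5.18 or later"
else
  echo "add-on is earlier than 1.5.18"
fi
```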

Symptom 3: The add-on cannot be uninstalled.

Scenario: The add-on cannot be uninstalled because swr_addr and swr_user were modified incorrectly.

Cause: Uninstalling the add-on depends on gc-jobs. If the image cannot be pulled, gc-jobs cannot run successfully, and the uninstallation fails.

Solution: Uninstall the add-on again, and then delete the gc-jobs in sequence.

Log in to the node in the CCE cluster where kubectl is configured, and then click Uninstall for the add-on again.

Within 210 seconds of clicking Uninstall, run the following commands.

Delete resource-gc-jobs. In the second command, replace xxx with the job name returned by the first command.

kubectl get job -nkube-system | grep "virtual-kubelet-.*-resource-gc-jobs"
kubectl delete job -nkube-system xxx

Delete namespace-gc-jobs. In the second command, replace yyy with the job name returned by the first command.

kubectl get job -nkube-system | grep "virtual-kubelet-.*-namespace-gc-jobs"
kubectl delete job -nkube-system yyy
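
Because the commands must be run within a short time window, the lookup and deletion can be combined so that no job name has to be copied by hand. The following is a sketch only: the helper name delete_gc_jobs is an assumption, and it relies on kubectl get with -o name printing one resource per line in the form job.batch/NAME, which kubectl delete accepts directly.

```shell
# Delete every job in kube-system whose name matches the given gc-jobs suffix.
delete_gc_jobs() {
  suffix=$1   # "resource-gc-jobs" or "namespace-gc-jobs"
  kubectl get job -n kube-system -o name \
    | grep "virtual-kubelet-.*-${suffix}" \
    | while read -r job; do
        kubectl delete -n kube-system "$job"
      done
}

# Usage, keeping the same order as the manual steps:
#   delete_gc_jobs resource-gc-jobs
#   delete_gc_jobs namespace-gc-jobs
```

If no matching job exists, the function deletes nothing and returns without an error.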

For other exceptions, submit a service ticket.

Symptom 4: Service containers that depend on a Service fail to start.

Scenario: Service containers fail to start on the first attempt because they need to access a Service during startup or their PostStart hook relies on the Service. After the sidecar containers start, the service containers restart successfully.

Cause: Pods on CCI depend on the sidecar containers to access the Service. If Service synchronization has not completed when the service containers start, they cannot access the Service. After the synchronization is complete, the service containers can start normally.

Solution: Upgrade the add-on to 1.5.28 or later.
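
To check whether a pod matches this scenario before upgrading, one option is to look at container restart counts: a service container that failed once and then started successfully normally shows a restart count of at least 1. The following sketch assumes kubectl access to the cluster. The helper name container_restarts and the pod and namespace names in the usage comment are hypothetical.

```shell
# Print each container in a pod together with its restart count.
container_restarts() {
  pod=$1
  namespace=$2
  kubectl get pod "$pod" -n "$namespace" \
    -o jsonpath='{range .status.containerStatuses[*]}{.name}{"\t"}{.restartCount}{"\n"}{end}'
}

# Usage with hypothetical names:
#   container_restarts my-app-pod default
```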
