Updated on 2024-10-14 GMT+08:00

How Do I Locate the Fault When a Cluster Is Unavailable?

If a cluster is Unavailable, perform the following operations to locate the fault.

Troubleshooting Process

The possible causes are listed in descending order of likelihood.

Check these causes one by one until you find the cause of the fault.

If the fault persists after you have checked all of them, contact customer service for help.

Figure 1 Fault locating

Check Item 1: Whether the Security Group Is Modified

  1. Log in to the management console, and choose Service List > Networking > Virtual Private Cloud. In the navigation pane on the left, choose Access Control > Security Groups to find the security group of the master node in the cluster.

    The security group name is in the format Cluster name-cce-control-ID.

  2. Click the security group. On the details page displayed, ensure that the security group rules of the master node are correct.

    For details, see How Can I Configure a Security Group Rule in a Cluster?
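
When an account holds many security groups, the naming format from step 1 can be used to narrow the list. The following is a minimal sketch, assuming the security group names have already been exported from the console into a plain list. The function names and sample names are hypothetical, and because the format of the ID suffix is not specified here, the pattern accepts any non-empty suffix.

```python
import re


def control_sg_pattern(cluster_name: str) -> re.Pattern:
    """Build a pattern for '<cluster name>-cce-control-<ID>'."""
    # Assumption: the ID can be any non-empty string, since its format is not documented here.
    return re.compile(rf"^{re.escape(cluster_name)}-cce-control-(?P<id>.+)$")


def find_control_security_groups(cluster_name: str, group_names: list) -> list:
    """Return the names that match the master node security group format."""
    pattern = control_sg_pattern(cluster_name)
    return [name for name in group_names if pattern.match(name)]


# Hypothetical sample names, for illustration only.
names = [
    "demo-cce-control-ab12c",
    "demo-cce-node-ab12c",
    "demo2-cce-control-zz99",
    "default",
]
print(find_control_security_groups("demo", names))
```

The cluster name is escaped and anchored at the start, so a cluster named demo does not match a security group that belongs to demo2.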

Check Item 2: Whether There Are Residual Listeners and Backend Server Groups on the Load Balancer

Reproducing the Problem

A cluster exception occurs while a LoadBalancer Service is being created or deleted. After the fault is rectified, the Service is deleted successfully, but residual listeners and backend server groups remain on the load balancer.

  1. Create a CCE cluster in advance. In the cluster, use the official Nginx image to create workloads, and prepare load balancers, Services, and ingresses.
  2. Ensure that the cluster is running properly and the Nginx workload is stable.
  3. Create and delete 10 LoadBalancer Services every 20 seconds.
  4. Inject an exception into the cluster. For example, make the etcd pod unavailable or hibernate the cluster.
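
Step 3 above can be scripted. The following is a rough sketch, assuming kubectl is installed and already configured for the test cluster. The Service names, the app: nginx selector, and the port are hypothetical, and any provider-specific annotations that CCE needs to bind a Service to a load balancer are deliberately omitted and must be added to the manifest.

```python
import json
import subprocess
import time


def service_manifest(name: str, app_label: str = "nginx", port: int = 80) -> dict:
    """Build a plain LoadBalancer Service manifest (no provider annotations)."""
    return {
        "apiVersion": "v1",
        "kind": "Service",
        "metadata": {"name": name},
        "spec": {
            "type": "LoadBalancer",
            "selector": {"app": app_label},  # assumed label of the Nginx workload
            "ports": [{"port": port, "targetPort": 80, "protocol": "TCP"}],
        },
    }


def churn_round(round_no: int, count: int = 10, run=subprocess.run) -> list:
    """Create `count` Services, then delete them again without waiting."""
    names = [f"lb-churn-{round_no}-{i}" for i in range(count)]
    for name in names:
        # kubectl accepts JSON manifests on standard input.
        run(["kubectl", "apply", "-f", "-"],
            input=json.dumps(service_manifest(name)), text=True, check=True)
    for name in names:
        run(["kubectl", "delete", "service", name, "--wait=false"], check=True)
    return names


def churn(rounds: int, interval_s: float = 20.0) -> None:
    """Start one create-and-delete round every `interval_s` seconds."""
    for round_no in range(rounds):
        started = time.monotonic()
        churn_round(round_no)
        time.sleep(max(0.0, interval_s - (time.monotonic() - started)))
```

Calling churn(rounds=3), for example, runs three rounds about 20 seconds apart. Run it only against a test cluster, and inject the exception from step 4 while a round is in progress.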

Possible Causes

There are residual listeners and backend server groups on the load balancer.

Solution

Manually clear residual listeners and backend server groups.

  1. Log in to the management console and choose Networking > Elastic Load Balance from the service list.
  2. In the load balancer list, click the name of the target load balancer to go to the details page. On the Listeners tab page, locate the target listener and delete it.
  3. On the Backend Server Groups tab page, locate the target backend server group and delete it.
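
Before deleting anything, it helps to work out which listeners are actually residual. The following is a minimal sketch of one way to shortlist candidates, assuming the listeners on the load balancer and the protocol and port pairs still used by Services and ingresses in the cluster have both been collected by hand. The Listener type, the IDs, and the sample data are hypothetical, and the result is only a list of candidates to confirm on the console.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Listener:
    listener_id: str
    protocol: str
    port: int


def residual_listeners(listeners: list, ports_in_use: list) -> list:
    """Return listeners whose (protocol, port) is no longer used in the cluster."""
    in_use = {(protocol.upper(), port) for protocol, port in ports_in_use}
    return [
        listener
        for listener in listeners
        if (listener.protocol.upper(), listener.port) not in in_use
    ]


# Hypothetical data: three listeners on the load balancer, one port still in use.
listeners = [
    Listener("lsn-1", "TCP", 80),
    Listener("lsn-2", "TCP", 8080),
    Listener("lsn-3", "TCP", 8443),
]
ports_in_use = [("tcp", 80)]

for candidate in residual_listeners(listeners, ports_in_use):
    print(candidate.listener_id, candidate.protocol, candidate.port)
```

Listeners that were created on the load balancer outside the cluster also appear in the result, so exclude them before deleting.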