How Do I Locate the Fault When a Cluster Is Unavailable?
If a cluster is unavailable, perform the following operations to locate the fault.
Troubleshooting Process
The possible causes are described here in order of how likely they are to occur.
Check these causes one by one until you find the cause of the fault.
- Check Item 1: Whether the Security Group Is Modified
- Check Item 2: Whether There Are Residual Listeners and Backend Server Groups on the Load Balancer
- Check Item 3: Whether the KMS Key Used for Secret Encryption Is Valid
If the fault persists, submit a service ticket so that customer service can help you locate the fault.
Check Item 1: Whether the Security Group Is Modified
- Log in to the management console and choose Service List > Networking > Virtual Private Cloud. In the navigation pane, choose Access Control > Security Groups to find the security group of the master node in the cluster.
The name of this security group is in the format of Cluster name-cce-control-ID.
- Click the security group. On the details page displayed, ensure that the security group rules of the master node are correct.
For details, see How Can I Configure a Security Group Rule in a Cluster?
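If you want a quick connectivity check before or after correcting the rules, the following minimal Python sketch tests whether the master node's API server port is reachable over TCP. The address and port shown are placeholders, not values from this document; replace them with your cluster's actual API server endpoint.

```python
import socket

# Placeholder values: replace them with your cluster's API server address and port.
# CCE clusters typically expose kube-apiserver on port 5443, but verify this for
# your cluster; both values here are assumptions for illustration.
API_SERVER_HOST = "192.168.0.100"
API_SERVER_PORT = 5443

def is_port_reachable(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

if __name__ == "__main__":
    if is_port_reachable(API_SERVER_HOST, API_SERVER_PORT):
        print("API server port is reachable; the security group allows this traffic.")
    else:
        print("API server port is unreachable; check the master node security group rules.")
```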
Check Item 2: Whether There Are Residual Listeners and Backend Server Groups on the Load Balancer
Reproducing the Problem
A cluster exception occurs while a LoadBalancer Service is being created or deleted. After the fault is rectified, the Service is deleted successfully, but residual listeners and backend server groups remain on the load balancer.
- Pre-create a CCE cluster. In the cluster, use the official Nginx image to create a workload, and create a load balancer, a Service, and an ingress in advance.
- Ensure that the cluster is running properly and the Nginx workload is stable.
- Create and delete 10 LoadBalancer Services every 20 seconds (see the sketch after this list).
- While these Services are being created and deleted, inject a fault so that a cluster exception occurs, for example, etcd becomes unavailable or the cluster is hibernated.
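The following Python sketch illustrates the create/delete step using the official kubernetes client. It is only an outline under assumptions: it targets the default namespace, assumes the Nginx workload is labeled app=nginx, and omits any ELB-related annotations that your cluster may require on LoadBalancer Services; adjust it to match your setup.

```python
import time
from kubernetes import client, config

# Reproduction sketch: create and delete 10 LoadBalancer Services every
# 20 seconds. It targets the "default" namespace, assumes the Nginx
# workload is labeled app=nginx, and omits any ELB annotations (such as
# an ELB instance ID) that your cluster may require; adjust as needed.
config.load_kube_config()
v1 = client.CoreV1Api()

def build_service(name: str) -> client.V1Service:
    return client.V1Service(
        metadata=client.V1ObjectMeta(name=name),
        spec=client.V1ServiceSpec(
            type="LoadBalancer",
            selector={"app": "nginx"},
            ports=[client.V1ServicePort(port=80, target_port=80)],
        ),
    )

for round_no in range(3):  # run a few create/delete rounds
    names = [f"lb-test-{round_no}-{i}" for i in range(10)]
    for name in names:
        v1.create_namespaced_service(namespace="default", body=build_service(name))
    time.sleep(20)
    for name in names:
        v1.delete_namespaced_service(name=name, namespace="default")
    time.sleep(20)
```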
Possible Causes
There are residual listeners and backend server groups on the load balancer.
Solution
Manually clear residual listeners and backend server groups.
- Log in to the management console and choose Networking > Elastic Load Balance from the service list.
- In the load balancer list, click the name of the target load balancer to go to the details page. On the Listeners tab, locate the target listener and delete it.
- On the Backend Server Groups page, locate the target backend server group and delete it.
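To confirm which listeners and backend server groups are genuinely residual, you can also list the LoadBalancer Services that still exist in the cluster before deleting anything on the ELB console. The sketch below uses the kubernetes Python client; the kubernetes.io/elb.id annotation key is an assumption and may differ in your environment.

```python
from kubernetes import client, config

# Sketch: list the LoadBalancer Services that still exist in the cluster so you
# can tell which listeners and backend server groups are genuinely residual.
# The kubernetes.io/elb.id annotation key is an assumption and may differ in
# your environment.
config.load_kube_config()
v1 = client.CoreV1Api()

for svc in v1.list_service_for_all_namespaces().items:
    if svc.spec.type == "LoadBalancer":
        annotations = svc.metadata.annotations or {}
        elb_id = annotations.get("kubernetes.io/elb.id", "<no ELB ID annotation>")
        print(f"{svc.metadata.namespace}/{svc.metadata.name} -> {elb_id}")
```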
Check Item 3: Whether the KMS Key Used for Secret Encryption Is Valid
If a cluster is unavailable, you can check the cluster events to locate the fault.
If KMS key status abnormal is displayed in the events, check whether the key used by the cluster is in the Disabled or Pending deletion state.
Solution:
- Log in to the DEW console.
- In the custom key list, find the KMS key used by the cluster.
- For a key in the Pending deletion state, click Cancel Deletion in the Operation column. If the key is in the Disabled state after the deletion is canceled, enable it.
- For a key in the Disabled state, click Enable in the Operation column.
- Verify that the key is enabled and wait for the cluster to be automatically restored. Restoration takes about 5 to 10 minutes.
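If you prefer to check recovery from the command line rather than the console, the following Python sketch polls the API server for up to 10 minutes after the key is enabled. It assumes your kubeconfig already points at the affected cluster.

```python
import time
from kubernetes import client, config

# Sketch: after re-enabling the key, poll the API server for up to 10 minutes
# to confirm that the cluster has recovered. Assumes your kubeconfig already
# points at the affected cluster.
config.load_kube_config()
version_api = client.VersionApi()

deadline = time.time() + 10 * 60
while time.time() < deadline:
    try:
        info = version_api.get_code()
        print(f"Cluster is responding again (Kubernetes {info.git_version}).")
        break
    except Exception as exc:  # the API server may refuse requests until recovery finishes
        print(f"Cluster not ready yet: {exc}")
        time.sleep(30)
else:
    print("Cluster did not recover within 10 minutes; submit a service ticket.")
```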