How Do I Locate the Fault When a Cluster Is Unavailable?

This section describes how to locate the fault when a cluster becomes unavailable.

Fault Locating

Possible causes are listed below in descending order of likelihood.

Check them in sequence. If the fault persists after you have ruled out one cause, move on to the next.

Check Item 1: Whether the Security Group Is Modified
Check Item 2: Whether the Cluster Is Overloaded

If the fault persists after you have completed all the checks, contact customer service for help locating the fault.

Check Item 1: Whether the Security Group Is Modified

Log in to the management console and choose Service List > Networking > Virtual Private Cloud. In the navigation pane, choose Access Control > Security Groups and find the security group of the master node in the cluster.

The security group name is in the format Cluster name-cce-control-ID.
Click the security group. On the details page displayed, ensure that the security group rules of the master node are correct.

For details, see How Can I Configure a Security Group Rule for a Cluster?
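
If the account has many security groups, the naming format above can be used to narrow the list. The following is a minimal Python sketch, not an official tool. It assumes that you have already exported the security group names as plain strings, and that the ID suffix consists only of letters and digits; the source does not specify the ID format, so adjust the pattern if yours differs. The function name and the sample names are hypothetical.

```python
import re


def find_control_security_groups(cluster_name, group_names):
    """Return the names that match '<cluster name>-cce-control-<ID>'."""
    # re.escape keeps cluster names containing regex metacharacters literal.
    # Assumption: the ID is a non-empty run of letters and digits.
    pattern = re.compile(
        rf"^{re.escape(cluster_name)}-cce-control-[A-Za-z0-9]+$"
    )
    return [name for name in group_names if pattern.match(name)]


# Hypothetical sample names, for illustration only.
names = [
    "demo-cce-control-ab12c",
    "demo-cce-node-ab12c",
    "other-cce-control-zz99x",
    "default",
]

print(find_control_security_groups("demo", names))
```

Only the first sample name matches the cluster named demo. The match is anchored at both ends so that a cluster whose name merely contains the target name is not picked up by mistake.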

Check Item 2: Whether the Cluster Is Overloaded

Symptom

The resource usage on the master nodes in the cluster reaches 100%.

Possible Cause

When a large number of resources are created in a cluster simultaneously, the API server becomes overloaded. This in turn overloads the master nodes and can cause out-of-memory (OOM) errors.

Solution

Increase the cluster management scale. A larger cluster management scale means higher capacity and improved performance of the master nodes. For details, see Changing Cluster Scale.

If a cluster is overloaded, you can submit a service ticket for technical support.
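
Because the overload is triggered by many resources being created at the same moment, a client that creates resources in bulk may also spread its requests over time. The following Python sketch illustrates the idea only; it is not a documented CCE feature and does not replace the solution above. The create_fn argument is a hypothetical stand-in for whatever call your tooling uses to create one resource, and the batch size and pause values are arbitrary examples, not recommendations.

```python
import time


def create_in_batches(items, create_fn, batch_size=10, pause_seconds=1.0,
                      sleep=time.sleep):
    """Call create_fn on each item, pausing between batches.

    create_fn is a hypothetical placeholder for a single create request.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    results = []
    for start in range(0, len(items), batch_size):
        if start > 0:
            # Wait before every batch except the first.
            sleep(pause_seconds)
        for item in items[start:start + batch_size]:
            results.append(create_fn(item))
    return results


# Illustration: 25 hypothetical workload names and a no-op create function.
workloads = [f"workload-{i}" for i in range(25)]
created = create_in_batches(workloads, lambda name: name,
                            batch_size=10, pause_seconds=0.0)
print(len(created))
```

With 25 items and a batch size of 10, the function issues three batches and pauses twice. The sleep parameter is injectable so that the pacing logic can be exercised without actually waiting.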
