
Why Does a Cluster Stay in Waiting or Abnormal State When Connecting to UCS?

Background

This section helps you troubleshoot the exceptions that may occur when connecting a cluster to UCS:

  • You have connected a cluster to UCS and deployed proxy-agent in the cluster, but the console keeps displaying an error message indicating that the cluster is waiting for connection, or the cluster fails to be registered because the connection timed out.

    If the cluster fails to be registered, click the icon in the upper right corner to register it again and locate the fault as guided in Troubleshooting.

  • If a connected cluster is unavailable:
    • For Huawei Cloud CCE clusters: Go to the CCE console to check the cluster status. If the cluster is Unavailable, rectify the fault by referring to FAQ documentation.
    • For self-built or third-party clusters: Refer to Troubleshooting.

Troubleshooting

Table 1 explains the error messages for you to locate faults.

Table 1 Error message description

  • Error message: "currently no agents available, please make sure the agents are correctly registered"

    Description: The proxy-agent in the connected cluster is abnormal or the network connection is abnormal.

    Check item: Check Item 1: proxy-agent and Check Item 2: Network Connection Between the Cluster and UCS

  • Error message: "please check the health status of kube apiserver: ..."

    Description: The kube-apiserver in the cluster cannot be accessed.

    Check item: Check Item 3: kube-apiserver

  • Error message: "cluster responded with non-successful status code: ..."

    Description: Rectify the fault based on the returned status code. For example, status code 401 indicates that the request is not authenticated. A possible cause is that the cluster authentication information has expired.

    Check item: Check Item 4: Cluster Authentication Information Changes

  • Error message: "cluster responded with non-successful message: ..."

    Description: Rectify the fault based on the returned message. For example, "Get "https://172.16.0.143:6443/readyz?timeout=32s": context deadline exceeded" indicates that access to the API server timed out. A possible cause is that the API server is faulty.

    Check item: -

Check Item 1: proxy-agent

After the cluster is removed from UCS, the authentication information contained in the original proxy-agent configuration file becomes invalid. You need to delete the proxy-agent pods deployed in the cluster. To connect the cluster to UCS again, download the proxy-agent configuration file from the UCS console again and use it for re-deployment.
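For reference, the following is a minimal sketch of the re-deployment. It assumes the agent runs as a Deployment named proxy-agent in the kube-system namespace and that the configuration file downloaded from the UCS console is named agent.yaml; both names are assumptions, so adjust them to your environment.

    # Remove the old proxy-agent whose embedded credentials are no longer valid.
    # "deployment proxy-agent" is an assumption; adjust it to how the agent is deployed in your cluster.
    kubectl -n kube-system delete deployment proxy-agent

    # Re-deploy using the configuration file downloaded again from the UCS console.
    # agent.yaml is a placeholder file name.
    kubectl apply -f agent.yaml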

  1. Log in to the master node of the target cluster.
  2. Check the deployment of the cluster agent:

    kubectl -n kube-system get pod | grep proxy-agent

    Expected output for successful deployment:

    proxy-agent-*** 1/1 Running 0 9s

    If proxy-agent is not in the Running state, run the kubectl -n kube-system describe pod proxy-agent-*** command to view the pod alarms. For details, see Why Does proxy-agent Fail to Run?.

    By default, proxy-agent is deployed with two pods, and can provide services as long as one pod is running properly. However, one pod cannot ensure high availability.

  3. Print the pod logs of proxy-agent and check whether the agent program can connect to UCS:

    kubectl -n kube-system logs proxy-agent-*** | grep "Start serving"

    If no "Start serving" log is printed but the proxy-agent pods are in normal state, check other check items.

Check Item 2: Network Connection Between the Cluster and UCS

For clusters connected through a public network:

  1. Check whether a public IP is bound to the cluster or a public NAT gateway is configured.
  2. Check whether the outbound traffic of the cluster security group is allowed. To perform access control on the outbound traffic, contact technical support to obtain the destination IP address and port number. (A reachability sketch follows this list.)
  3. After rectifying the network faults, delete the existing proxy-agent pods so that new pods are created. Then check whether the logs of the new pods contain "Start serving":

    kubectl -n kube-system logs proxy-agent-*** | grep "Start serving"
  4. If desired logs are printed, refresh the UCS console page and check whether the cluster is properly connected.
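To verify the outbound connectivity required in 1 and 2, you can run a quick reachability test from a cluster node. This is only a sketch: <ucs-endpoint> and <port> are placeholders for the destination IP address and port obtained from technical support.

    # Run on a cluster node. A completed TLS handshake (even with certificate warnings) indicates
    # that the outbound path to UCS is open; a timeout suggests the EIP/NAT gateway or security
    # group is misconfigured.
    curl -kv https://<ucs-endpoint>:<port>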

For clusters connected through a private network:

  1. Check whether the outbound traffic of the cluster security group is allowed. To perform access control on the outbound traffic, contact technical support to obtain the destination IP and port number.
  2. Rectify the network connection faults between the cluster and UCS, your IDC, or third-party clouds. Refer to the guide that matches your network connection type.

  3. Rectify the VPC Endpoint (VPCEP) faults. The VPCEP status must be Accepted. If the VPCEP was deleted by mistake, create it again. For details, see How Do I Restore a Deleted VPC Endpoint for a Cluster Connected Through a Private Network?. (A reachability sketch follows this list.)

    Figure 1 Checking VPCEP status

  4. After rectifying the network faults, delete the existing proxy-agent pods so that new pods are created. Then check whether the logs of the new pods contain "Start serving":

    kubectl -n kube-system logs proxy-agent-*** | grep "Start serving"
  5. If desired logs are printed, refresh the UCS console page and check whether the cluster is properly connected.
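To confirm that the VPC endpoint checked in 3 is reachable from inside the cluster, you can test it from a proxy-agent pod. This is a sketch only: it assumes curl is available in the proxy-agent image, and <vpcep-address> and <port> are placeholders for the destination obtained from technical support.

    # Test private-network reachability of the VPC endpoint from a proxy-agent pod.
    kubectl -n kube-system exec proxy-agent-*** -- curl -kv https://<vpcep-address>:<port>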

Check Item 3: kube-apiserver

When connecting a cluster to UCS, the error message shown in Figure 2 may be displayed, saying "please check the health status of kube apiserver: ...".

Figure 2 Abnormal kube-apiserver

This indicates that proxy-agent cannot communicate with the API server of the cluster. Because the network configurations used to connect clusters to UCS vary from user to user, UCS does not provide a unified solution for this fault. You need to rectify it on your own and try again.

  1. Log in to the UCS console. In the navigation pane, choose Container Clusters.
  2. Log in to the master node of the cluster and check the API server address:

    kubectl get pod `kubectl get pod -n kube-system | grep kube-apiserver | awk '{print $1}'` -n kube-system -o yaml | grep advertise-address.endpoint
  3. Check whether the clusters.cluster.server field in the kubeconfig file of the cluster is the same as the API server address of the cluster queried in 2.

    If not, the cluster provider may have converted the API server address. You need to replace the API server address in the kubeconfig file (a sketch follows this list), re-connect the cluster to UCS, and deploy proxy-agent again.

    If the value of clusters.cluster.server in the kubeconfig file is https://kubernetes.default.svc.cluster.local:443, you can retain it. This is the in-cluster domain name of the Kubernetes service, which points to the ClusterIP of the API server.

  4. Check whether the proxy-agent pod can access the API server of the cluster to be connected. Example commands:

    kubectl exec -ti proxy-agent-*** -n kube-system -- /bin/bash
    # Access kube-apiserver of the cluster.
    curl -kv https://*.*.*.*:*/readyz

    If the access fails, rectify the cluster network fault, re-connect the cluster to UCS, and deploy proxy-agent again.
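If the comparison in 3 shows that the addresses differ, one way to correct the kubeconfig file before re-registering the cluster is to edit the clusters.cluster.server field directly. This is a sketch only; the file name and both addresses are placeholders.

    # Replace the server address in the kubeconfig file with the actual API server address.
    # cluster.kubeconfig, <old-address>, <apiserver-address>, and <port> are placeholders.
    sed -i 's#server: https://<old-address>:<port>#server: https://<apiserver-address>:<port>#' cluster.kubeconfig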

Check Item 4: Cluster Authentication Information Changes

If "cluster responded with non-successful status: [401][Unauthorized]" is displayed, the cluster authentication information may have expired or changed. As a result, UCS cannot access kube-apiserver. You need to remove the cluster, use a new kubeconfig file to register the cluster again, and re-deploy proxy-agent.

  • A permanent kubeconfig file can prevent such faults.
  • For third-party clusters from certain vendors, the authentication information changes after the cluster is renewed. If you use such a cluster, pay attention to this behavior and avoid letting the cluster fall into arrears.
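To check whether the authentication information has expired, you can inspect the client certificate embedded in the kubeconfig file. This is a sketch that assumes certificate-based authentication with an embedded, base64-encoded certificate; cluster.kubeconfig is a placeholder file name. If the kubeconfig file uses a token instead, check the token validity with your cluster provider.

    # Decode the embedded client certificate and print its expiry date.
    grep client-certificate-data cluster.kubeconfig | awk '{print $2}' | base64 -d | openssl x509 -noout -enddate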
