What Can I Do If an On-Premises Cluster Fails to Be Connected?

Updated on 2024-12-31 GMT+08:00

Symptom

This section describes how to troubleshoot cluster connection exceptions and provides solutions. The following exceptions may occur when a cluster is connected to UCS:

  • You have registered a cluster to UCS and deployed proxy-agent in the cluster, but the console keeps showing that the cluster is waiting for connection, or reports that registration failed after the connection timed out.
    NOTE:

    If the cluster registration fails, click the icon in the upper right corner to register the cluster again, and then locate the fault as described in Troubleshooting.

  • A connected cluster becomes unavailable. Rectify the fault as described in Troubleshooting in this section.

Troubleshooting

Table 1 lists the error messages, what they mean, and the check item to use for locating each fault.

Table 1 Error message description

| Error Message | Description | Check Item |
| --- | --- | --- |
| "currently no agents available, please make sure the agents are correctly registered" | The proxy-agent in the connected cluster is abnormal, or the network is abnormal. | Check Item 1: proxy-agent; Check Item 2: Network Connection Between the Cluster and UCS |
| "please check the health status of kube apiserver: ..." | The kube-apiserver in the cluster cannot be accessed. | Check Item 3: kube-apiserver |
| "cluster responded with non-successful status code: ..." | Rectify the fault based on the returned status code. For example, status code 401 (Unauthorized) indicates that the request failed authentication. A possible cause is that the cluster authentication information has expired. | Check Item 4: Cluster Authentication Information Changes |
| "cluster responded with non-successful message: ..." | Rectify the fault based on the returned information. For example, the message Get "https://172.16.0.143:6443/readyz?timeout=32s\": context deadline exceeded indicates that access to the API server timed out. A possible cause is that the API server is faulty. | - |
| "Current cluster version is not supported in UCS service." | The cluster version does not meet requirements. The Kubernetes version of a cluster connected to UCS must be 1.19 or later. | - |
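Part of the triage in Table 1 can be scripted. The following is a minimal sketch, not a UCS tool: the function names ucs_check_item, k8s_version_supported, and server_git_version are hypothetical. The message patterns and the 1.19 minimum come from Table 1; the server version is read from the standard Kubernetes /version endpoint through kubectl get --raw.

```shell
#!/bin/sh
# Hypothetical helpers for Table 1 (not part of UCS or kubectl).

# Map an error message from the UCS console to a check item in Table 1.
ucs_check_item() {
  case "$1" in
    *"currently no agents available"*)
      echo "Check Item 1: proxy-agent; Check Item 2: network" ;;
    *"please check the health status of kube apiserver"*)
      echo "Check Item 3: kube-apiserver" ;;
    *"non-successful status"*401*)
      echo "Check Item 4: cluster authentication information" ;;
    *"non-successful status code"* | *"non-successful message"*)
      echo "Rectify the fault based on the returned status code or message" ;;
    *"Current cluster version is not supported"*)
      echo "Use a cluster running Kubernetes 1.19 or later" ;;
    *)
      echo "Not listed in Table 1" ;;
  esac
}

# Succeed if a version string such as v1.25.3 or v1.19.16-r0 is 1.19 or later.
k8s_version_supported() {
  ver=${1#v}               # drop the leading "v"
  major=${ver%%.*}         # text before the first dot
  rest=${ver#*.}           # text after the first dot
  minor=${rest%%[!0-9]*}   # leading digits of the minor version
  [ "$major" -gt 1 ] || { [ "$major" -eq 1 ] && [ "$minor" -ge 19 ]; }
}

# Print the gitVersion field returned by the API server's /version endpoint.
server_git_version() {
  kubectl get --raw /version |
    sed -n 's/.*"gitVersion": *"\([^"]*\)".*/\1/p'
}
```

For example, on a node where kubectl can access the cluster, k8s_version_supported "$(server_git_version)" && echo supported checks the version requirement before you register the cluster.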

Check Item 1: proxy-agent

NOTICE:

After the cluster is unregistered from UCS, the authentication information contained in the original proxy-agent configuration file becomes invalid. You need to delete the proxy-agent pods deployed in the cluster. To connect the cluster to UCS again, download the proxy-agent configuration file from the UCS console again and use it for re-deployment.

  1. Log in to the master node of the destination cluster.
  2. Check whether proxy-agent is deployed in the cluster.

    kubectl -n kube-system get pod | grep proxy-agent

    Expected output for successful deployment:

    proxy-agent-*** 1/1 Running 0 9s

    If proxy-agent is not in the Running state, run the kubectl -n kube-system describe pod proxy-agent-*** command to view the pod events. For details, see What Can I Do If proxy-agent Fails to Be Deployed?.

    NOTE:

    By default, proxy-agent is deployed with two pods, and can provide services as long as one pod is running properly. However, one pod cannot ensure high availability.

  3. Print the pod logs of proxy-agent and check whether the agent program can connect to UCS.

    kubectl -n kube-system logs proxy-agent-*** | grep "Start serving"

    If "Start serving" is not printed although the proxy-agent pods are running, go to the other check items.
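The pod check in step 2, together with the high-availability NOTE, can be summarized with a small filter. This is a sketch assuming the pod names start with proxy-agent- and that kubectl prints its default columns (NAME, READY, STATUS, RESTARTS, AGE); proxy_agent_status is a hypothetical name.

```shell
#!/bin/sh
# Hypothetical helper: summarize proxy-agent pod health from
# "kubectl -n kube-system get pod" output read on stdin.
# Assumes the default columns: NAME READY STATUS RESTARTS AGE.
proxy_agent_status() {
  # Count proxy-agent pods whose STATUS column is Running.
  running=$(awk '$1 ~ /^proxy-agent-/ && $3 == "Running" { n++ } END { print n + 0 }')
  if [ "$running" -ge 2 ]; then
    echo "OK: $running proxy-agent pods are Running"
  elif [ "$running" -eq 1 ]; then
    echo "DEGRADED: 1 proxy-agent pod is Running (no high availability)"
  else
    echo "FAILED: no proxy-agent pod is Running; run kubectl describe pod to view events"
  fi
}
```

Usage: kubectl -n kube-system get pod | proxy_agent_status. The awk filter replaces the grep in step 2, and the DEGRADED case reflects the NOTE that one running pod serves requests but does not provide high availability.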

Check Item 2: Network Connection Between the Cluster and UCS

Public network access

  1. Check whether a public IP is bound to the cluster or a public NAT gateway is configured.
  2. Check whether the cluster security group allows outbound traffic. If you restrict outbound traffic, contact technical support to obtain the destination IP address and port number that must be allowed.
  3. After rectifying the network faults, delete the existing proxy-agent pods so that they are rebuilt. Then check whether the logs of the new pods contain "Start serving".

    kubectl -n kube-system logs proxy-agent-*** | grep "Start serving"

  4. If the expected log is printed, refresh the UCS console page and check whether the cluster is connected.
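Steps 3 and 4 can be sketched as a few shell functions. This assumes, as in the commands above, that the pods are in kube-system and their names start with proxy-agent-; the function names are hypothetical. The same functions apply to step 4 of private network access.

```shell
#!/bin/sh
NS=kube-system

# List proxy-agent pods as "pod/<name>".
proxy_agent_pods() {
  kubectl -n "$NS" get pod -o name | grep '^pod/proxy-agent-'
}

# Delete the existing proxy-agent pods; their controller recreates them.
rebuild_proxy_agent() {
  proxy_agent_pods | xargs -r kubectl -n "$NS" delete
}

# Report whether each proxy-agent pod has logged "Start serving".
# Run this after the new pods are Running.
check_start_serving() {
  for pod in $(proxy_agent_pods); do
    if kubectl -n "$NS" logs "$pod" | grep -q "Start serving"; then
      echo "$pod: Start serving found"
    else
      echo "$pod: Start serving not found"
    fi
  done
}
```

Run rebuild_proxy_agent, wait until the new pods are Running, and then run check_start_serving.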

Private network access

  1. Check whether the cluster security group allows outbound traffic. If you restrict outbound traffic, contact technical support to obtain the destination IP address and port number that must be allowed.
  2. Rectify the network connection faults between the cluster and UCS or IDC.

    Refer to the following guides according to your network connection type:

  3. Rectify the VPC endpoint fault. The VPC endpoint status must be Accepted. If the VPC endpoint is deleted by mistake, create one again. For details, see How Do I Restore a Deleted VPC Endpoint for a Cluster Connected Over a Private Network?.

    Figure 1 Checking the VPC endpoint status

  4. After rectifying the network faults, delete the existing proxy-agent pods so that they are rebuilt. Then check whether the logs of the new pods contain "Start serving".

    kubectl -n kube-system logs proxy-agent-*** | grep "Start serving"

  5. If the expected log is printed, refresh the UCS console page and check whether the cluster is connected.

Check Item 3: kube-apiserver

When you connect a cluster to UCS, the error message "please check the health status of kube apiserver: ..." shown in Figure 2 may be displayed.

Figure 2 Abnormal kube-apiserver

This indicates that proxy-agent cannot communicate with the API server in the cluster. Because the network configurations used to connect clusters to UCS vary, UCS does not provide a unified solution for this fault. Rectify the fault based on your network configuration and try again.

  1. Log in to the UCS console. In the navigation pane, choose Fleets.
  2. Log in to the master node of the destination cluster and check whether the proxy-agent pods can access kube-apiserver of the destination cluster.

    Example command:

    # Open a shell in a proxy-agent pod.
    kubectl exec -ti proxy-agent-*** -n kube-system -- /bin/bash
    # Access kube-apiserver of the cluster.
    curl -kv https://kubernetes.default.svc.cluster.local/readyz

    If the access fails, rectify the cluster network fault, register the cluster to UCS again, and re-deploy proxy-agent.
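The same check can be run non-interactively, which is convenient when repeating it for each proxy-agent pod. This is a sketch assuming curl is available in the proxy-agent image, as the interactive example above implies; apiserver_ready_from_pod is a hypothetical name.

```shell
#!/bin/sh
# Hypothetical helper: check kube-apiserver readiness from inside a
# proxy-agent pod without opening an interactive shell.
apiserver_ready_from_pod() {
  pod="$1"
  # curl -w prints the HTTP status code, or 000 if no response was received.
  code=$(kubectl -n kube-system exec "$pod" -- \
    curl -sk -o /dev/null -w '%{http_code}' --max-time 10 \
    https://kubernetes.default.svc.cluster.local/readyz) || true
  if [ "$code" = "200" ]; then
    echo "$pod: kube-apiserver is reachable and ready"
  else
    echo "$pod: kube-apiserver check failed (HTTP code: ${code:-none})"
    return 1
  fi
}
```

A code of 000 means no HTTP response was received, which points to a network fault between the pod and kube-apiserver rather than to an unhealthy API server.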

Check Item 4: Cluster Authentication Information Changes

If the error message "cluster responded with non-successful status: [401][Unauthorized]" is displayed, check /var/paas/sys/log/kubernetes/auth-server.log on the three master nodes of the cluster. The log may show that the network connection to IAM is faulty. Ensure that IAM domain name resolution and connectivity to the IAM service are normal.

Common log entries are as follows:

  • Failed to authenticate token: *******: dial tcp: lookup iam.myhuaweicloud.com on *.*.*.*:53: no such host

    This log indicates that the node cannot resolve iam.myhuaweicloud.com. Configure domain name resolution for it by referring to Preparing for Installation.

  • Failed to authenticate token: Get *******: dial tcp *.*.*.*:443: i/o timeout

    This log indicates that the node's access to IAM times out. Ensure that the node can communicate with Huawei Cloud IAM properly.

  • currently only supports Agency token

    This log indicates that the request was not initiated by UCS. Currently, on-premises clusters can be connected to UCS only using IAM tokens.

  • IAM assumed user has no authorization/iam assumed user should allowed by TEAdmin

    This log indicates that the connection between UCS and the cluster is abnormal. Contact Huawei technical support.

  • Failed to authenticate token: token expired, please acquire a new token

    This log indicates that the token has expired. Run the date command on the nodes to check whether the node time deviates significantly from the actual time. If it does, synchronize the time and check whether the cluster recovers. If the fault persists for a long time, you may need to reinstall the cluster. In this case, contact Huawei technical support.
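To see which of the causes above appear in a log, you can scan auth-server.log for the listed patterns. This is a minimal sketch: auth_log_causes is a hypothetical name, the default path is the one given in this section, and the patterns are taken from the log entries listed above. Run it on each of the three master nodes.

```shell
#!/bin/sh
# Hypothetical helper: report which known causes appear in an auth-server log.
auth_log_causes() {
  log=${1:-/var/paas/sys/log/kubernetes/auth-server.log}
  awk '
    /lookup iam\.myhuaweicloud\.com/ && /no such host/ { dns = 1 }
    /Failed to authenticate token/ && /i\/o timeout/ { timeout = 1 }
    /currently only supports Agency token/ { agency = 1 }
    /assumed user/ { assumed = 1 }
    /token expired/ { expired = 1 }
    END {
      # Print each detected cause once, in the order of the list above.
      if (dns)     print "DNS: node cannot resolve iam.myhuaweicloud.com"
      if (timeout) print "Timeout: node access to IAM times out"
      if (agency)  print "Agency token: request was not initiated by UCS"
      if (assumed) print "Assumed user: UCS-cluster connection abnormal, contact technical support"
      if (expired) print "Token expired: check node time with the date command"
    }' "$log"
}
```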

After the preceding problem is resolved, run the following command to restart the auth-server container:

    crictl ps | grep auth | awk '{print $1}' | xargs crictl stop
