Updated on 2024-11-15 GMT+08:00

What Do I Do If the Cluster Connection Component (ANP-Agent) Failed to Be Deployed?

Cluster Connection Component (ANP-Agent) Installation Failure

Symptom

When you connect a third-party cloud cluster or an on-premises cluster to HSS, run the following command to check the installation status of the cluster connection component (ANP-agent):
kubectl get pods -n hss | grep proxy-agent
If output similar to the following is displayed, the cluster connection component (ANP-agent) failed to be installed:
proxy-agent-5dc5cf6cd7-khdlt   0/1     ImagePullBackOff     0          42h 
proxy-agent-5dc5cf6cd7-n56bx   0/1     Pending              0          42h
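
To list only the pods that need attention, the status check can be filtered on the STATUS column. The following is a minimal sketch, assuming the default `kubectl get pods` column layout (NAME, READY, STATUS, ...). The function name `unhealthy_proxy_agents` is hypothetical, and the filter does not inspect the READY column.

```shell
# Print the name and status of every proxy-agent pod whose STATUS is not "Running".
# Reads `kubectl get pods` output on stdin, so it can be tried without a cluster.
unhealthy_proxy_agents() {
  awk '$1 ~ /^proxy-agent/ && $3 != "Running" { print $1, $3 }'
}

# Usage against a live cluster:
#   kubectl get pods -n hss --no-headers | unhealthy_proxy_agents
```

If the function prints nothing, every proxy-agent pod reports the Running status.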

Solution

  1. Log in to a node in the cluster.
  2. Run the following command to view the details of the pod:

    kubectl describe pod proxy-agent-xxx -n hss

    proxy-agent-xxx is the name of the cluster connection component displayed in the command output in "Symptom", for example, proxy-agent-5dc5cf6cd7-khdlt.

  3. Identify the cause based on the command output.

    • Possible cause: The image of the cluster connection component cannot be pulled.
      Figure 1 Failed to pull the image of the cluster connection component

      Solution: If your access mode is set to Non-CCE cluster (Internet access), ensure your cluster can access the Internet (that is, SWR images can be pulled). If your cluster cannot access the Internet, set the access mode to Non-CCE cluster (private network access). For details, see Connecting a Non-CCE Cluster to the HSS (Private Network).

    • Possible cause: The node does not have enough CPU or memory. The command output contains Insufficient cpu or Insufficient memory.
      Figure 2 Insufficient CPU or memory

      Solution: Increase the CPU or memory of the node, and then connect the cluster again.

    • Possible cause: There are no nodes matching the scheduling rule.
      Figure 3 No nodes matching the scheduling rule

      Solution: For high availability purposes, the cluster connection component (ANP-agent) allocates two instances to different nodes by default. Ensure there are at least two available nodes in the cluster.
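
The three causes above can be told apart by searching the `kubectl describe pod` output for characteristic event text. The following is a rough sketch, not an official tool. The function names `classify_failure` and `ready_node_count` are hypothetical, and the matched strings are assumptions about typical Kubernetes event messages that may differ between Kubernetes versions.

```shell
# Map `kubectl describe pod` output (read on stdin) to one of the causes above.
# The matched strings are assumed, typical event messages; adjust them if your
# Kubernetes version words its events differently.
classify_failure() {
  events=$(cat)
  case "$events" in
    *"Insufficient cpu"*|*"Insufficient memory"*)
      echo "insufficient-resources" ;;
    *"anti-affinity"*|*"didn't match"*)
      echo "no-matching-node" ;;
    *ErrImagePull*|*ImagePullBackOff*|*"Failed to pull image"*)
      echo "image-pull" ;;
    *)
      echo "unknown" ;;
  esac
}

# Count nodes whose STATUS column is exactly "Ready" in
# `kubectl get nodes --no-headers` output (read on stdin).
ready_node_count() {
  awk '$2 == "Ready" { n++ } END { print n + 0 }'
}

# Usage against a live cluster:
#   kubectl describe pod proxy-agent-xxx -n hss | classify_failure
#   kubectl get nodes --no-headers | ready_node_count
```

A `ready_node_count` result below 2 suggests the scheduling rule cannot be satisfied, because the two instances are placed on different nodes by default.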

Cluster Connection Component (ANP-Agent) Connection Failure

Symptom

When you connect a third-party cloud cluster or an on-premises cluster to HSS, run the following command to check the connection status of the cluster connection component (ANP-agent):
for a in $(kubectl get pods -n hss | grep proxy-agent | cut -d ' ' -f1); do kubectl -n hss logs "$a" | grep 'Start serving'; done

The command output is empty, indicating the cluster failed to connect to HSS.
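
Because the check prints nothing on failure, it does not show which pods were inspected. A per-pod report can make the result easier to read. The following is a minimal sketch; the function name `connection_status` and its output labels are hypothetical.

```shell
# Report whether a pod log (read on stdin) contains the 'Start serving' line.
# $1 is the pod name and is used only to label the output.
connection_status() {
  if grep -q 'Start serving'; then
    echo "$1: found"
  else
    echo "$1: missing"
  fi
}

# Usage against a live cluster:
#   for pod in $(kubectl get pods -n hss --no-headers | awk '/^proxy-agent/ { print $1 }'); do
#     kubectl -n hss logs "$pod" | connection_status "$pod"
#   done
```

A pod reported as `missing` corresponds to the empty output described above.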

Solution

  1. Log in to a node in the cluster.
  2. Run the following command to check the pod logs:

    kubectl logs proxy-agent-xxx -n hss

  3. If the command output shown in Figure 4 is displayed, the gRPC connection between the cluster connection component and the HSS server could not be established.

    Figure 4 Connection failed

  4. Perform the following steps to locate and rectify the fault:

    Format of the server domain name of the cluster connection component: hss-anp.region_code.myhuaweicloud.com

    For details about region codes, see Regions and Endpoints.

    a. Check whether the cluster security group allows outbound access to port 8091 of the 100.125.0.0/16 CIDR block.
      • If the access is allowed, go to 4.b.
      • If the access is denied, configure the security group to allow outbound access to the port, and then connect the cluster again.
    b. Run the following command to check whether the server domain name of the cluster connection component can be pinged:
      ping {{Server_domain_name_of_cluster_connection_component}}
      • If it can be pinged, go to 4.c.
      • If it cannot be pinged, set the DNS server address to the private DNS server address of Huawei Cloud. For more information, see Private DNS Server Address of Huawei Cloud. After the configuration is complete, connect the cluster again.
    c. Run the following command to check whether port 8091 of the server of the cluster connection component can be accessed:
      telnet {{Server_domain_name_of_cluster_connection_component}} 8091
      • If the port can be accessed, go to 4.d.
      • If the access fails, disable the firewall and try again.
    d. If the fault persists, choose Service Tickets > Create Service Ticket in the upper right corner of the Huawei Cloud console and submit a service ticket.
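
The domain name and port checks in step 4 can be combined into one script. The following is a sketch under stated assumptions: the domain name format is taken to be hss-anp.<region_code>.myhuaweicloud.com, `nc` is assumed to be installed and to support `-z` and `-w`, and the function names are hypothetical. It uses `nc` in place of `telnet` so that the port check does not need an interactive session.

```shell
# Build the server domain name from a region code.
# Assumed format: hss-anp.<region_code>.myhuaweicloud.com
server_domain() {
  printf 'hss-anp.%s.myhuaweicloud.com\n' "$1"
}

# Run the ping check and the port check for the given region code.
# Port 8091 is the port named in the security group check.
check_hss_connectivity() {
  host=$(server_domain "$1")
  if ! ping -c 1 "$host" >/dev/null 2>&1; then
    echo "$host cannot be pinged: check the DNS server address"
    return 1
  fi
  # nc -z only tests whether the TCP port accepts connections;
  # -w sets a timeout in seconds.
  if ! nc -z -w 5 "$host" 8091; then
    echo "$host:8091 cannot be accessed: check the firewall"
    return 1
  fi
  echo "$host:8091 can be accessed"
}

# Usage (cn-north-4 is only an example region code):
#   check_hss_connectivity cn-north-4
```

Run the script on a cluster node so that it uses the same DNS and network path as the cluster connection component.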