
What Can I Do If I Cannot Associate the Permission Policy with a Fleet or Cluster?

Symptom

When you associate a permission policy with a fleet or with a cluster that is not in a fleet, the association may fail due to cluster connection exceptions. In this case, detailed exception events are displayed on the Set Permissions page of the fleet or cluster. Rectify the fault in the cluster and then click Retry to associate the permission policy again.

Troubleshooting

If an exception occurs when the permission policy is being associated with a fleet or cluster, locate the fault based on the error message, as shown in Table 1.

Table 1 Error message description

  • Error message:

    Get ClusterRole failed reason:Get "https://kubernetes.default.svc.cluster.local/apis/rbac.authorization.k8s.io/v1/clusterroles/XXXXXXX?timeout=30s": Precondition Required

    Or

    Get ClusterRole failed reason:an error on the server ("unknown") has prevented the request from succeeding (get clusterroles.rbac.authorization.k8s.io)

    Description: The cluster has not been connected, proxy-agent in the connected cluster is abnormal, or the network is abnormal.

    Check item: See Check Item 1: proxy-agent and Check Item 2: Network Connection Between the Cluster and UCS.

  • Error message:

    Unauthorized

    Description: Rectify the fault based on the returned status code. For example, status code 401 indicates that the user does not have the access permission. A possible cause is that the cluster authentication information has expired.

    Check item: See Check Item 3: Cluster Authentication Information Changes.

  • Error message:

    Get cluster namespace[x] failed.

    Or

    Reason:namespace "x" not found.

    Description: There is no corresponding namespace in the cluster.

    Check item: Create the namespace in the cluster and try again. Example:

    kubectl create namespace ns_name

    If the namespace is not required, ignore this exception event.

Check Item 1: proxy-agent

After the cluster is unregistered from UCS, the authentication information in the original proxy-agent configuration file becomes invalid, so you need to delete the proxy-agent workload deployed in the cluster. To connect the cluster to UCS again, download a new proxy-agent configuration file from the UCS console and use it for redeployment.
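For reference, a minimal deletion-and-redeployment sketch is shown below. It assumes proxy-agent is managed by a Deployment named proxy-agent in the kube-system namespace, and agent.yaml is a hypothetical name for the newly downloaded configuration file; verify both against your environment.

    # Remove the old agent workload (assumed Deployment name; verify first).
    kubectl -n kube-system delete deployment proxy-agent
    # Redeploy with the configuration file newly downloaded from the UCS console.
    kubectl apply -f agent.yaml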

  1. Log in to the master node of the destination cluster.
  2. Check the deployment of the cluster agent.

    kubectl -n kube-system get pod | grep proxy-agent

    Expected output for successful deployment:

    proxy-agent-*** 1/1 Running 0 9s

    If proxy-agent is not in the Running state, run the kubectl -n kube-system describe pod proxy-agent-*** command to view the pod alarms. For details, see What Can I Do If proxy-agent Fails to Be Deployed?.

    By default, proxy-agent is deployed with two pods and can provide services as long as at least one pod is running properly. However, a single pod cannot ensure high availability. (A quick replica check is sketched after this list.)

  3. Print the pod logs of proxy-agent and check whether the agent program can connect to UCS.

    kubectl -n kube-system logs proxy-agent-*** | grep "Start serving"

    If no "Start serving" log is printed but the proxy-agent pods are working, check other check items.

Check Item 2: Network Connection Between the Cluster and UCS

For clusters connected over a public network:

  1. Check whether a public IP is bound to the cluster or a public NAT gateway is configured.
  2. Check whether the outbound traffic of the cluster security group is allowed. If you need to perform access control on the outbound traffic, contact technical support to obtain the destination IP address and port number.
  3. After rectifying the network faults, delete the existing proxy-agent pods so that new pods are created, and check whether the logs of the new pods contain "Start serving" (a combined sketch follows this list).

    kubectl -n kube-system logs proxy-agent-*** | grep "Start serving"

  4. If desired logs are printed, refresh the UCS console page and check whether the cluster is properly connected.
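The rebuild-and-verify sequence above can be run as a whole. This is a minimal sketch assuming the pods carry an app=proxy-agent label (a hypothetical label; verify it with kubectl -n kube-system get pod --show-labels) and are managed by a Deployment that recreates them automatically.

    # Delete the old pods; the Deployment recreates them with the current configuration.
    kubectl -n kube-system delete pod -l app=proxy-agent
    # Wait for the new pods to become ready, then check for the connection log.
    kubectl -n kube-system wait --for=condition=Ready pod -l app=proxy-agent --timeout=120s
    kubectl -n kube-system logs -l app=proxy-agent | grep "Start serving"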

For clusters connected over a private network:

  1. Check whether the outbound traffic of the cluster security group is allowed. If you need to perform access control on the outbound traffic, contact technical support to obtain the destination IP address and port number.
  2. Rectify the network connection faults between the cluster and UCS or IDC.

    Refer to the guide that corresponds to your network connection type.

  3. Rectify the VPC endpoint fault. The VPC endpoint status must be Accepted. If the VPC endpoint is deleted by mistake, create one again. For details, see How Do I Restore a Deleted VPC Endpoint for a Cluster Connected Over a Private Network?.

    Figure 1 Checking the VPC endpoint status

  4. After rectifying network faults, delete the existing proxy-agent pods to rebuild pods. Check whether the logs of the new pods contain "Start serving".

    kubectl -n kube-system logs proxy-agent-*** | grep "Start serving"

  5. If desired logs are printed, refresh the UCS console page and check whether the cluster is properly connected.

Check Item 3: Cluster Authentication Information Changes

If the error message "cluster responded with non-successful status: [401][Unauthorized]" is displayed, check the /var/paas/sys/log/kubernetes/auth-server.log file on the three master nodes of the cluster. The IAM network connection may be faulty. Ensure that IAM domain name resolution and IAM service connectivity are normal.
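For example, you can filter this log for authentication failures on each master node; the error strings to look for are listed below. A minimal sketch:

    # Run on each of the three master nodes.
    grep -i "Failed to authenticate token" /var/paas/sys/log/kubernetes/auth-server.log | tail -n 20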

The common issue logs are as follows (a quick connectivity check is sketched after this list):

  • Failed to authenticate token: *******: dial tcp: lookup iam.myhuaweicloud.com on *.*.*.*:53: no such host

    This log indicates that the node cannot resolve iam.myhuaweicloud.com. Configure the corresponding domain name resolution by referring to Preparing for Installation.

  • Failed to authenticate token: Get *******: dial tcp *.*.*.*:443: i/o timeout

    This log indicates that the node's access to IAM times out. Ensure that the node can communicate with IAM properly.

  • currently only supports Agency token

    This log indicates that the request is not initiated by UCS. Currently, on-premises clusters can only be connected to UCS using IAM tokens.

  • IAM assumed user has no authorization/iam assumed user should allowed by TEAdmin

    This log indicates that the connection between UCS and the cluster is abnormal. Contact Huawei technical support for troubleshooting.

  • Failed to authenticate token: token expired, please acquire a new token

    This log indicates that the token has expired. Run the date command to check whether the node time deviates significantly from the actual time. If it does, synchronize the time and check whether the cluster is working. If the fault persists for a long time, you may need to reinstall the cluster. In this case, contact Huawei technical support.
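The resolution, connectivity, and time checks above can be run directly on a master node. A minimal sketch, assuming the public IAM endpoint iam.myhuaweicloud.com from the logs above (substitute your region's endpoint if it differs):

    # Check DNS resolution of the IAM domain name.
    nslookup iam.myhuaweicloud.com
    # Check HTTPS connectivity to the IAM service (port 443).
    curl -kv https://iam.myhuaweicloud.com --connect-timeout 10
    # Check the node time; synchronize it if it deviates from the actual time.
    date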

After the preceding problem is resolved, run the following command to restart the auth-server container:

    crictl ps | grep auth | awk '{print $1}' | xargs crictl stop
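The stopped container should be pulled up again automatically. As a sanity check, you can confirm that a new auth-server container is running:

    # A new container ID and a recent CREATED time indicate a successful restart.
    crictl ps | grep auth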