Configuring Conditional Automatic Traffic Switchover
This section describes how to configure conditional automatic traffic switchover to identify CoreDNS faults in a cluster and automatically redirect traffic.
Installing CPD for a Cluster to Identify Faults
Before configuring automatic traffic switchover, you need to install cluster-problem-detector (CPD) in a cluster to automatically detect whether CoreDNS runs normally and report the results.
CPD periodically checks whether CoreDNS can resolve kubernetes.default and writes the result to the conditions of the Node object. The active CPD pod collects these conditions from all nodes, determines whether domain name resolution in the cluster is normal, and reports the result to the federation control plane that manages the cluster.
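The effect of the coredns-success-threshold and coredns-failure-threshold parameters described in Table 1 can be illustrated with a small simulation. This is a minimal sketch, assuming that a node's status changes only after probe results have been consistently successful (or consistently failing) for longer than the corresponding threshold. The class and function names are hypothetical and do not reflect CPD's actual implementation.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ResolutionDebouncer:
    """Hypothetical model of the CoreDNS threshold logic (not CPD's real code)."""

    success_threshold: float = 30.0    # seconds, mirrors --coredns-success-threshold
    failure_threshold: float = 30.0    # seconds, mirrors --coredns-failure-threshold
    ready: Optional[bool] = None       # reported status; None until the first decision
    _streak_ok: Optional[bool] = None  # outcome shared by the current streak of probes
    _streak_start: float = 0.0         # time (seconds) at which the current streak began

    def observe(self, now: float, resolved: bool) -> Optional[bool]:
        """Record one probe of kubernetes.default taken at time `now`."""
        if resolved != self._streak_ok:
            # The outcome changed, so a new streak starts at this probe.
            self._streak_ok = resolved
            self._streak_start = now
        duration = now - self._streak_start
        threshold = self.success_threshold if resolved else self.failure_threshold
        # Flip the reported status only once the streak lasts longer than the threshold.
        if duration > threshold:
            self.ready = resolved
        return self.ready


def simulate(outcomes: List[bool], period: float = 5.0) -> List[Optional[bool]]:
    """Feed probe outcomes taken every `period` seconds (mirrors --coredns-detect-period)."""
    debouncer = ResolutionDebouncer()
    return [debouncer.observe(i * period, ok) for i, ok in enumerate(outcomes)]
```

In this model, with a 5s detection period and 30s thresholds, a status change is reported only on the eighth consecutive consistent probe (35 seconds after the streak begins), and results that flap between success and failure never change the reported status. This is why higher thresholds give more stable but less sensitive detection.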
CPD must be deployed separately in each cluster as a DaemonSet that runs on all nodes. The following is an example CPD configuration file. Modify its parameters as described in Table 1.
Table 1 CPD parameters
Parameter | Description |
---|---|
<federation-version> | Version of the federation that the cluster belongs to. On the Fleets tab, click the fleet name to obtain the version. |
<your-cluster-name> | Name of the cluster where CPD is to be installed. |
<kubeconfig-of-karmada> | Content of the kubeconfig file of the federation control plane. For details about how to download a kubeconfig file that meets the requirements, see kubeconfig. |
hostAliases | If the server address of the federation control plane in the kubeconfig file is a domain name, configure hostAliases in the YAML file to map the domain name to the IP address of the federation control plane. If the server address is an IP address, delete hostAliases from the YAML file. |
coredns-detect-period | Interval at which CPD checks CoreDNS and reports the result. Default: 5s (recommended). A smaller value means more frequent checks and reports. |
coredns-success-threshold | How long CoreDNS must keep resolving the domain name successfully before it is considered normal. Default: 30s (recommended). If the duration of successful resolution exceeds this threshold, CoreDNS is considered normal. A higher value gives more stable detection but lower sensitivity; a lower value gives higher sensitivity but less stable detection. |
coredns-failure-threshold | How long CoreDNS must keep failing to resolve the domain name before it is considered faulty. Default: 30s (recommended). If the duration of failed resolution exceeds this threshold, CoreDNS is considered faulty. A higher value gives more stable detection but lower sensitivity; a lower value gives higher sensitivity but less stable detection. |
```yaml
kind: DaemonSet
apiVersion: apps/v1
metadata:
  name: cluster-problem-detector
  namespace: kube-system
  labels:
    app: cluster-problem-detector
spec:
  selector:
    matchLabels:
      app: cluster-problem-detector
  template:
    metadata:
      labels:
        app: cluster-problem-detector
    spec:
      containers:
        # Replace <federation-version> with the federation version of the fleet.
        - image: swr.ap-southeast-3.myhuaweicloud.com/hwofficial/cluster-problem-detector:<federation-version>
          name: cluster-problem-detector
          command:
            - /bin/sh
            - '-c'
            # Replace <your-cluster-name>; tune the --coredns-* flags as described in Table 1.
            - /var/paas/cluster-problem-detector/cluster-problem-detector --karmada-kubeconfig=/tmp/config --karmada-context=federation --cluster-name=<your-cluster-name> --host-name=${HOST_NAME} --bind-address=${POD_ADDRESS} --healthz-port=8081 --detectors=* --coredns-detect-period=5s --coredns-success-threshold=30s --coredns-failure-threshold=30s --coredns-stale-threshold=60s
          env:
            - name: POD_ADDRESS
              valueFrom:
                fieldRef:
                  apiVersion: v1
                  fieldPath: status.podIP
            - name: POD_NAME
              valueFrom:
                fieldRef:
                  apiVersion: v1
                  fieldPath: metadata.name
            - name: POD_NAMESPACE
              valueFrom:
                fieldRef:
                  apiVersion: v1
                  fieldPath: metadata.namespace
            - name: HOST_NAME
              valueFrom:
                fieldRef:
                  apiVersion: v1
                  fieldPath: spec.nodeName
          livenessProbe:
            httpGet:
              path: /healthz
              port: 8081
              scheme: HTTP
            initialDelaySeconds: 3
            timeoutSeconds: 3
            periodSeconds: 5
            successThreshold: 1
            failureThreshold: 3
          readinessProbe:
            httpGet:
              path: /healthz
              port: 8081
              scheme: HTTP
            initialDelaySeconds: 3
            timeoutSeconds: 3
            periodSeconds: 5
            successThreshold: 1
            failureThreshold: 3
          volumeMounts:
            # The kubeconfig from the ConfigMap is mounted as /tmp/config.
            - mountPath: /tmp
              name: karmada-config
      serviceAccountName: cluster-problem-detector
      volumes:
        - configMap:
            name: karmada-kubeconfig
            items:
              - key: kubeconfig
                path: config
          name: karmada-config
      securityContext:
        fsGroup: 10000
        runAsUser: 10000
        seccompProfile:
          type: RuntimeDefault
      # Keep hostAliases only if the kubeconfig server address is a domain name; otherwise delete it.
      hostAliases:
        - hostnames:
            - <host name of karmada server>
          ip: <ip of host name of karmada server>
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: cluster-problem-detector
  namespace: kube-system
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: cpd-binding
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: system:cluster-problem-detector
subjects:
  - kind: ServiceAccount
    name: cluster-problem-detector
    namespace: kube-system
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: system:cluster-problem-detector
rules:
  - apiGroups:
      - ""
    resources:
      - nodes
    verbs:
      - get
      - list
      - watch
  - apiGroups:
      - ""
    resources:
      - nodes/status
    verbs:
      - patch
      - update
  - apiGroups:
      - ""
      - events.k8s.io
    resources:
      - events
    verbs:
      - create
      - patch
      - update
  - apiGroups:
      - coordination.k8s.io
    resources:
      - leases
    verbs:
      - get
      - list
      - watch
      - create
      - update
      - patch
      - delete
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: karmada-kubeconfig
  namespace: kube-system
data:
  # Replace the placeholder with the federation kubeconfig, indented under the block scalar.
  kubeconfig: |+
    <kubeconfig-of-karmada>
```
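Pasting the kubeconfig into the ConfigMap is error-prone because every line must stay indented under the kubeconfig: |+ block scalar. The following is a minimal sketch of a hypothetical helper, render_cpd_manifest, that fills <federation-version>, <your-cluster-name>, and <kubeconfig-of-karmada> in the example manifest and re-indents the kubeconfig to the placeholder's indentation. It assumes the placeholder sits alone on its line, as in the example above; the hostAliases placeholders still have to be edited or deleted by hand.

```python
import textwrap


def render_cpd_manifest(template: str, federation_version: str,
                        cluster_name: str, kubeconfig: str) -> str:
    """Fill the placeholders of the CPD example manifest (hypothetical helper)."""
    rendered = (template
                .replace("<federation-version>", federation_version)
                .replace("<your-cluster-name>", cluster_name))
    out_lines = []
    for line in rendered.splitlines():
        if line.strip() == "<kubeconfig-of-karmada>":
            # Reuse the placeholder's indentation for every kubeconfig line so the
            # content stays inside the `kubeconfig: |+` block scalar.
            indent = line[: len(line) - len(line.lstrip())]
            out_lines.append(textwrap.indent(kubeconfig.rstrip("\n"), indent))
        else:
            out_lines.append(line)
    return "\n".join(out_lines) + "\n"
```

For example, the manifest could be saved as cpd.yaml, rendered with render_cpd_manifest(open("cpd.yaml").read(), version, cluster, open(kubeconfig_path).read()), and the output applied to the member cluster with kubectl apply -f.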
Checking Whether CPD Runs Normally
After deploying CPD, check whether CPD runs normally.
- Run the following command to check whether the node's conditions include ServiceDomainNameResolutionReady and whether the lastHeartbeatTime of this condition is updated regularly:
kubectl get node <node-name> -oyaml | grep -B4 ServiceDomainNameResolutionReady
If the condition does not exist or its lastHeartbeatTime has not been updated for a long time:
- Check whether the CPD pod is in the Ready state.
- Check whether there is a LoadCorednsConditionFailed or StoreCorednsConditionFailed event in the member cluster. If the event exists, rectify the fault based on the error message in the event.
- Run the following command to check whether the ServiceDomainNameResolutionReady condition exists in the cluster object on the federation control plane:
kubectl --kubeconfig <kubeconfig-of-federation> get cluster <cluster-name> -oyaml | grep ServiceDomainNameResolutionReady
If the cluster object does not contain the preceding condition:
- Check whether the CPD log contains "failed to sync corendns condition to control plane, requeuing".
- Check the kubeconfig configuration. If you update the kubeconfig, redeploy CPD.
- Check the network connectivity between the node where CPD runs and the VPC of the cluster you selected when downloading the kubeconfig file.
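Both checks above can also be scripted against JSON output instead of grep. The following is a minimal sketch, assuming the input is the output of kubectl get node <node-name> -o json on the member cluster and kubectl get cluster <cluster-name> -o json on the federation control plane. The function names and the 60-second max_age staleness limit are hypothetical choices for illustration, not values defined by CPD.

```python
import json
from datetime import datetime, timezone
from typing import Optional

CONDITION = "ServiceDomainNameResolutionReady"


def find_condition(obj: dict, cond_type: str = CONDITION) -> Optional[dict]:
    """Return the condition of the given type from status.conditions, if any."""
    for cond in obj.get("status", {}).get("conditions", []) or []:
        if cond.get("type") == cond_type:
            return cond
    return None


def heartbeat_age(cond: dict, now: datetime) -> Optional[float]:
    """Seconds since the condition's lastHeartbeatTime (RFC 3339, UTC "Z" suffix)."""
    ts = cond.get("lastHeartbeatTime")
    if not ts:
        return None
    parsed = datetime.strptime(ts, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)
    return (now - parsed).total_seconds()


def check_node(node_json: str, now: datetime, max_age: float = 60.0) -> str:
    """Return "missing", "stale", or the condition status ("True"/"False"/...)."""
    cond = find_condition(json.loads(node_json))
    if cond is None:
        return "missing"
    age = heartbeat_age(cond, now)
    if age is None or age > max_age:  # max_age is a hypothetical staleness limit
        return "stale"
    return cond.get("status", "Unknown")


def check_cluster(cluster_json: str) -> str:
    """Return "missing" or the condition status on the federation cluster object."""
    cond = find_condition(json.loads(cluster_json))
    return "missing" if cond is None else cond.get("status", "Unknown")
```

A "missing" or "stale" result for a node corresponds to the first set of troubleshooting steps above, and "missing" for the cluster object corresponds to the second set.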
Configuring a Policy for Conditional Automatic Traffic Switchover
Once CPD is deployed and runs normally, create a Remedy object that performs specified actions when certain conditions are met. For example, if CoreDNS in a cluster is faulty, traffic to that cluster is redirected to an available cluster.
The following is an example configuration file of the Remedy object. In this example, the Remedy object applies to clusters member1 and member2: when CPD reports that CoreDNS in either cluster is faulty (the ServiceDomainNameResolutionReady condition is False), traffic is automatically redirected to an available cluster. For details about the parameters of the Remedy object, see Table 2.
```yaml
apiVersion: remedy.karmada.io/v1alpha1
kind: Remedy
metadata:
  name: foo
spec:
  clusterAffinity:
    clusterNames:
      - member1
      - member2
  decisionMatches:
    - clusterConditionMatch:
        conditionType: ServiceDomainNameResolutionReady
        operator: Equal
        conditionStatus: "False"
  actions:
    - TrafficControl
```
Table 2 Remedy parameters
Parameter | Description |
---|---|
spec.clusterAffinity.clusterNames | List of clusters controlled by the policy. The specified actions are performed only for clusters in the list. If this parameter is left blank, no action is performed. |
spec.decisionMatches | List of trigger conditions. When a cluster in the cluster list meets any of the trigger conditions, the specified actions are performed. If this parameter is left blank, the specified actions are triggered unconditionally. |
conditionType | Type of a trigger condition. Only ServiceDomainNameResolutionReady (the CoreDNS domain name resolution status reported by CPD) is supported. |
operator | Comparison operator. Only Equal and NotEqual are supported. |
conditionStatus | Status value that the condition is compared against, for example "False". |
actions | Actions to be performed by the policy. Currently, only TrafficControl (traffic control) is supported. |
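The matching rules in Table 2 can be sketched as follows. This is a hypothetical Python model of the documented semantics (clusterNames, decisionMatches, operator), not the code of the Karmada remedy controller. How a condition that has not been reported at all is treated is an assumption: in this sketch, it never matches.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ConditionMatch:
    condition_type: str    # e.g. "ServiceDomainNameResolutionReady"
    operator: str          # "Equal" or "NotEqual"
    condition_status: str  # e.g. "False"


@dataclass
class Remedy:
    cluster_names: List[str]
    decision_matches: List[ConditionMatch] = field(default_factory=list)
    actions: List[str] = field(default_factory=lambda: ["TrafficControl"])


def matches(m: ConditionMatch, conditions: Dict[str, str]) -> bool:
    """Evaluate one clusterConditionMatch against a cluster's condition statuses."""
    status = conditions.get(m.condition_type)
    if status is None:
        return False  # assumption: an unreported condition never matches
    if m.operator == "Equal":
        return status == m.condition_status
    if m.operator == "NotEqual":
        return status != m.condition_status
    raise ValueError(f"unsupported operator: {m.operator}")


def actions_for(remedy: Remedy, cluster: str, conditions: Dict[str, str]) -> List[str]:
    """Return the actions to perform for `cluster` under this Remedy."""
    if cluster not in remedy.cluster_names:
        return []  # cluster not listed (or clusterNames blank): no action
    if not remedy.decision_matches:
        return list(remedy.actions)  # no trigger conditions: act unconditionally
    if any(matches(m, conditions) for m in remedy.decision_matches):
        return list(remedy.actions)  # any single matching condition is enough
    return []
```

With the example Remedy object above, actions_for returns ["TrafficControl"] for member1 when its ServiceDomainNameResolutionReady condition is "False", and an empty list when the condition is "True" or when the cluster is not member1 or member2.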