Configuring Conditional Automatic Traffic Switchover
This section describes how to configure conditional automatic traffic switchover to identify CoreDNS faults in a cluster and automatically redirect traffic.
Installing CPD for a Cluster to Identify Faults
Before configuring automatic traffic switchover, you need to install cluster-problem-detector (CPD) in a cluster to automatically detect whether CoreDNS runs normally and report the results.
CPD periodically checks whether CoreDNS can resolve kubernetes.default and writes the result to the conditions of the node object. The active CPD pod collects the conditions on each node, determines whether domain name resolution in the cluster is normal, and reports the result to the federation control plane of the cluster.
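The per-node check can be pictured with the following minimal Python sketch. It is not CPD's actual implementation: the function name probe_coredns, the injectable resolve parameter, and the shape of the returned dictionary are assumptions made for illustration. Only the probed name kubernetes.default and the condition type ServiceDomainNameResolutionReady (used later in this section) come from this document.

```python
import socket
from typing import Callable


def probe_coredns(
    resolve: Callable[[str], str] = socket.gethostbyname,
    name: str = "kubernetes.default",
) -> dict:
    """Resolve `name` once and describe the outcome as a condition-like dict.

    Hypothetical helper: CPD's real probe and condition fields may differ.
    """
    try:
        resolve(name)
        status = "True"
    except OSError:  # socket.gaierror is a subclass of OSError
        status = "False"
    return {"type": "ServiceDomainNameResolutionReady", "status": status}
```

Passing the resolver as a parameter keeps the sketch testable outside a cluster. Inside a pod, the default socket.gethostbyname would go through the cluster DNS configured in /etc/resolv.conf.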
CPD needs to be independently deployed as a DaemonSet on all nodes in each cluster. The following is an example CPD configuration file. You can modify the parameters by referring to Table 1.
| Parameter | Description |
|---|---|
| <federation-version> | Version of the federation that the cluster belongs to. On the Fleets tab, click the fleet name to obtain the version. |
| <your-cluster-name> | Name of the cluster where CPD is to be installed. |
| <kubeconfig-of-karmada> | The kubeconfig file of the federation control plane. For details about how to download a kubeconfig file that meets the requirements, see kubeconfig. |
| hostAliases | If the address of the federation control plane in the kubeconfig file is a domain name, configure hostAliases in the YAML file. If the address is an IP address, delete hostAliases from the YAML file. |
| coredns-detect-period | Interval at which CPD checks CoreDNS and reports the result. The default and recommended value is 5s. A smaller value means more frequent detection and reporting. |
| coredns-success-threshold | How long CoreDNS must keep resolving the domain name successfully before it is considered normal. The default and recommended value is 30s. A higher value makes detection more stable but less sensitive. A lower value makes detection more sensitive but less stable. |
| coredns-failure-threshold | How long CoreDNS must keep failing to resolve the domain name before it is considered faulty. The default and recommended value is 30s. A higher value makes detection more stable but less sensitive. A lower value makes detection more sensitive but less stable. |
kind: DaemonSet
apiVersion: apps/v1
metadata:
  name: cluster-problem-detector
  namespace: kube-system
  labels:
    app: cluster-problem-detector
spec:
  selector:
    matchLabels:
      app: cluster-problem-detector
  template:
    metadata:
      labels:
        app: cluster-problem-detector
    spec:
      containers:
        - image: swr.ap-southeast-3.myhuaweicloud.com/hwofficial/cluster-problem-detector:<federation-version>
          name: cluster-problem-detector
          command:
            - /bin/sh
            - '-c'
            - /var/paas/cluster-problem-detector/cluster-problem-detector
              --karmada-kubeconfig=/tmp/config
              --karmada-context=federation
              --cluster-name=<your-cluster-name>
              --host-name=${HOST_NAME}
              --bind-address=${POD_ADDRESS}
              --healthz-port=8081
              --detectors=*
              --coredns-detect-period=5s
              --coredns-success-threshold=30s
              --coredns-failure-threshold=30s
              --coredns-stale-threshold=60s
          env:
            - name: POD_ADDRESS
              valueFrom:
                fieldRef:
                  apiVersion: v1
                  fieldPath: status.podIP
            - name: POD_NAME
              valueFrom:
                fieldRef:
                  apiVersion: v1
                  fieldPath: metadata.name
            - name: POD_NAMESPACE
              valueFrom:
                fieldRef:
                  apiVersion: v1
                  fieldPath: metadata.namespace
            - name: HOST_NAME
              valueFrom:
                fieldRef:
                  apiVersion: v1
                  fieldPath: spec.nodeName
          livenessProbe:
            httpGet:
              path: /healthz
              port: 8081
              scheme: HTTP
            initialDelaySeconds: 3
            timeoutSeconds: 3
            periodSeconds: 5
            successThreshold: 1
            failureThreshold: 3
          readinessProbe:
            httpGet:
              path: /healthz
              port: 8081
              scheme: HTTP
            initialDelaySeconds: 3
            timeoutSeconds: 3
            periodSeconds: 5
            successThreshold: 1
            failureThreshold: 3
          volumeMounts:
            - mountPath: /tmp
              name: karmada-config
      serviceAccountName: cluster-problem-detector
      volumes:
        - configMap:
            name: karmada-kubeconfig
            items:
              - key: kubeconfig
                path: config
          name: karmada-config
      securityContext:
        fsGroup: 10000
        runAsUser: 10000
        seccompProfile:
          type: RuntimeDefault
      hostAliases:
        - hostnames:
            - <host name of karmada server>
          ip: <ip of host name of karmada server>
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: cluster-problem-detector
  namespace: kube-system
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: cpd-binding
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: system:cluster-problem-detector
subjects:
  - kind: ServiceAccount
    name: cluster-problem-detector
    namespace: kube-system
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: system:cluster-problem-detector
rules:
  - apiGroups:
      - ""
    resources:
      - nodes
    verbs:
      - get
      - list
      - watch
  - apiGroups:
      - ""
    resources:
      - nodes/status
    verbs:
      - patch
      - update
  - apiGroups:
      - ""
      - events.k8s.io
    resources:
      - events
    verbs:
      - create
      - patch
      - update
  - apiGroups:
      - coordination.k8s.io
    resources:
      - leases
    verbs:
      - get
      - list
      - watch
      - create
      - update
      - patch
      - delete
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: karmada-kubeconfig
  namespace: kube-system
data:
  kubeconfig: |+
    <kubeconfig-of-karmada>
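To make the effect of coredns-success-threshold and coredns-failure-threshold in Table 1 more concrete, the following Python sketch models them as a simple hysteresis: the reported state flips only after probe results of one kind have persisted for longer than the corresponding threshold. This is an illustration under assumptions, not CPD's code. The class name ThresholdTracker, the initial healthy state, and the strict "longer than" comparison are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ThresholdTracker:
    """Flip the reported state only after a result has persisted long enough."""

    success_threshold: float = 30.0  # seconds, mirrors coredns-success-threshold
    failure_threshold: float = 30.0  # seconds, mirrors coredns-failure-threshold
    healthy: bool = True             # assumed initial state (hypothetical)
    _streak_ok: Optional[bool] = None  # outcome shared by the current streak
    _streak_start: float = 0.0         # time at which the current streak began

    def observe(self, ok: bool, now: float) -> bool:
        """Record one probe result taken at `now` (seconds); return the reported state."""
        if ok != self._streak_ok:
            # The outcome changed, so a new streak starts at this probe.
            self._streak_ok, self._streak_start = ok, now
        held = now - self._streak_start
        if ok and held > self.success_threshold:
            self.healthy = True
        elif not ok and held > self.failure_threshold:
            self.healthy = False
        return self.healthy
```

With a 5s detection period and 30s thresholds, a single failed probe does not change the reported state in this model. Only an unbroken run of failures lasting more than 30s does, which is the stability versus sensitivity trade-off that Table 1 describes.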
Checking Whether CPD Runs Normally
After deploying CPD, check whether CPD runs normally.
- Run the following command to check whether the ServiceDomainNameResolutionReady condition exists in the conditions of the node and whether lastHeartbeatTime of this condition is updated in a timely manner:
  kubectl get node <node-name> -oyaml | grep -B4 ServiceDomainNameResolutionReady
  If the condition does not exist or its lastHeartbeatTime has not been updated for a long time:
  - Check whether the CPD pod is in the Ready state.
  - Check whether there is a LoadCorednsConditionFailed or StoreCorednsConditionFailed event in the member cluster. If such an event exists, rectify the fault based on the error message in the event.
- Run the following command to check whether the ServiceDomainNameResolutionReady condition exists in the cluster object of the federation:
  kubectl --kubeconfig <kubeconfig-of-federation> get cluster <cluster-name> -oyaml | grep ServiceDomainNameResolutionReady
  If the cluster object does not contain this condition:
  - Check whether the CPD log contains "failed to sync corendns condition to control plane, requeuing".
  - Check the kubeconfig file configuration. If the kubeconfig file has been updated, deploy CPD again.
  - Check the network connectivity between the node where CPD resides and the VPC of the cluster that you selected when downloading the kubeconfig file.
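The checks above can also be scripted. The following Python sketch works on the JSON form of the same objects (for example, the output of kubectl get node <node-name> -o json) and reports whether the condition exists and how old its heartbeat is. The helper names find_condition and heartbeat_age are hypothetical, and the sketch assumes the whole-second UTC timestamp format that Kubernetes normally writes for lastHeartbeatTime.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

CONDITION_TYPE = "ServiceDomainNameResolutionReady"


def find_condition(obj: dict, cond_type: str = CONDITION_TYPE) -> Optional[dict]:
    """Return the matching entry of status.conditions, or None if it is absent."""
    for cond in (obj.get("status") or {}).get("conditions") or []:
        if cond.get("type") == cond_type:
            return cond
    return None


def heartbeat_age(cond: dict, now: datetime) -> timedelta:
    """Age of lastHeartbeatTime, assumed to look like 2024-05-01T12:00:00Z."""
    stamp = datetime.strptime(cond["lastHeartbeatTime"], "%Y-%m-%dT%H:%M:%SZ")
    return now - stamp.replace(tzinfo=timezone.utc)
```

A caller would load the object with json.load, pass it to find_condition, and treat a None result or a large heartbeat_age as the signal to start the troubleshooting steps listed above. What counts as "too old" is left to the caller, because this document does not define a limit.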
Configuring a Policy for Conditional Automatic Traffic Switchover
Once CPD is deployed and running normally, create a Remedy object that performs specified actions when certain conditions are met. For example, if CoreDNS in a cluster is faulty, the traffic of that cluster is redirected to an available cluster.
The following is an example configuration file of the Remedy object. It applies to the clusters member1 and member2: if CPD reports that CoreDNS in either cluster is faulty, the traffic of that cluster is automatically redirected to an available cluster. For details about the parameters of the Remedy object, see Table 2.
apiVersion: remedy.karmada.io/v1alpha1
kind: Remedy
metadata:
  name: foo
spec:
  clusterAffinity:
    clusterNames:
      - member1
      - member2
  decisionMatches:
    - clusterConditionMatch:
        conditionType: ServiceDomainNameResolutionReady
        operator: Equal
        conditionStatus: "False"
  actions:
    - TrafficControl
| Parameter | Description |
|---|---|
| spec.clusterAffinity.clusterNames | List of clusters controlled by the policy. The specified action is performed only for clusters in the list. If this parameter is left blank, no action is performed. |
| spec.decisionMatches | Trigger condition list. When a cluster in the cluster list meets any trigger condition, the specified action is performed. If this parameter is left blank, the specified action is triggered unconditionally. |
| conditionType | Type of a trigger condition. Only ServiceDomainNameResolutionReady (domain name resolution of CoreDNS reported by CPD) is supported. |
| operator | Judgment logic. Only Equal (equal to) and NotEqual (not equal to) are supported. |
| conditionStatus | Status of a trigger condition. |
| actions | Action to be performed by the policy. Currently, only TrafficControl (traffic control) is supported. |
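The matching rules in Table 2 can be summarized in a short Python sketch. It takes the spec of a Remedy object as a dictionary and the conditions currently reported for one cluster, and returns the actions that would apply. The function name remedy_actions is hypothetical and this is not the controller's implementation. In particular, treating a condition that has not been reported as "no match" is an assumption.

```python
from typing import Dict, List


def remedy_actions(remedy_spec: dict, cluster_name: str, conditions: Dict[str, str]) -> List[str]:
    """Return the actions that apply to one cluster, following the rules in Table 2.

    `conditions` maps a condition type to its status string, for example
    {"ServiceDomainNameResolutionReady": "False"}.
    """
    names = (remedy_spec.get("clusterAffinity") or {}).get("clusterNames") or []
    if cluster_name not in names:
        return []  # cluster not listed, or list left blank: no action

    actions = list(remedy_spec.get("actions") or [])
    matches = remedy_spec.get("decisionMatches") or []
    if not matches:
        return actions  # no trigger conditions: act unconditionally

    for item in matches:
        match = item.get("clusterConditionMatch") or {}
        current = conditions.get(match.get("conditionType"))
        if current is None:
            continue  # assumption: an unreported condition never matches
        equal = current == match.get("conditionStatus")
        operator = match.get("operator")
        if (operator == "Equal" and equal) or (operator == "NotEqual" and not equal):
            return actions  # any single matching condition is enough
    return []
```

Applied to the example above, a cluster member1 that reports ServiceDomainNameResolutionReady as "False" yields ["TrafficControl"], while the same cluster reporting "True" yields an empty list.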

