CCE Network Metrics Exporter
Introduction
CCE Network Metrics Exporter enhances the observability of container networks by providing monitoring data related to container networks in CCE Turbo clusters, so you can observe network traffic and detect and locate network problems faster. The add-on pods can only be deployed on HCE 2.0 nodes running x86/Arm, or EulerOS nodes running x86.
The IP and TCP traffic can be monitored by pod or flow. You can use podSelector to select the monitoring backend. Multiple monitoring tasks can be created and monitoring metrics can be selected as required. The label information of pods can also be obtained. The monitoring information has been adapted to the Prometheus format. You can call the Prometheus API to view monitoring data.
Notes and Constraints
- This add-on can be installed only in CCE Turbo clusters 1.19 or later. The add-on pods can run only on nodes running EulerOS on x86.
- This add-on can be installed on nodes that use the containerd or Docker container engine. On containerd nodes, it tracks pod updates in real time. On Docker nodes, it detects pod updates by polling.
- Only traffic statistics of secure containers using the Kata container runtime and common containers (using the runC container runtime) in a CCE Turbo cluster can be collected.
- By default, traffic monitoring is not carried out after the installation of the add-on. To set up a traffic monitoring task, you must create a MonitorPolicy in the kube-system namespace.
- Pods using the host network mode cannot be monitored.
- Ensure that there are enough resources on a node for installing the add-on.
- The source of monitoring labels and user labels must be already available before a pod is created.
- You can specify a maximum of five labels (a maximum of 10 labels in versions later than 1.3.4). You cannot specify the labels used by the system. Labels used by the system include pod, task, ipfamily, srcip, dstip, srcport, dstport, and protocol.
Installing the Add-on
- Log in to the CCE console and click the CCE Turbo cluster name to access the cluster. In the navigation pane, choose Add-ons, locate CCE Network Metrics Exporter on the right, and click Install.
- On the Install Add-on page, check the add-on configuration.
No parameter can be configured for the current add-on.
- Click Install.
After the add-on is installed, select the cluster and choose Add-ons in the navigation pane. On the displayed page, view the add-on in the Add-ons Installed area.
Components
Component | Description | Resource Type |
---|---|---|
dolphin | Used to monitor the container network traffic of CCE Turbo clusters | DaemonSet |
Monitoring Metrics of dolphin
You can deliver a monitoring task by creating a MonitorPolicy. A MonitorPolicy can be created by calling an API or using the kubectl apply command after logging in to a worker node. A MonitorPolicy represents a monitoring task and provides optional parameters such as selector and podLabel. The following table describes the supported monitoring metrics.
Monitoring Metric | Monitoring Item | Granularity | Supported Runtime | Supported Cluster Version | Supported Add-on Version | Supported OS |
---|---|---|---|---|---|---|
Number of IPv4 packets sent to the Internet | dolphin_ip4_send_pkt_internet | Pod | runC/Kata | v1.19 or later | 1.1.2 | EulerOS 2.9 on x86, EulerOS 2.10 on x86 |
Number of IPv4 bytes sent to the Internet | dolphin_ip4_send_byte_internet | Pod | runC/Kata | v1.19 or later | 1.1.2 | EulerOS 2.9 on x86, EulerOS 2.10 on x86 |
Number of received IPv4 packets | dolphin_ip4_rcv_pkt | Pod | runC/Kata | v1.19 or later | 1.1.2 | EulerOS 2.9 on x86, EulerOS 2.10 on x86 |
Number of received IPv4 bytes | dolphin_ip4_rcv_byte | Pod | runC/Kata | v1.19 or later | 1.1.2 | EulerOS 2.9 on x86, EulerOS 2.10 on x86 |
Number of sent IPv4 packets | dolphin_ip4_send_pkt | Pod | runC/Kata | v1.19 or later | 1.1.2 | EulerOS 2.9 on x86, EulerOS 2.10 on x86 |
Number of sent IPv4 bytes | dolphin_ip4_send_byte | Pod | runC/Kata | v1.19 or later | 1.1.2 | EulerOS 2.9 on x86, EulerOS 2.10 on x86 |
Health status of the latest health check | dolphin_health_check_status | Pod | runC/Kata | v1.19 or later | 1.2.2 | EulerOS 2.9 on x86, EulerOS 2.10 on x86 |
Total number of successful health checks | dolphin_health_check_successful_counter | Pod | runC/Kata | v1.19 or later | 1.2.2 | EulerOS 2.9 on x86, EulerOS 2.10 on x86 |
Total number of failed health checks | dolphin_health_check_failed_counter | Pod | runC/Kata | v1.19 or later | 1.2.2 | EulerOS 2.9 on x86, EulerOS 2.10 on x86 |
Number of received IP packets | dolphin_ip_receive_pkt | Pod | runC | v1.23 or later | 1.3.5 | EulerOS 2.9 on x86, EulerOS 2.10 on x86 |
Number of received IP bytes | dolphin_ip_receive_byte | Pod | runC | v1.23 or later | 1.3.5 | EulerOS 2.9 on x86, EulerOS 2.10 on x86 |
Number of sent IP packets | dolphin_ip_send_pkt | Pod | runC | v1.23 or later | 1.3.5 | EulerOS 2.9 on x86, EulerOS 2.10 on x86 |
Number of sent IP bytes | dolphin_ip_send_byte | Pod | runC | v1.23 or later | 1.3.5 | EulerOS 2.9 on x86, EulerOS 2.10 on x86 |
Number of received TCP packets | dolphin_tcp_receive_pkt | Pod | runC | v1.23 or later | 1.3.5 | EulerOS 2.9 on x86, EulerOS 2.10 on x86 |
Number of received TCP bytes | dolphin_tcp_receive_byte | Pod | runC | v1.23 or later | 1.3.5 | EulerOS 2.9 on x86, EulerOS 2.10 on x86 |
Number of sent TCP packets | dolphin_tcp_send_pkt | Pod | runC | v1.23 or later | 1.3.5 | EulerOS 2.9 on x86, EulerOS 2.10 on x86 |
Number of sent TCP bytes | dolphin_tcp_send_byte | Pod | runC | v1.23 or later | 1.3.5 | EulerOS 2.9 on x86, EulerOS 2.10 on x86 |
Number of retransmitted TCP packets | dolphin_tcp_retrans | Pod | runC | v1.23 or later | 1.3.5 | EulerOS 2.9 on x86, EulerOS 2.10 on x86 |
Number of new TCP connections | dolphin_tcp_connection | Pod | runC | v1.23 or later | 1.3.5 | EulerOS 2.9 on x86, EulerOS 2.10 on x86 |
Number of received IP packets | dolphin_flow_ip_receive_pkt | Flow | runC | v1.23 or later | 1.3.5 | EulerOS 2.9 on x86, EulerOS 2.10 on x86 |
Number of received IP bytes | dolphin_flow_ip_receive_byte | Flow | runC | v1.23 or later | 1.3.5 | EulerOS 2.9 on x86, EulerOS 2.10 on x86 |
Number of sent IP packets | dolphin_flow_ip_send_pkt | Flow | runC | v1.23 or later | 1.3.5 | EulerOS 2.9 on x86, EulerOS 2.10 on x86 |
Number of sent IP bytes | dolphin_flow_ip_send_byte | Flow | runC | v1.23 or later | 1.3.5 | EulerOS 2.9 on x86, EulerOS 2.10 on x86 |
Number of received TCP packets | dolphin_flow_tcp_receive_pkt | Flow | runC | v1.23 or later | 1.3.5 | EulerOS 2.9 on x86, EulerOS 2.10 on x86 |
Number of received TCP bytes | dolphin_flow_tcp_receive_byte | Flow | runC | v1.23 or later | 1.3.5 | EulerOS 2.9 on x86, EulerOS 2.10 on x86 |
Number of sent TCP packets | dolphin_flow_tcp_send_pkt | Flow | runC | v1.23 or later | 1.3.5 | EulerOS 2.9 on x86, EulerOS 2.10 on x86 |
Number of sent TCP bytes | dolphin_flow_tcp_send_byte | Flow | runC | v1.23 or later | 1.3.5 | EulerOS 2.9 on x86, EulerOS 2.10 on x86 |
Number of retransmitted TCP packets | dolphin_flow_tcp_retrans | Flow | runC | v1.23 or later | 1.3.5 | EulerOS 2.9 on x86, EulerOS 2.10 on x86 |
TCP smoothed round trip | dolphin_flow_tcp_srtt | Flow | runC | v1.23 or later | 1.3.5 | EulerOS 2.9 on x86, EulerOS 2.10 on x86 |
Delivering a Monitoring Task
The template for creating a MonitorPolicy is as follows:

apiVersion: crd.dolphin.io/v1
kind: MonitorPolicy
metadata:
  name: example-task        # Monitoring task name.
  namespace: kube-system    # Mandatory. The value must be kube-system.
spec:
  selector:                 # (Optional) Backend monitored by the dolphin add-on, for example, labelSelector. By default, all containers on the node are monitored.
    matchLabels:
      app: nginx
    matchExpressions:
      - key: app
        operator: In
        values:
          - nginx
  podLabel: [app]           # (Optional) Pod label.
  ip4Tx:                    # (Optional) Whether to collect the number of sent IPv4 packets and the number of sent IPv4 bytes. Disabled by default.
    enable: true
  ip4Rx:                    # (Optional) Whether to collect the number of received IPv4 packets and the number of received IPv4 bytes. Disabled by default.
    enable: true
  ip4TxInternet:            # (Optional) Whether to collect the number of IPv4 packets and the number of IPv4 bytes sent to the Internet. Disabled by default.
    enable: true
  healthCheck:              # (Optional) Whether to collect the latest health check result and the total numbers of successful and failed health checks for pods on the local node. Disabled by default.
    enable: true            # The value can be true or false.
    failureThreshold: 3     # (Optional) Number of check failures after which the pod is considered unhealthy. By default, one failure is considered unhealthy.
    periodSeconds: 5        # (Optional) Interval between health checks, in seconds. The default value is 60.
    command: ""             # (Optional) Health check command. The value can be ping (default), arping, or curl.
    ipFamilies: [""]        # (Optional) Health check IP address family. The value is IPv4 by default.
    port: 80                # (Optional) Port number, which is mandatory when curl is used.
    path: ""                # (Optional) HTTP API path, which is mandatory when curl is used.
  monitor:
    ip:
      ipReceive:
        aggregateType: flow # (Optional) The value can be pod (monitored by pod) or flow (monitored by flow).
      ipSend:
        aggregateType: flow # (Optional) The value can be pod (monitored by pod) or flow (monitored by flow).
    tcp:
      tcpReceive:
        aggregateType: flow # (Optional) The value can be pod (monitored by pod) or flow (monitored by flow).
      tcpSend:
        aggregateType: flow # (Optional) The value can be pod (monitored by pod) or flow (monitored by flow).
      tcpRetrans:
        aggregateType: flow # (Optional) The value can be pod (monitored by pod) or flow (monitored by flow).
      tcpRtt:
        aggregateType: flow # (Optional) The value can only be flow (monitored by flow). The unit is μs.
      tcpNewConnection:
        aggregateType: pod  # (Optional) The value can only be pod (monitored by pod).

podLabel: You can specify multiple pod labels and separate them with commas (,), for example, [app, version].
Labels must comply with the following rules. The corresponding regular expression is (^[a-zA-Z_]$)|(^([a-zA-Z][a-zA-Z0-9_]|_[a-zA-Z0-9])([a-zA-Z0-9_]){0,254}$).
- A maximum of five labels can be specified (a maximum of 10 labels in versions later than 1.3.4). A label can contain a maximum of 256 characters.
- A label cannot start with a digit or double underscores (__).
- A label can contain only letters, digits, and underscores (A-Z, a-z, 0-9, and _).
- If you modify or delete a monitoring task, monitoring data collected by the monitoring task will be lost. Therefore, exercise caution when performing this operation.
- If the add-on is uninstalled, the MonitorPolicy of the monitoring task will be removed together with the add-on.
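
The label rules above can be checked before a MonitorPolicy is submitted. The following Python sketch reuses the regular expression quoted above and the system-reserved labels listed under Notes and Constraints. The function name validate_pod_labels and the max_labels parameter are hypothetical helpers for illustration and are not part of the add-on.

```python
import re

# Regular expression quoted in the label rules above.
LABEL_RE = re.compile(
    r"(^[a-zA-Z_]$)|(^([a-zA-Z][a-zA-Z0-9_]|_[a-zA-Z0-9])([a-zA-Z0-9_]){0,254}$)"
)

# Labels used by the system, as listed under Notes and Constraints.
RESERVED_LABELS = {
    "pod", "task", "ipfamily", "srcip", "dstip", "srcport", "dstport", "protocol",
}


def validate_pod_labels(labels, max_labels=5):
    """Return a list of problems; an empty list means the labels pass.

    max_labels defaults to 5; pass 10 for add-on versions later than 1.3.4.
    """
    problems = []
    if len(labels) > max_labels:
        problems.append(f"too many labels: {len(labels)} > {max_labels}")
    for label in labels:
        if label in RESERVED_LABELS:
            problems.append(f"{label!r} is used by the system")
        elif not LABEL_RE.fullmatch(label):
            problems.append(f"{label!r} does not match the label format")
    return problems


print(validate_pod_labels(["app", "version"]))
print(validate_pod_labels(["1app", "__meta", "pod"]))
```

The first call reports no problems. The second reports one problem per label: a leading digit, a leading double underscore, and a system label.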
Example application scenarios:
- The example below monitors all pods on a node that match the label selector app=nginx and generates the three health check metrics. By default, the ping command is used to probe local pods. If a monitored pod has the test and app labels, their key-value pairs are carried in the monitoring metrics. Otherwise, the value of the corresponding label is "not found".

apiVersion: crd.dolphin.io/v1
kind: MonitorPolicy
metadata:
  name: example-task
  namespace: kube-system
spec:
  selector:
    matchLabels:
      app: nginx
  podLabel: [test, app]
  healthCheck:
    enable: true
    failureThreshold: 3
    periodSeconds: 5

- The example below monitors all pods on a node that match the label selector app=nginx and generates the three health check metrics. A custom curl command is used, which checks network connectivity only. That is, the pod is considered healthy as long as the network is connected, regardless of the HTTP status code returned by the program. If a monitored pod has the test and app labels, their key-value pairs are carried in the monitoring metrics. Otherwise, the value of the corresponding label is "not found".

apiVersion: crd.dolphin.io/v1
kind: MonitorPolicy
metadata:
  name: example-task
  namespace: kube-system
spec:
  selector:
    matchLabels:
      app: nginx
  podLabel: [test, app]
  healthCheck:
    enable: true
    failureThreshold: 3
    periodSeconds: 5
    command: "curl"
    port: 80
    path: "healthz"

- The example below monitors all pods on a node that match the label selector app=nginx and generates monitoring data by pod, including the number of sent IP packets, received IP packets, sent IP bytes, received IP bytes, sent TCP packets, received TCP packets, sent TCP bytes, received TCP bytes, retransmitted TCP packets, and new TCP connections. If a monitored pod has the test and app labels, their key-value pairs are carried in the monitoring metrics. Otherwise, the value of the corresponding label is "not found".

apiVersion: crd.dolphin.io/v1
kind: MonitorPolicy
metadata:
  name: example-task
  namespace: kube-system
spec:
  selector:
    matchLabels:
      app: nginx
  podLabel: [test, app]
  monitor:
    ip:
      ipReceive:
        aggregateType: pod
      ipSend:
        aggregateType: pod
    tcp:
      tcpReceive:
        aggregateType: pod
      tcpSend:
        aggregateType: pod
      tcpRetrans:
        aggregateType: pod
      tcpNewConnection:
        aggregateType: pod

- The example below monitors all pods on a node that match the label selector app=nginx and generates monitoring data by flow, including the number of sent IP packets, received IP packets, sent IP bytes, received IP bytes, sent TCP packets, received TCP packets, sent TCP bytes, received TCP bytes, retransmitted TCP packets, and TCP round-trip time (μs). If a monitored pod has the test and app labels, their key-value pairs are carried in the monitoring metrics. Otherwise, the value of the corresponding label is "not found". Flow-based monitoring provides detailed container traffic information, but it generates a large amount of data and consumes more CPU and memory resources. Enable it only when you need it.
A flow-based IP monitoring task (one or more IP monitoring items enabled in a MonitorPolicy) occupies 2.6 MiB of kernel memory. A flow-based TCP monitoring task (one or more TCP monitoring items enabled in a MonitorPolicy) occupies 14 MiB of kernel memory.

apiVersion: crd.dolphin.io/v1
kind: MonitorPolicy
metadata:
  name: example-task
  namespace: kube-system
spec:
  selector:
    matchLabels:
      app: nginx
  podLabel: [test, app]
  monitor:
    ip:
      ipReceive:
        aggregateType: flow
      ipSend:
        aggregateType: flow
    tcp:
      tcpReceive:
        aggregateType: flow
      tcpSend:
        aggregateType: flow
      tcpRetrans:
        aggregateType: flow
      tcpRtt:
        aggregateType: flow

If the data generated by flow-based monitoring exceeds a certain limit, excess flow statistics will be lost. The restrictions are as follows:
- A maximum of 50,000 TCP flows (per monitoring task) can be collected in kernel mode within 10 seconds.
- A maximum of 10,000 IP flows (per monitoring task) can be collected in kernel mode within 10 seconds.
- A maximum of 60,000 flow statistical records (all monitoring tasks) can be cached at the interval between two CloudScope data fetches.
- If CloudScope does not obtain monitoring data for a long time, only the monitoring data generated within the latest hour will be cached.
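
For rough capacity planning, the kernel memory figures quoted above (2.6 MiB per flow-based IP monitoring task and 14 MiB per flow-based TCP monitoring task) can be summed over the MonitorPolicy objects you intend to create. The Python sketch below is an illustration only. It assumes that the costs add up linearly across tasks, which this page does not state explicitly, and estimate_flow_kernel_memory_mib is a hypothetical helper name.

```python
# Figures quoted in the text above.
IP_FLOW_TASK_MIB = 2.6    # per flow-based IP monitoring task
TCP_FLOW_TASK_MIB = 14.0  # per flow-based TCP monitoring task

# Monitoring items from the MonitorPolicy template.
IP_ITEMS = ("ipReceive", "ipSend")
TCP_ITEMS = ("tcpReceive", "tcpSend", "tcpRetrans", "tcpRtt", "tcpNewConnection")


def _has_flow_item(section, items):
    """Return True if any listed item in this section uses aggregateType: flow."""
    return any(
        section.get(item, {}).get("aggregateType") == "flow" for item in items
    )


def estimate_flow_kernel_memory_mib(policy_specs):
    """Sum the quoted per-task kernel memory over MonitorPolicy spec dicts.

    Assumption: costs are additive across tasks.
    """
    total = 0.0
    for spec in policy_specs:
        monitor = spec.get("monitor", {})
        if _has_flow_item(monitor.get("ip", {}), IP_ITEMS):
            total += IP_FLOW_TASK_MIB
        if _has_flow_item(monitor.get("tcp", {}), TCP_ITEMS):
            total += TCP_FLOW_TASK_MIB
    return round(total, 1)


specs = [
    # Flow-based IP and TCP items enabled in one policy.
    {"monitor": {"ip": {"ipSend": {"aggregateType": "flow"}},
                 "tcp": {"tcpRtt": {"aggregateType": "flow"}}}},
    # Pod-based only, so no flow-based cost is counted.
    {"monitor": {"tcp": {"tcpSend": {"aggregateType": "pod"}}}},
]
print(estimate_flow_kernel_memory_mib(specs))
```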
- The example below monitors all pods on a node and generates the number of sent IPv4 packets and the number of sent IPv4 bytes. If a monitored pod has the app label, its key-value pair is carried in the monitoring metrics. Otherwise, the value of the label is "not found".

apiVersion: crd.dolphin.io/v1
kind: MonitorPolicy
metadata:
  name: example-task
  namespace: kube-system
spec:
  podLabel: [app]
  ip4Tx:
    enable: true

- The example below monitors all pods on a node that match the label selector app=nginx and generates the number of sent IPv4 packets, received IPv4 packets, sent IPv4 bytes, received IPv4 bytes, IPv4 packets sent to the public network, and IPv4 bytes sent to the public network. If a monitored pod has the test and app labels, their key-value pairs are carried in the monitoring metrics. Otherwise, the value of the corresponding label is "not found".

apiVersion: crd.dolphin.io/v1
kind: MonitorPolicy
metadata:
  name: example-task
  namespace: kube-system
spec:
  selector:
    matchLabels:
      app: nginx
  podLabel: [test, app]
  ip4Tx:
    enable: true
  ip4Rx:
    enable: true
  ip4TxInternet:
    enable: true

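
To check which metric names a given MonitorPolicy should produce, the relationship between spec fields and metrics can be written down as a lookup. The Python sketch below is an illustration only. The mapping is inferred from the template comments and the metrics table on this page rather than taken from the add-on itself, and expected_metrics is a hypothetical helper name.

```python
# Switch-style fields (enable: true) and the metrics they are assumed to produce.
SWITCH_METRICS = {
    "ip4Tx": ["dolphin_ip4_send_pkt", "dolphin_ip4_send_byte"],
    "ip4Rx": ["dolphin_ip4_rcv_pkt", "dolphin_ip4_rcv_byte"],
    "ip4TxInternet": ["dolphin_ip4_send_pkt_internet",
                      "dolphin_ip4_send_byte_internet"],
    "healthCheck": ["dolphin_health_check_status",
                    "dolphin_health_check_successful_counter",
                    "dolphin_health_check_failed_counter"],
}

# monitor.<group>.<item> -> (metric suffixes, aggregate types listed in the template).
MONITOR_ITEMS = {
    ("ip", "ipReceive"): (("ip_receive_pkt", "ip_receive_byte"), {"pod", "flow"}),
    ("ip", "ipSend"): (("ip_send_pkt", "ip_send_byte"), {"pod", "flow"}),
    ("tcp", "tcpReceive"): (("tcp_receive_pkt", "tcp_receive_byte"), {"pod", "flow"}),
    ("tcp", "tcpSend"): (("tcp_send_pkt", "tcp_send_byte"), {"pod", "flow"}),
    ("tcp", "tcpRetrans"): (("tcp_retrans",), {"pod", "flow"}),
    ("tcp", "tcpRtt"): (("tcp_srtt",), {"flow"}),
    ("tcp", "tcpNewConnection"): (("tcp_connection",), {"pod"}),
}


def expected_metrics(spec):
    """Return the metric names a MonitorPolicy spec dict is expected to generate."""
    names = []
    for field, metrics in SWITCH_METRICS.items():
        if spec.get(field, {}).get("enable"):
            names.extend(metrics)
    monitor = spec.get("monitor", {})
    for (group, item), (suffixes, allowed) in MONITOR_ITEMS.items():
        agg = monitor.get(group, {}).get(item, {}).get("aggregateType")
        if agg not in allowed:
            continue  # item absent, or aggregate type not listed for it
        prefix = "dolphin_flow_" if agg == "flow" else "dolphin_"
        names.extend(prefix + suffix for suffix in suffixes)
    return names


spec = {
    "ip4Tx": {"enable": True},
    "monitor": {"tcp": {"tcpRtt": {"aggregateType": "flow"}}},
}
print(expected_metrics(spec))
```

Comparing this list with the series exposed by the add-on is a quick way to confirm that a monitoring task took effect.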
Checking Traffic Statistics
The monitoring data collected by this add-on is exported in Prometheus exporter format, which can be obtained in the following ways:
- Directly access service port 10001 provided by the dolphin add-on, for example, http://{POD_IP}:10001/metrics.
If you access the dolphin service port from a node, ensure that the security groups of the node and the pod allow the access.
Examples of the monitored information:
- Example 1 (number of IPv4 packets sent to the Internet):
dolphin_ip4_send_pkt_internet{app="nginx",pod="default/nginx-66c9c65dbf-zjg24",task="kube-system/example-task"} 241
In the preceding example, the namespace of the pod is default, the pod name is nginx-66c9c65dbf-zjg24, the label is app, and the value is nginx. This metric is created by monitoring task example-task, and the number of IPv4 packets sent by the pod to the public network is 241.
- Example 2 (number of IPv4 bytes sent to the Internet):
dolphin_ip4_send_byte_internet{app="nginx",pod="default/nginx-66c9c65dbf-zjg24",task="kube-system/example-task"} 23618
In the preceding example, the namespace of the pod is default, the pod name is nginx-66c9c65dbf-zjg24, the label is app, and the value is nginx. This metric is created by monitoring task example-task, and the number of IPv4 bytes sent by the pod to the public network is 23618.
- Example 3 (number of sent IPv4 packets):
dolphin_ip4_send_pkt{app="nginx",pod="default/nginx-66c9c65dbf-zjg24",task="kube-system/example-task"} 379
In the preceding example, the namespace of the pod is default, the pod name is nginx-66c9c65dbf-zjg24, the label is app, and the value is nginx. This metric is created by monitoring task example-task, and the number of IPv4 packets sent by the pod is 379.
- Example 4 (number of sent IPv4 bytes):
dolphin_ip4_send_byte{app="nginx",pod="default/nginx-66c9c65dbf-zjg24",task="kube-system/example-task"} 33129
In the preceding example, the namespace of the pod is default, the pod name is nginx-66c9c65dbf-zjg24, the label is app, and the value is nginx. This metric is created by monitoring task example-task, and the number of IPv4 bytes sent by the pod is 33129.
- Example 5 (number of received IPv4 packets):
dolphin_ip4_rcv_pkt{app="nginx",pod="default/nginx-66c9c65dbf-zjg24",task="kube-system/example-task"} 464
In the preceding example, the namespace of the pod is default, the pod name is nginx-66c9c65dbf-zjg24, the label is app, and the value is nginx. This metric is created by monitoring task example-task, and the number of IPv4 packets received by the pod is 464.
- Example 6 (number of received IPv4 bytes):
dolphin_ip4_rcv_byte{app="nginx",pod="default/nginx-66c9c65dbf-zjg24",task="kube-system/example-task"} 34654
In the preceding example, the namespace of the pod is default, the pod name is nginx-66c9c65dbf-zjg24, the label is app, and the value is nginx. This metric is created by monitoring task example-task, and the number of IPv4 bytes received by the pod is 34654.
- Example 7 (health check status):
dolphin_health_check_status{app="nginx",pod="default/nginx-b74766f5f-7582p",task="kube-system/example-task"} 0
In the preceding example, the namespace of the pod is default, the pod name is nginx-b74766f5f-7582p, the label is app, and the value is nginx. This metric is created by monitoring task example-task, and the network health status of the pod is 0 (healthy). If the network status is unhealthy, the value will be 1.
- Example 8 (number of successful health checks):
dolphin_health_check_successful_counter{app="nginx",pod="default/nginx-b74766f5f-7582p",task="kube-system/example-task"} 5
In the preceding example, the namespace of the pod is default, the pod name is nginx-b74766f5f-7582p, the label is app, and the value is nginx. This metric is created by monitoring task example-task, and the number of successful network health checks for the pod is 5.
- Example 9 (number of failed health checks):
dolphin_health_check_failed_counter{app="nginx",pod="default/nginx-b74766f5f-7582p",task="kube-system/example-task"} 0
In the preceding example, the namespace of the pod is default, the pod name is nginx-b74766f5f-7582p, the label is app, and the value is nginx. This metric is created by monitoring task example-task, and the number of failed network health checks for the pod is 0.
- Example 10 (flow-based monitoring result):
dolphin_flow_tcp_send_byte{app="nginx",dstip="192.168.0.89",dstport="80",ipfamily="ipv4",pod="kube-system/nginx-b74766f5f-7582p",srcip="192.168.1.67",srcport="12973",task="kube-system/example-task"} 1725 1700538280914
In the preceding example, the namespace of the pod is kube-system, the pod name is nginx-b74766f5f-7582p, the label is app, and the value is nginx. This metric is created by monitoring task example-task, and the number of IPv4 TCP bytes sent from 192.168.1.67:12973 to 192.168.0.89:80 is 1725. The timestamp is 1700538280914.
- Example 11 (pod-based monitoring result):
dolphin_tcp_send_pkt{app="nginx",ipfamily="ipv4",pod="kube-system/nginx-b74766f5f-7582p",task="kube-system/example-task"} 14
dolphin_tcp_send_pkt{app="nginx",ipfamily="ipv6",pod="kube-system/nginx-b74766f5f-7582p",task="kube-system/example-task"} 0
In the preceding example, the namespace of the pod is kube-system, the pod name is nginx-b74766f5f-7582p, the label is app, and the value is nginx. This metric is created by monitoring task example-task. The pod sent 14 TCP packets over IPv4 and no TCP packets over IPv6.
If the pod does not have the specified label, the label value in the response body is "not found". The format is as follows:
dolphin_ip4_send_byte_internet{test="not found",pod="default/nginx-66c9c65dbf-zjg24",task="default"} 23618
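
Samples like the ones above can be split into their parts with a few lines of code. The following Python sketch parses a single line of the Prometheus text format into the metric name, labels, value, and optional millisecond timestamp. It also separates the namespace from the pod name and lists the labels reported as "not found". It is a simplified illustration, not a full parser: it assumes that label values contain no closing brace, and parse_sample is a hypothetical helper name.

```python
import re

# name{label="value",...} value [timestamp_ms]
SAMPLE_RE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)'
    r'(?:\{(?P<labels>[^}]*)\})?\s*'
    r'(?P<value>\S+)'
    r'(?:\s+(?P<ts>\d+))?\s*$'
)
LABEL_PAIR_RE = re.compile(r'\s*([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"\s*,?')


def parse_sample(line):
    """Parse one exposition line into a dict; raise ValueError if it is not a sample."""
    match = SAMPLE_RE.match(line.strip())
    if match is None:
        raise ValueError(f"not a metric sample: {line!r}")
    labels = dict(LABEL_PAIR_RE.findall(match.group("labels") or ""))
    # The pod label has the form <namespace>/<pod name>.
    namespace, _, pod_name = labels.get("pod", "").partition("/")
    timestamp = match.group("ts")
    return {
        "metric": match.group("name"),
        "labels": labels,
        "namespace": namespace,
        "pod": pod_name,
        "value": float(match.group("value")),
        "timestamp_ms": int(timestamp) if timestamp else None,
        # Labels requested in podLabel that the pod does not carry.
        "missing_labels": sorted(k for k, v in labels.items() if v == "not found"),
    }


flow_line = (
    'dolphin_flow_tcp_send_byte{app="nginx",dstip="192.168.0.89",dstport="80",'
    'ipfamily="ipv4",pod="kube-system/nginx-b74766f5f-7582p",srcip="192.168.1.67",'
    'srcport="12973",task="kube-system/example-task"} 1725 1700538280914'
)
missing_line = (
    'dolphin_ip4_send_byte_internet{test="not found",'
    'pod="default/nginx-66c9c65dbf-zjg24",task="default"} 23618'
)

flow = parse_sample(flow_line)
missing = parse_sample(missing_line)
print(flow["namespace"], flow["pod"], flow["value"], flow["timestamp_ms"])
print(missing["missing_labels"])
```

For production use, a maintained Prometheus client library or a Prometheus server scraping the endpoint is a more robust choice than a hand-written parser.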
Change History
Add-on Version | Supported Cluster Version | New Feature |
---|---|---|
1.4.7 | v1.23, v1.25, v1.27, v1.28, v1.29 | Fixed some issues. |
1.4.5 | v1.23, v1.25, v1.27, v1.28, v1.29 | |
1.3.8 | v1.23, v1.25, v1.27, v1.28 | |
1.2.27 | v1.19, v1.21, v1.23, v1.25 | None |
1.2.4 | v1.19, v1.21, v1.23, v1.25 | |
1.2.2 | v1.19, v1.21, v1.23, v1.25 | |
1.1.8 | v1.19, v1.21, v1.23, v1.25 | |
1.1.6 | v1.19, v1.21, v1.23 | None |
1.1.5 | v1.19, v1.21, v1.23 | |
1.1.2 | v1.19, v1.21, v1.23 | |
1.0.1 | v1.19, v1.21 | |