CCE Network Metrics Exporter (Cloud Native Observability Add-ons, User Guide, Kuala Lumpur Region)

Introduction

CCE Network Metrics Exporter improves the observability of container networks by providing monitoring data for container networks in CCE Turbo clusters. This helps you observe network traffic and detect and locate network problems faster. Add-on pods can be deployed only on Huawei Cloud EulerOS 2.0 nodes (x86 or Arm) or on EulerOS nodes (x86).

IP and TCP traffic can be monitored by pod or by flow. You can use a label selector (the selector field of a MonitorPolicy) to choose the pods to be monitored. You can create multiple monitoring tasks and select the metrics each one collects. Pod label information can also be included in the metrics. The monitoring data is exposed in Prometheus format, so you can query it through the Prometheus API.

Notes and Constraints

This add-on can be installed only in CCE Turbo clusters of v1.19 or later. Add-on pods can run only on Huawei Cloud EulerOS 2.0 nodes on x86 or Arm (supported by CCE Network Metrics Exporter 1.4.5 or later) or on EulerOS nodes on x86.
This add-on can be installed on nodes that use the containerd or Docker container engine. On containerd nodes, it tracks pod updates in real time. On Docker nodes, it detects pod updates by polling.
Traffic statistics can be collected only for secure containers (Kata runtime) and common containers (runC runtime) in CCE Turbo clusters.
Installing the add-on does not start any traffic monitoring. To set up a traffic monitoring task, create a MonitorPolicy in the kube-system namespace.
Pods using the host network mode cannot be monitored.
Ensure that there are enough resources on a node for installing the add-on.
The sources of monitoring labels and user labels must already exist before a pod is created.
You can specify a maximum of five labels (10 in add-on versions later than 1.3.4). Labels used by the system cannot be specified. They include pod, task, ipfamily, srcip, dstip, srcport, dstport, and protocol.

Installing the Add-on

Log in to the CCE console and click the cluster name to access the cluster console.
In the navigation pane, choose Add-ons. Locate CCE Network Metrics Exporter on the right and click Install.
On the Install Add-on page, check the add-on configuration.

This add-on has no configurable parameters.
Click Install.

After the add-on is installed, select the cluster and choose Add-ons in the navigation pane. On the displayed page, view the add-on in the Add-ons Installed area.

Components

**Table 1** Add-on components
Component	Description	Resource Type
dolphin	Monitors container network traffic in CCE Turbo clusters.	DaemonSet

Monitoring Metrics of dolphin

You can deliver a monitoring task by creating a MonitorPolicy. A MonitorPolicy can be created by calling an API or using the kubectl apply command after logging in to a worker node. A MonitorPolicy represents a monitoring task and provides optional parameters such as selector and podLabel. The following table describes the supported monitoring metrics.

**Table 2** Supported monitoring metrics
Monitoring Metric	Monitoring Item	Granularity	Supported Runtime	Supported Cluster Version	Supported Add-on Version	Supported OS
Number of IPv4 packets sent to the Internet	dolphin_ip4_send_pkt_internet	Pod	runC/Kata	v1.19 or later	1.1.2	EulerOS 2.9 on x86, EulerOS 2.10 on x86
Number of IPv4 bytes sent to the Internet	dolphin_ip4_send_byte_internet	Pod	runC/Kata	v1.19 or later	1.1.2	EulerOS 2.9 on x86, EulerOS 2.10 on x86
Number of received IPv4 packets	dolphin_ip4_rcv_pkt	Pod	runC/Kata	v1.19 or later	1.1.2	EulerOS 2.9 on x86, EulerOS 2.10 on x86
Number of received IPv4 bytes	dolphin_ip4_rcv_byte	Pod	runC/Kata	v1.19 or later	1.1.2	EulerOS 2.9 on x86, EulerOS 2.10 on x86
Number of sent IPv4 packets	dolphin_ip4_send_pkt	Pod	runC/Kata	v1.19 or later	1.1.2	EulerOS 2.9 on x86, EulerOS 2.10 on x86
Number of sent IPv4 bytes	dolphin_ip4_send_byte	Pod	runC/Kata	v1.19 or later	1.1.2	EulerOS 2.9 on x86, EulerOS 2.10 on x86
Health status of the latest health check	dolphin_health_check_status	Pod	runC/Kata	v1.19 or later	1.2.2	EulerOS 2.9 on x86, EulerOS 2.10 on x86
Total number of successful health checks	dolphin_health_check_successful_counter	Pod	runC/Kata	v1.19 or later	1.2.2	EulerOS 2.9 on x86, EulerOS 2.10 on x86
Total number of failed health checks	dolphin_health_check_failed_counter	Pod	runC/Kata	v1.19 or later	1.2.2	EulerOS 2.9 on x86, EulerOS 2.10 on x86
Number of received IP packets	dolphin_ip_receive_pkt	Pod	runC	v1.23 or later	1.3.5	EulerOS 2.9 on x86, EulerOS 2.10 on x86
Number of received IP bytes	dolphin_ip_receive_byte	Pod	runC	v1.23 or later	1.3.5	EulerOS 2.9 on x86, EulerOS 2.10 on x86
Number of sent IP packets	dolphin_ip_send_pkt	Pod	runC	v1.23 or later	1.3.5	EulerOS 2.9 on x86, EulerOS 2.10 on x86
Number of sent IP bytes	dolphin_ip_send_byte	Pod	runC	v1.23 or later	1.3.5	EulerOS 2.9 on x86, EulerOS 2.10 on x86
Number of received TCP packets	dolphin_tcp_receive_pkt	Pod	runC	v1.23 or later	1.3.5	EulerOS 2.9 on x86, EulerOS 2.10 on x86
Number of received TCP bytes	dolphin_tcp_receive_byte	Pod	runC	v1.23 or later	1.3.5	EulerOS 2.9 on x86, EulerOS 2.10 on x86
Number of sent TCP packets	dolphin_tcp_send_pkt	Pod	runC	v1.23 or later	1.3.5	EulerOS 2.9 on x86, EulerOS 2.10 on x86
Number of sent TCP bytes	dolphin_tcp_send_byte	Pod	runC	v1.23 or later	1.3.5	EulerOS 2.9 on x86, EulerOS 2.10 on x86
Number of retransmitted TCP packets	dolphin_tcp_retrans	Pod	runC	v1.23 or later	1.3.5	EulerOS 2.9 on x86, EulerOS 2.10 on x86
Number of new TCP connections	dolphin_tcp_connection	Pod	runC	v1.23 or later	1.3.5 (not recommended; replaced by dolphin_tcp_clientconnection and dolphin_tcp_serverconnection in versions later than 1.4.5)	EulerOS 2.9 on x86, EulerOS 2.10 on x86
Number of received IP packets	dolphin_flow_ip_receive_pkt	Flow	runC	v1.23 or later	1.3.5	EulerOS 2.9 on x86, EulerOS 2.10 on x86
Number of received IP bytes	dolphin_flow_ip_receive_byte	Flow	runC	v1.23 or later	1.3.5	EulerOS 2.9 on x86, EulerOS 2.10 on x86
Number of sent IP packets	dolphin_flow_ip_send_pkt	Flow	runC	v1.23 or later	1.3.5	EulerOS 2.9 on x86, EulerOS 2.10 on x86
Number of sent IP bytes	dolphin_flow_ip_send_byte	Flow	runC	v1.23 or later	1.3.5	EulerOS 2.9 on x86, EulerOS 2.10 on x86
Number of received TCP packets	dolphin_flow_tcp_receive_pkt	Flow	runC	v1.23 or later	1.3.5	EulerOS 2.9 on x86, EulerOS 2.10 on x86
Number of received TCP bytes	dolphin_flow_tcp_receive_byte	Flow	runC	v1.23 or later	1.3.5	EulerOS 2.9 on x86, EulerOS 2.10 on x86
Number of sent TCP packets	dolphin_flow_tcp_send_pkt	Flow	runC	v1.23 or later	1.3.5	EulerOS 2.9 on x86, EulerOS 2.10 on x86
Number of sent TCP bytes	dolphin_flow_tcp_send_byte	Flow	runC	v1.23 or later	1.3.5	EulerOS 2.9 on x86, EulerOS 2.10 on x86
Number of retransmitted TCP packets	dolphin_flow_tcp_retrans	Flow	runC	v1.23 or later	1.3.5	EulerOS 2.9 on x86, EulerOS 2.10 on x86
TCP smoothed round-trip time (μs)	dolphin_flow_tcp_srtt	Flow	runC	v1.23 or later	1.3.5	EulerOS 2.9 on x86, EulerOS 2.10 on x86

Delivering a Monitoring Task

The template for creating a MonitorPolicy is as follows:

```
apiVersion: crd.dolphin.io/v1
kind: MonitorPolicy
metadata:
  name: example-task            # Monitoring task name.
  namespace: kube-system        # Mandatory. The value must be kube-system.
spec:
  selector:                     # (Optional) Pods monitored by the add-on, in labelSelector format. By default, all pods on the node are monitored.
    matchLabels:
      app: nginx
    matchExpressions:
      - key: app
        operator: In
        values:
          - nginx
  podLabel: [app]               # (Optional) Pod labels whose key-value pairs are carried in the metrics.
  ip4Tx:                        # (Optional) Whether to collect the number of sent IPv4 packets and bytes. Disabled by default.
    enable: true
  ip4Rx:                        # (Optional) Whether to collect the number of received IPv4 packets and bytes. Disabled by default.
    enable: true
  ip4TxInternet:                # (Optional) Whether to collect the number of IPv4 packets and bytes sent to the Internet. Disabled by default.
    enable: true
  healthCheck:                  # (Optional) Whether to collect the latest health check result and the total numbers of successful and failed health checks of pods on the local node. Disabled by default.
    enable: true                # true or false
    failureThreshold: 3         # (Optional) Number of failed checks after which the pod is considered unhealthy. The default value is 1.
    periodSeconds: 5            # (Optional) Interval between health checks, in seconds. The default value is 60.
    command: ""                 # (Optional) Health check command: ping (default), arping, or curl.
    ipFamilies: [""]            # (Optional) IP address family used for health checks. The default value is IPv4.
    port: 80                    # (Optional) Port number. Mandatory when curl is used.
    path: ""                    # (Optional) HTTP path. Mandatory when curl is used.
  monitor:
    ip:
      ipReceive:
        aggregateType: flow     # (Optional) pod (monitored by pod) or flow (monitored by flow).
      ipSend:
        aggregateType: flow     # (Optional) pod (monitored by pod) or flow (monitored by flow).
    tcp:
      tcpReceive:
        aggregateType: flow     # (Optional) pod (monitored by pod) or flow (monitored by flow).
      tcpSend:
        aggregateType: flow     # (Optional) pod (monitored by pod) or flow (monitored by flow).
      tcpRetrans:
        aggregateType: flow     # (Optional) pod (monitored by pod) or flow (monitored by flow).
      tcpRtt:
        aggregateType: flow     # (Optional) flow (monitored by flow) only. The unit is μs.
      tcpNewConnection:
        aggregateType: pod      # (Optional) pod (monitored by pod) only.
```
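If you generate MonitorPolicies from a script instead of writing YAML by hand, checking the rules stated in the template comments can catch mistakes before the policy is applied. The following sketch is a hypothetical helper, not part of the add-on. It builds the manifest as a Python dictionary, enforces the kube-system namespace and the aggregateType values allowed per item (tcpRtt accepts only flow, tcpNewConnection accepts only pod), and prints JSON, which kubectl apply -f also accepts.

```python
import json

# Allowed aggregateType values per monitoring item, taken from the template comments.
ALLOWED_AGGREGATE = {
    ("ip", "ipReceive"): {"pod", "flow"},
    ("ip", "ipSend"): {"pod", "flow"},
    ("tcp", "tcpReceive"): {"pod", "flow"},
    ("tcp", "tcpSend"): {"pod", "flow"},
    ("tcp", "tcpRetrans"): {"pod", "flow"},
    ("tcp", "tcpRtt"): {"flow"},
    ("tcp", "tcpNewConnection"): {"pod"},
}


def build_monitor_policy(name, match_labels, pod_labels, items):
    """Build a MonitorPolicy manifest as a dict (hypothetical helper).

    items maps (protocol, item) tuples such as ("tcp", "tcpRtt") to an
    aggregateType. Raises ValueError for combinations the template does not allow.
    """
    monitor = {}
    for (proto, item), agg in items.items():
        allowed = ALLOWED_AGGREGATE.get((proto, item))
        if allowed is None:
            raise ValueError(f"unknown monitoring item: {proto}.{item}")
        if agg not in allowed:
            raise ValueError(f"{item} does not support aggregateType={agg}")
        monitor.setdefault(proto, {})[item] = {"aggregateType": agg}
    return {
        "apiVersion": "crd.dolphin.io/v1",
        "kind": "MonitorPolicy",
        # The namespace must be kube-system.
        "metadata": {"name": name, "namespace": "kube-system"},
        "spec": {
            "selector": {"matchLabels": match_labels},
            "podLabel": pod_labels,
            "monitor": monitor,
        },
    }


if __name__ == "__main__":
    policy = build_monitor_policy(
        "example-task", {"app": "nginx"}, ["test", "app"],
        {("tcp", "tcpSend"): "flow", ("tcp", "tcpRtt"): "flow"},
    )
    # Save the output to a file and apply it with kubectl apply -f.
    print(json.dumps(policy, indent=2))
```

Keeping the allowed values in one table makes it easy to update the check if later add-on versions accept more combinations.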

podLabel: You can specify multiple pod label keys separated by commas (,), for example, [app, version].

Each label must match the regular expression ^[a-zA-Z0-9_][a-zA-Z0-9\-\._/]{0,254}$ and comply with the following rules:

- A maximum of five labels can be specified (10 in add-on versions later than 1.3.4). A label can contain a maximum of 255 characters.
- In the generated metrics, all characters in a label other than letters and digits are replaced with underscores (_).
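The label rules above can be checked before a MonitorPolicy is applied. The following sketch is illustrative only: the function names are hypothetical, it reuses the regular expression and the system label names stated in this document, and it assumes the 10-label limit of add-on versions later than 1.3.4. The renaming step mirrors the stated rule that characters other than letters and digits become underscores.

```python
import re

# Regular expression for podLabel keys, as documented for this add-on.
LABEL_PATTERN = re.compile(r"^[a-zA-Z0-9_][a-zA-Z0-9\-\._/]{0,254}$")

# Label names used by the system; they cannot be specified in podLabel.
RESERVED_LABELS = {"pod", "task", "ipfamily", "srcip", "dstip",
                   "srcport", "dstport", "protocol"}

# Assumption: add-on version later than 1.3.4 (older versions allow 5 labels).
MAX_LABELS = 10


def validate_pod_labels(labels):
    """Return a list of problems found in a podLabel list (empty if valid)."""
    problems = []
    if len(labels) > MAX_LABELS:
        problems.append(f"too many labels: {len(labels)} > {MAX_LABELS}")
    for label in labels:
        if label in RESERVED_LABELS:
            problems.append(f"reserved label: {label}")
        elif not LABEL_PATTERN.fullmatch(label):
            problems.append(f"invalid label: {label!r}")
    return problems


def metric_label_name(label):
    """Replace every character that is not a letter or digit with '_'."""
    return re.sub(r"[^a-zA-Z0-9]", "_", label)


print(validate_pod_labels(["app", "version"]))       # []
print(metric_label_name("app.kubernetes.io/name"))   # app_kubernetes_io_name
```

fullmatch is used instead of match so that a trailing newline cannot slip past the $ anchor.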

You can create, modify, and delete monitoring tasks in the preceding format. A maximum of 10 monitoring tasks can be created. When multiple monitoring tasks match the same pod, that pod generates a separate set of metrics for each matching task.

If you modify or delete a monitoring task, the monitoring data it has collected will be lost. Exercise caution when performing this operation.
If the add-on is uninstalled, the MonitorPolicies of all monitoring tasks are removed together with it.

Example application scenarios:

The following example monitors all pods on a node that match the label selector app=nginx and generates the three health check metrics. By default, the ping command is used to check local pods. If a monitored pod has the test and app labels, their key-value pairs are carried in the monitoring metrics. Otherwise, the value of the missing label is not found.
```
apiVersion: crd.dolphin.io/v1
kind: MonitorPolicy
metadata:
  name: example-task  
  namespace: kube-system        
spec:
  selector:
    matchLabels:
      app: nginx
  podLabel: [test, app] 
  healthCheck: 
    enable: true
    failureThreshold: 3
    periodSeconds: 5
```
The following example monitors all pods on a node that match the label selector app=nginx and generates the three health check metrics. The curl command is used, and only network connectivity is checked: as long as the connection succeeds, the pod is considered healthy regardless of the HTTP status code returned. If a monitored pod has the test and app labels, their key-value pairs are carried in the monitoring metrics. Otherwise, the value of the missing label is not found.
```
apiVersion: crd.dolphin.io/v1
kind: MonitorPolicy
metadata:
  name: example-task
  namespace: kube-system
spec:
  selector:
    matchLabels:
      app: nginx
  podLabel: [test, app]
  healthCheck: 
    enable: true
    failureThreshold: 3
    periodSeconds: 5 
    command: "curl"
    port: 80
    path: "healthz"
```
The following example monitors all pods on a node that match the label selector app=nginx and generates monitoring data by pod, including the number of sent IP packets, received IP packets, sent IP bytes, received IP bytes, sent TCP packets, received TCP packets, sent TCP bytes, received TCP bytes, retransmitted TCP packets, and new TCP connections. If a monitored pod has the test and app labels, their key-value pairs are carried in the monitoring metrics. Otherwise, the value of the missing label is not found.
```
apiVersion: crd.dolphin.io/v1
kind: MonitorPolicy
metadata:
  name: example-task
  namespace: kube-system
spec:
  selector:
    matchLabels:
      app: nginx
  podLabel: [test, app]
  monitor:
    ip:
      ipReceive:
        aggregateType: pod
      ipSend:
        aggregateType: pod
    tcp:
      tcpReceive:
        aggregateType: pod
      tcpSend:
        aggregateType: pod
      tcpRetrans:
        aggregateType: pod
      tcpNewConnection:
        aggregateType: pod
```
The following example monitors all pods on a node that match the label selector app=nginx and generates monitoring data by flow, including the number of sent IP packets, received IP packets, sent IP bytes, received IP bytes, sent TCP packets, received TCP packets, sent TCP bytes, received TCP bytes, retransmitted TCP packets, and the TCP round-trip time (µs). If a monitored pod has the test and app labels, their key-value pairs are carried in the monitoring metrics. Otherwise, the value of the missing label is not found. Flow-based monitoring provides detailed container traffic information, but it generates a large amount of data and consumes more CPU and memory. Enable it only when needed.
A flow-based IP monitoring task (one or more IP monitoring items enabled in a MonitorPolicy) occupies 2.6 MiB of kernel memory. A flow-based TCP monitoring task (one or more TCP monitoring items enabled in a MonitorPolicy) occupies 14 MiB of kernel memory.
```
apiVersion: crd.dolphin.io/v1
kind: MonitorPolicy
metadata:
  name: example-task
  namespace: kube-system
spec:
  selector:
    matchLabels:
      app: nginx
  podLabel: [test, app]
  monitor:
    ip:
      ipReceive:
        aggregateType: flow
      ipSend:
        aggregateType: flow
    tcp:
      tcpReceive:
        aggregateType: flow
      tcpSend:
        aggregateType: flow
      tcpRetrans:
        aggregateType: flow
      tcpRtt:
        aggregateType: flow
```
If flow-based monitoring generates more data than the following limits allow, the excess flow statistics are lost:
- A maximum of 50,000 TCP flows (per monitoring task) can be collected in kernel mode within 10 seconds.
- A maximum of 10,000 IP flows (per monitoring task) can be collected in kernel mode within 10 seconds.
- A maximum of 60,000 flow statistics records (across all monitoring tasks) can be cached between two consecutive Prometheus scrapes.
- If Prometheus does not obtain monitoring data for a long time, only the monitoring data generated within the latest hour will be cached.
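The memory figures and flow limits above lend themselves to a quick capacity estimate. The following sketch is a back-of-the-envelope calculation, not a tool provided by the add-on. The function names are hypothetical, and it assumes that a MonitorPolicy enabling both IP and TCP flow items counts once as an IP task and once as a TCP task.

```python
# Documented kernel memory per flow-based monitoring task, in MiB.
IP_FLOW_TASK_MIB = 2.6
TCP_FLOW_TASK_MIB = 14.0

# Documented per-task limits on flows collected in kernel mode within 10 seconds.
MAX_TCP_FLOWS_PER_10S = 50_000
MAX_IP_FLOWS_PER_10S = 10_000


def flow_kernel_memory_mib(ip_tasks, tcp_tasks):
    """Kernel memory used by the given numbers of flow-based IP and TCP tasks.

    Assumption: a MonitorPolicy that enables both IP and TCP flow items
    counts once in each argument.
    """
    return ip_tasks * IP_FLOW_TASK_MIB + tcp_tasks * TCP_FLOW_TASK_MIB


def flows_over_limit_per_10s(tcp_flows, ip_flows):
    """Flows per task above the 10-second limits; their statistics are lost."""
    return (max(0, tcp_flows - MAX_TCP_FLOWS_PER_10S),
            max(0, ip_flows - MAX_IP_FLOWS_PER_10S))


# Two MonitorPolicies, each enabling IP and TCP flow monitoring.
print(round(flow_kernel_memory_mib(ip_tasks=2, tcp_tasks=2), 1))  # 33.2
print(flows_over_limit_per_10s(tcp_flows=60_000, ip_flows=8_000))  # (10000, 0)
```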
The following example monitors all pods on a node and generates the number of sent IPv4 packets and the number of sent IPv4 bytes. If a monitored pod has the app label, its key-value pair is carried in the monitoring metrics. Otherwise, the label value is not found.
```
apiVersion: crd.dolphin.io/v1
kind: MonitorPolicy
metadata:
  name: example-task  
  namespace: kube-system        
spec:
  podLabel: [app]
  ip4Tx:
    enable: true
```
The following example monitors all pods on a node that match the label selector app=nginx and generates the number of sent IPv4 packets, received IPv4 packets, sent IPv4 bytes, received IPv4 bytes, IPv4 packets sent to the public network, and IPv4 bytes sent to the public network. If a monitored pod has the test and app labels, their key-value pairs are carried in the monitoring metrics. Otherwise, the value of the missing label is not found.
```
apiVersion: crd.dolphin.io/v1
kind: MonitorPolicy
metadata:
  name: example-task
  namespace: kube-system
spec:
  selector:
    matchLabels:
      app: nginx
  podLabel: [test, app]
  ip4Tx:
    enable: true
  ip4Rx:
    enable: true
  ip4TxInternet:      
    enable: true
```

Checking Traffic Statistics

The CCE Network Metrics Exporter add-on outputs monitoring information in the Prometheus exporter format. You can obtain this information as follows:

Directly access the service port 10001 provided by CCE Network Metrics Exporter, for example, http://{POD_IP}:10001/metrics.
If you access the add-on service port from a node, ensure that the security groups of the node and the pod allow this access.
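To inspect the output without a Prometheus server, you can fetch the endpoint and parse the text yourself. The following sketch uses only the Python standard library. The function names are hypothetical, pod_ip stands for the {POD_IP} placeholder above, and the parser is a simplified illustration that handles sample lines like the ones in this section (no escaped quotes or braces inside label values). It is not a full implementation of the Prometheus exposition format; for that, a maintained parser such as text_string_to_metric_families in the prometheus_client package is a better choice.

```python
import re
import urllib.request

# One sample line: name{labels} value [timestamp]
SAMPLE_RE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)'
    r'(?:\{(?P<labels>[^}]*)\})?'
    r'\s+(?P<value>\S+)(?:\s+(?P<ts>-?\d+))?\s*$'
)
LABEL_RE = re.compile(r'\s*([a-zA-Z_][a-zA-Z0-9_]*)\s*=\s*"([^"]*)"\s*,?')


def parse_samples(text):
    """Yield (name, labels, value, timestamp_ms) for each sample line."""
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # Skip blank lines and HELP/TYPE comments.
        m = SAMPLE_RE.match(line)
        if m is None:
            continue  # Not a sample line this simplified parser understands.
        labels = dict(LABEL_RE.findall(m.group("labels") or ""))
        ts = int(m.group("ts")) if m.group("ts") else None
        yield m.group("name"), labels, float(m.group("value")), ts


def fetch_metrics(pod_ip, port=10001, timeout=5):
    """Fetch the raw metrics text from an add-on pod (needs network access)."""
    url = f"http://{pod_ip}:{port}/metrics"
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8")


def split_pod(labels):
    """Split the pod label ("namespace/name") into namespace and name."""
    namespace, _, name = labels["pod"].partition("/")
    return namespace, name
```

A typical call is list(parse_samples(fetch_metrics("192.168.0.89"))), where the IP address is only an illustrative value.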

Examples of the monitored information:

Example 1 (number of IPv4 packets sent to the Internet):
```
dolphin_ip4_send_pkt_internet{app="nginx",pod="default/nginx-66c9c65dbf-zjg24",task="kube-system/example-task"} 241
```
In the preceding example, the pod nginx-66c9c65dbf-zjg24 is in the default namespace and has the label app=nginx. This metric is generated by monitoring task example-task, and the number of IPv4 packets sent by the pod to the public network is 241.
Example 2 (number of IPv4 bytes sent to the Internet):
```
dolphin_ip4_send_byte_internet{app="nginx",pod="default/nginx-66c9c65dbf-zjg24",task="kube-system/example-task"} 23618
```
In the preceding example, the pod nginx-66c9c65dbf-zjg24 is in the default namespace and has the label app=nginx. This metric is generated by monitoring task example-task, and the number of IPv4 bytes sent by the pod to the public network is 23618.
Example 3 (number of sent IPv4 packets):
```
dolphin_ip4_send_pkt{app="nginx",pod="default/nginx-66c9c65dbf-zjg24",task="kube-system/example-task"} 379
```
In the preceding example, the pod nginx-66c9c65dbf-zjg24 is in the default namespace and has the label app=nginx. This metric is generated by monitoring task example-task, and the number of IPv4 packets sent by the pod is 379.
Example 4 (number of sent IPv4 bytes):
```
dolphin_ip4_send_byte{app="nginx",pod="default/nginx-66c9c65dbf-zjg24",task="kube-system/example-task"} 33129
```
In the preceding example, the pod nginx-66c9c65dbf-zjg24 is in the default namespace and has the label app=nginx. This metric is generated by monitoring task example-task, and the number of IPv4 bytes sent by the pod is 33129.

Example 5 (number of received IPv4 packets):
```
dolphin_ip4_rcv_pkt{app="nginx",pod="default/nginx-66c9c65dbf-zjg24",task="kube-system/example-task"} 464
```
In the preceding example, the pod nginx-66c9c65dbf-zjg24 is in the default namespace and has the label app=nginx. This metric is generated by monitoring task example-task, and the number of IPv4 packets received by the pod is 464.
Example 6 (number of received IPv4 bytes):
```
dolphin_ip4_rcv_byte{app="nginx",pod="default/nginx-66c9c65dbf-zjg24",task="kube-system/example-task"} 34654
```
In the preceding example, the pod nginx-66c9c65dbf-zjg24 is in the default namespace and has the label app=nginx. This metric is generated by monitoring task example-task, and the number of IPv4 bytes received by the pod is 34654.
Example 7 (health check status):
```
dolphin_health_check_status{app="nginx",pod="default/nginx-b74766f5f-7582p",task="kube-system/example-task"} 0
```
In the preceding example, the pod named nginx-b74766f5f-7582p is in the default namespace with the label key app and value nginx. This metric is generated by the monitoring task named example-task, and the pod's network health status is currently 0 (healthy). If the network becomes unhealthy, the value changes to 1.
Example 8 (number of successful health checks):
```
dolphin_health_check_successful_counter{app="nginx",pod="default/nginx-b74766f5f-7582p",task="kube-system/example-task"} 5
```
In the preceding example, the pod named nginx-b74766f5f-7582p is in the default namespace with the label key app and value nginx. This metric is generated by the monitoring task named example-task, and the pod has passed five network health checks.
Example 9 (number of failed health checks):
```
dolphin_health_check_failed_counter{app="nginx",pod="default/nginx-b74766f5f-7582p",task="kube-system/example-task"} 0
```
In the preceding example, the pod named nginx-b74766f5f-7582p is in the default namespace with the label key app and value nginx. This metric is generated by the monitoring task named example-task, and the pod has failed zero network health checks.
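The three health check metrics can be combined into a simple per-pod summary, for example for a dashboard or an alert rule prototype. The following sketch is illustrative; the function name is hypothetical, and it relies only on the semantics stated in Examples 7 to 9: a status of 0 means healthy, 1 means unhealthy, and the two counters are running totals.

```python
def health_summary(status, successful, failed):
    """Summarize dolphin_health_check_* values for one pod."""
    total = successful + failed
    # Avoid dividing by zero before the first check has run.
    failure_ratio = failed / total if total else 0.0
    return {
        "healthy": status == 0,
        "checks": total,
        "failure_ratio": failure_ratio,
    }


# Values from Examples 7 to 9.
print(health_summary(status=0, successful=5, failed=0))
# {'healthy': True, 'checks': 5, 'failure_ratio': 0.0}
```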
Example 10 (flow-based monitoring result):
```
dolphin_flow_tcp_send_byte{app="nginx",dstip="192.168.0.89",dstport="80",ipfamily="ipv4",pod="default/nginx-b74766f5f-7582p",srcip="192.168.1.67",srcport="12973",task="kube-system/example-task"} 1725 1700538280914
```
In the preceding example, the pod named nginx-b74766f5f-7582p is in the default namespace with the label key app and value nginx. This metric is generated by the monitoring task example-task, showing that 1725 IPv4 TCP bytes were sent from 192.168.1.67:12973 to 192.168.0.89:80 at timestamp 1700538280914 (Unix time in milliseconds).
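Because each flow sample carries srcip, srcport, dstip, and dstport labels, flows can be aggregated to see where a pod's traffic goes. The following sketch takes samples that have already been parsed into (labels, value) pairs; the function names are hypothetical, the first sample comes from the example above, and the second is an illustrative value only. It also converts the millisecond timestamp to UTC.

```python
from collections import defaultdict
from datetime import datetime, timezone


def bytes_by_destination(samples):
    """Sum flow byte counters per (dstip, dstport).

    samples is an iterable of (labels, value) pairs taken from
    dolphin_flow_tcp_send_byte series.
    """
    totals = defaultdict(float)
    for labels, value in samples:
        totals[(labels["dstip"], labels["dstport"])] += value
    return dict(totals)


def flow_time(ts_ms):
    """Convert a sample timestamp (milliseconds since the epoch) to UTC."""
    return datetime.fromtimestamp(ts_ms / 1000, tz=timezone.utc)


samples = [
    # From the example above.
    ({"srcip": "192.168.1.67", "srcport": "12973",
      "dstip": "192.168.0.89", "dstport": "80"}, 1725),
    # Illustrative second connection to the same destination.
    ({"srcip": "192.168.1.67", "srcport": "12974",
      "dstip": "192.168.0.89", "dstport": "80"}, 500),
]
print(bytes_by_destination(samples))  # {('192.168.0.89', '80'): 2225.0}
print(flow_time(1700538280914).isoformat())
```

In a Prometheus setup, the same grouping is usually done in the query layer rather than in client code.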
Example 11 (pod-based monitoring result):
```
dolphin_tcp_send_pkt{app="nginx",ipfamily="ipv4",pod="default/nginx-b74766f5f-7582p",task="kube-system/example-task"} 14
dolphin_tcp_send_pkt{app="nginx",ipfamily="ipv6",pod="default/nginx-b74766f5f-7582p",task="kube-system/example-task"} 0
```
In the preceding example, the pod named nginx-b74766f5f-7582p is in the default namespace with the label key app and value nginx. This metric is generated by the monitoring task example-task, showing that 14 IPv4 packets and zero IPv6 packets were sent by this pod.

If a pod does not have a specified label, the value of that label in the response body is not found, as in the following example:

```
dolphin_ip4_send_byte_internet{test="not found",pod="default/nginx-66c9c65dbf-zjg24",task="default"} 23618
```
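Series whose label value is not found point to pods that lack a label listed in podLabel. The following sketch finds them from label dictionaries that have already been parsed from the response; the function name is hypothetical, and the input reuses the labels of the line above.

```python
NOT_FOUND = "not found"


def pods_missing_labels(series, pod_labels):
    """Map each pod to the podLabel keys reported as "not found"."""
    missing = {}
    for labels in series:
        absent = [key for key in pod_labels if labels.get(key) == NOT_FOUND]
        if absent:
            missing.setdefault(labels["pod"], set()).update(absent)
    return missing


series = [
    # Labels from the response shown above.
    {"test": "not found", "pod": "default/nginx-66c9c65dbf-zjg24", "task": "default"},
]
print(pods_missing_labels(series, ["test"]))
# {'default/nginx-66c9c65dbf-zjg24': {'test'}}
```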
