Updated on 2024-01-04 GMT+08:00

CCE Network Metrics Exporter

Introduction

Dolphin is an add-on for monitoring and managing container network traffic. The current version of dolphin collects traffic statistics of containers that do not use the host network mode in CCE Turbo clusters and performs node-wide container connectivity checks.

You can use a label selector to choose the pods to be monitored (the monitoring backend). Multiple monitoring tasks and optional monitoring metrics are supported, and pod label information can be attached to the metrics. The monitoring data is exposed in Prometheus format, so you can query it through the Prometheus API.

Constraints

  • This add-on can be installed only in CCE Turbo clusters of version 1.19 or later and deployed only on x86 nodes running EulerOS.
  • This add-on can be installed on nodes that use the containerd or Docker container engine. On containerd nodes, it tracks pod updates in real time. On Docker nodes, it detects pod updates by polling.
  • Only traffic statistics of secure containers (Kata as the container runtime) and common containers (runC as the container runtime) in a CCE Turbo cluster can be collected.
  • After the add-on is installed, no traffic is monitored by default. To monitor traffic, create a MonitorPolicy that configures a monitoring task.
  • Pods using the host network mode cannot be monitored.
  • Ensure that there are sufficient resources on a node for installing the add-on.
  • The sources of monitoring labels and user labels must already be available before a pod is created.
  • You can specify a maximum of five labels. You cannot specify the labels used by the system. Labels used by the system include pod, task, ipfamily, srcip, dstip, srcport, dstport, and protocol.

Installing the Add-on

  1. Log in to the CCE console and click the CCE Turbo cluster name to access the cluster. Click Add-ons in the navigation pane, locate CCE Network Metrics Exporter on the right, and click Install.
  2. On the Install Add-on page, view the add-on configuration.

    No parameter can be configured for the current add-on.

  3. Click Install.

    After the add-on is installed, select the cluster and click Add-ons in the navigation pane. On the displayed page, view the add-on in the Add-ons Installed area.

Components

Table 1 dolphin component

| Component | Description | Resource Type |
| --- | --- | --- |
| dolphin | Used to monitor the container network traffic of CCE Turbo clusters | DaemonSet |

Monitoring Metrics of dolphin

You can deliver a monitoring task by creating a MonitorPolicy. A MonitorPolicy can be created by calling an API or using the kubectl apply command after logging in to a worker node. A MonitorPolicy represents a monitoring task and provides optional parameters such as selector and podLabel. The following table describes the supported monitoring metrics.

Table 2 Supported monitoring metrics

| Monitoring Metric | Monitoring Item | Granularity | Supported Runtime | Supported Cluster Version | Supported Add-on Version | Supported OS |
| --- | --- | --- | --- | --- | --- | --- |
| Number of IPv4 packets sent to the Internet | dolphin_ip4_send_pkt_internet | Pod | runC/Kata | v1.19 or later | 1.1.2 | EulerOS 2.9, EulerOS 2.10 |
| Number of IPv4 bytes sent to the Internet | dolphin_ip4_send_byte_internet | Pod | runC/Kata | v1.19 or later | 1.1.2 | EulerOS 2.9, EulerOS 2.10 |
| Number of received IPv4 packets | dolphin_ip4_rcv_pkt | Pod | runC/Kata | v1.19 or later | 1.1.2 | EulerOS 2.9, EulerOS 2.10 |
| Number of received IPv4 bytes | dolphin_ip4_rcv_byte | Pod | runC/Kata | v1.19 or later | 1.1.2 | EulerOS 2.9, EulerOS 2.10 |
| Number of sent IPv4 packets | dolphin_ip4_send_pkt | Pod | runC/Kata | v1.19 or later | 1.1.2 | EulerOS 2.9, EulerOS 2.10 |
| Number of sent IPv4 bytes | dolphin_ip4_send_byte | Pod | runC/Kata | v1.19 or later | 1.1.2 | EulerOS 2.9, EulerOS 2.10 |
| Health status of the latest health check | dolphin_health_check_status | Pod | runC/Kata | v1.19 or later | 1.2.2 | EulerOS 2.9, EulerOS 2.10 |
| Total number of successful health checks | dolphin_health_check_successful_counter | Pod | runC/Kata | v1.19 or later | 1.2.2 | EulerOS 2.9, EulerOS 2.10 |
| Total number of failed health checks | dolphin_health_check_failed_counter | Pod | runC/Kata | v1.19 or later | 1.2.2 | EulerOS 2.9, EulerOS 2.10 |

Delivering a Monitoring Task

The template for creating a MonitorPolicy is as follows:

apiVersion: crd.dolphin.io/v1
kind: MonitorPolicy
metadata:
  name: example-task            # Monitoring task name.
  namespace: kube-system        # Mandatory. The value must be kube-system.
spec:
  selector:                     # (Optional) Pods to monitor, specified as a label selector. By default, all containers on the node are monitored.
    matchLabels:
      app: nginx
    matchExpressions:
      - key: app
        operator: In
        values:
          - nginx
  podLabel: [app]               # (Optional) Pod labels to attach to the metrics.
  ip4Tx:                        # (Optional) Whether to collect the numbers of sent IPv4 packets and bytes. Disabled by default.
    enable: true
  ip4Rx:                        # (Optional) Whether to collect the numbers of received IPv4 packets and bytes. Disabled by default.
    enable: true
  ip4TxInternet:                # (Optional) Whether to collect the numbers of IPv4 packets and bytes sent to the Internet. Disabled by default.
    enable: true
  healthCheck:                  # (Optional) Whether to collect health check metrics for pods on the local node: the latest health status and the total numbers of successful and failed checks. Disabled by default.
    enable: true                # true or false
    failureThreshold: 3         # (Optional) Number of failed checks after which the pod is considered unhealthy. By default, a single failed check marks the pod unhealthy.
    periodSeconds: 5            # (Optional) Interval between health checks, in seconds. Default: 60.
    command: ""                 # (Optional) Health check command: ping (default), arping, or curl.
    ipFamilies: [""]            # (Optional) IP address family used for health checks. Default: IPv4.
    port: 80                    # (Optional) Port number. Mandatory when command is curl.
    path: ""                    # (Optional) HTTP path. Mandatory when command is curl.
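If you generate policies from scripts instead of writing YAML by hand, a manifest equivalent to the template above can be assembled as plain data and serialized to JSON, which kubectl apply also accepts. The sketch below is a minimal illustration under that assumption: build_monitor_policy is a hypothetical helper, not part of the add-on, and it only uses the field names shown in the template. Disabled switches are simply omitted, because each one is disabled by default.

```python
import json


def build_monitor_policy(name, match_labels=None, pod_labels=None,
                         ip4_tx=False, ip4_rx=False, ip4_tx_internet=False,
                         health_check=None):
    """Hypothetical helper: build a MonitorPolicy manifest as a dict."""
    spec = {}
    if match_labels:
        # Without a selector, all containers on the node are monitored.
        spec["selector"] = {"matchLabels": dict(match_labels)}
    if pod_labels:
        spec["podLabel"] = list(pod_labels)
    # Traffic switches are disabled by default, so only enabled ones are emitted.
    for key, enabled in (("ip4Tx", ip4_tx), ("ip4Rx", ip4_rx),
                         ("ip4TxInternet", ip4_tx_internet)):
        if enabled:
            spec[key] = {"enable": True}
    if health_check is not None:
        # Optional fields such as failureThreshold or periodSeconds pass through.
        spec["healthCheck"] = {"enable": True, **health_check}
    return {
        "apiVersion": "crd.dolphin.io/v1",
        "kind": "MonitorPolicy",
        # The namespace must be kube-system.
        "metadata": {"name": name, "namespace": "kube-system"},
        "spec": spec,
    }


if __name__ == "__main__":
    policy = build_monitor_policy(
        "example-task",
        match_labels={"app": "nginx"},
        pod_labels=["test", "app"],
        health_check={"failureThreshold": 3, "periodSeconds": 5},
    )
    print(json.dumps(policy, indent=2))
```

For example, redirect the output to a file (python build_policy.py > policy.json, where the script name is only illustrative) and run kubectl apply -f policy.json on a worker node.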

podLabel: You can specify multiple pod labels separated by commas (,), for example, [app, version].

Labels must comply with the following rules. The corresponding regular expression is (^[a-zA-Z_]$)|(^([a-zA-Z][a-zA-Z0-9_]|_[a-zA-Z0-9])([a-zA-Z0-9_]){0,254}$).

  • A maximum of five labels can be entered. A label can contain a maximum of 256 characters.
  • A label cannot start with a digit or with two underscores (__).
  • A label can contain only letters, digits, and underscores (A-Za-z0-9_).

You can create, modify, and delete monitoring tasks in the preceding format. A maximum of 10 monitoring tasks can be created. When multiple monitoring tasks match the same pod, the pod generates a separate set of metrics for each matching task, each carrying its own task label.

  • If you modify or delete a monitoring task, monitoring data collected by the monitoring task will be lost. Therefore, exercise caution when performing this operation.
  • If the add-on is uninstalled, the MonitorPolicy of the monitoring task will be removed together with the add-on.
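The label rules above can be checked before a policy is submitted. The following sketch applies the documented regular expression, the five-label limit, and the list of system-reserved label names. validate_pod_labels is a hypothetical helper; the add-on performs its own validation, which may report errors differently.

```python
import re

# Regular expression for a single podLabel entry, as documented above.
LABEL_RE = re.compile(
    r"(^[a-zA-Z_]$)|(^([a-zA-Z][a-zA-Z0-9_]|_[a-zA-Z0-9])([a-zA-Z0-9_]){0,254}$)"
)
# Label names used by the system, which cannot be specified in podLabel.
RESERVED = {"pod", "task", "ipfamily", "srcip", "dstip", "srcport", "dstport", "protocol"}
MAX_LABELS = 5


def validate_pod_labels(labels):
    """Return a list of problems found in a podLabel list (empty if valid)."""
    problems = []
    if len(labels) > MAX_LABELS:
        problems.append(f"too many labels: {len(labels)} > {MAX_LABELS}")
    for label in labels:
        if label in RESERVED:
            problems.append(f"{label!r} is reserved by the system")
        # fullmatch rejects trailing newlines that "$" alone would accept.
        elif not LABEL_RE.fullmatch(label):
            problems.append(f"{label!r} does not match the label format")
    return problems
```

For example, ["app", "version"] passes, while "1app" (leading digit), "__x" (two leading underscores), and "pod" (reserved) are each reported.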

Example application scenarios:

  1. The example below monitors all pods on a node that match the label selector app=nginx and generates the three health check metrics. By default, the ping command is used to probe local pods. If a monitored pod has the test and app labels, their key-value pairs are included in the monitoring metrics. Otherwise, the corresponding label value is reported as not found.
    apiVersion: crd.dolphin.io/v1
    kind: MonitorPolicy
    metadata:
      name: example-task  
      namespace: kube-system        
    spec:
      selector:
        matchLabels:
          app: nginx
      podLabel: [test, app] 
      healthCheck: 
        enable: true
        failureThreshold: 3
        periodSeconds: 5
  2. The example below monitors all pods on a node that match the label selector app=nginx and generates the three health check metrics. A custom curl check is used, which considers only network connectivity: regardless of the HTTP status code returned, the pod is considered healthy as long as it can be reached over the network. If a monitored pod has the test and app labels, their key-value pairs are included in the monitoring metrics. Otherwise, the corresponding label value is reported as not found.
    apiVersion: crd.dolphin.io/v1
    kind: MonitorPolicy
    metadata:
      name: example-task
      namespace: kube-system
    spec:
      selector:
        matchLabels:
          app: nginx
      podLabel: [test, app]
      healthCheck: 
        enable: true
        failureThreshold: 3
        periodSeconds: 5 
        command: "curl"
        port: 80
        path: "healthz"
  3. The example below monitors all pods on a node and generates the numbers of sent IPv4 packets and sent IPv4 bytes. If a monitored pod has the app label, its key-value pair is included in the monitoring metrics. Otherwise, the label value is reported as not found.
    apiVersion: crd.dolphin.io/v1
    kind: MonitorPolicy
    metadata:
      name: example-task  
      namespace: kube-system        
    spec:
      podLabel: [app]
      ip4Tx:
        enable: true
  4. The example below monitors all pods on a node that match the label selector app=nginx and generates the numbers of sent IPv4 packets, received IPv4 packets, sent IPv4 bytes, received IPv4 bytes, IPv4 packets sent to the Internet, and IPv4 bytes sent to the Internet. If a monitored pod has the test and app labels, their key-value pairs are included in the monitoring metrics. Otherwise, the corresponding label value is reported as not found.
    apiVersion: crd.dolphin.io/v1
    kind: MonitorPolicy
    metadata:
      name: example-task
      namespace: kube-system
    spec:
      selector:
        matchLabels:
          app: nginx
      podLabel: [test, app]
      ip4Tx:
        enable: true
      ip4Rx:
        enable: true
      ip4TxInternet:      
        enable: true
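As the four examples show, the metrics a policy produces follow from which switches are enabled. The sketch below encodes that mapping, inferred from Table 2 and the template comments; METRICS_BY_SWITCH and expected_metrics are illustrative names, not an API of the add-on.

```python
# Metric names from Table 2, grouped by the spec switch that enables them.
METRICS_BY_SWITCH = {
    "ip4Tx": ["dolphin_ip4_send_pkt", "dolphin_ip4_send_byte"],
    "ip4Rx": ["dolphin_ip4_rcv_pkt", "dolphin_ip4_rcv_byte"],
    "ip4TxInternet": ["dolphin_ip4_send_pkt_internet", "dolphin_ip4_send_byte_internet"],
    "healthCheck": [
        "dolphin_health_check_status",
        "dolphin_health_check_successful_counter",
        "dolphin_health_check_failed_counter",
    ],
}


def expected_metrics(spec):
    """List the metric names a MonitorPolicy spec (given as a dict) should produce."""
    names = []
    for switch, metrics in METRICS_BY_SWITCH.items():
        # Each switch is disabled unless spec[switch]["enable"] is true;
        # "or {}" also covers a switch present with an empty value.
        if (spec.get(switch) or {}).get("enable", False):
            names.extend(metrics)
    return names
```

Applied to example 3 ({"podLabel": ["app"], "ip4Tx": {"enable": True}}), this returns the two sent-IPv4 metrics; example 4 yields six traffic metrics and examples 1 and 2 yield the three health check metrics.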

Checking Traffic Statistics

The monitoring data collected by this add-on is exported in the Prometheus exporter format and can be obtained as follows:

  • Directly access service port 10001 provided by the dolphin add-on, for example, http://{POD_IP}:10001/metrics.

    If you access the dolphin service port from a node, make sure that the security groups of the node and the pod allow this access.
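To inspect the exporter output from a script, the sketch below downloads http://{POD_IP}:10001/metrics with the Python standard library and parses the labelled dolphin_ samples. It assumes the plain Prometheus text format shown in the examples on this page, handles only labelled samples, and does not unescape label values; fetch_metrics and parse_samples are hypothetical helper names.

```python
import re
import urllib.request

# One labelled sample line, e.g.
#   dolphin_ip4_send_pkt{app="nginx",pod="default/nginx",task="kube-system/example-task"} 379
# An optional trailing timestamp is tolerated and ignored.
SAMPLE_RE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)\{(.*)\}\s+(\S+)(?:\s+\S+)?$')
# One label="value" pair; escape sequences are kept as-is, not unescaped.
LABEL_RE = re.compile(r'\s*([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"\s*,?')


def fetch_metrics(pod_ip, port=10001, timeout=5.0):
    """Download the raw exporter text from a dolphin pod."""
    url = f"http://{pod_ip}:{port}/metrics"
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8")


def parse_samples(text, prefix="dolphin_"):
    """Return (metric, labels, value) tuples for labelled samples matching prefix."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and # HELP / # TYPE comments
        m = SAMPLE_RE.match(line)
        if m is None or not m.group(1).startswith(prefix):
            continue  # unlabelled samples and other exporters' metrics are skipped
        labels = dict(LABEL_RE.findall(m.group(2)))
        samples.append((m.group(1), labels, float(m.group(3))))
    return samples
```

For example, parse_samples(fetch_metrics("192.168.0.10")), where the IP address is a placeholder for a dolphin pod IP, returns one tuple per series. For anything beyond quick inspection, a full Prometheus client parser is more robust than these regular expressions.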

Examples of the monitored information:

  • Example 1 (number of IPv4 packets sent to the Internet):
    dolphin_ip4_send_pkt_internet{app="nginx",pod="default/nginx-66c9c65dbf-zjg24",task="kube-system/example-task"} 241

    In the preceding example, the namespace of the pod is default, the pod name is nginx-66c9c65dbf-zjg24, the label is app, and the value is nginx. This metric is created by monitoring task example-task, and the number of IPv4 packets sent by the pod to the public network is 241.

  • Example 2 (number of IPv4 bytes sent to the Internet):
    dolphin_ip4_send_byte_internet{app="nginx",pod="default/nginx-66c9c65dbf-zjg24",task="kube-system/example-task"} 23618

    In the preceding example, the namespace of the pod is default, the pod name is nginx-66c9c65dbf-zjg24, the label is app, and the value is nginx. This metric is created by monitoring task example-task, and the number of IPv4 bytes sent by the pod to the public network is 23618.

  • Example 3 (number of sent IPv4 packets):
    dolphin_ip4_send_pkt{app="nginx",pod="default/nginx-66c9c65dbf-zjg24",task="kube-system/example-task"} 379

    In the preceding example, the namespace of the pod is default, the pod name is nginx-66c9c65dbf-zjg24, the label is app, and the value is nginx. This metric is created by monitoring task example-task, and the number of IPv4 packets sent by the pod is 379.

  • Example 4 (number of sent IPv4 bytes):
    dolphin_ip4_send_byte{app="nginx",pod="default/nginx-66c9c65dbf-zjg24",task="kube-system/example-task"} 33129

    In the preceding example, the namespace of the pod is default, the pod name is nginx-66c9c65dbf-zjg24, the label is app, and the value is nginx. This metric is created by monitoring task example-task, and the number of IPv4 bytes sent by the pod is 33129.

  • Example 5 (number of received IPv4 packets):
    dolphin_ip4_rcv_pkt{app="nginx",pod="default/nginx-66c9c65dbf-zjg24",task="kube-system/example-task"} 464

    In the preceding example, the namespace of the pod is default, the pod name is nginx-66c9c65dbf-zjg24, the label is app, and the value is nginx. This metric is created by monitoring task example-task, and the number of IPv4 packets received by the pod is 464.

  • Example 6 (number of received IPv4 bytes):
    dolphin_ip4_rcv_byte{app="nginx",pod="default/nginx-66c9c65dbf-zjg24",task="kube-system/example-task"} 34654

    In the preceding example, the namespace of the pod is default, the pod name is nginx-66c9c65dbf-zjg24, the label is app, and the value is nginx. This metric is created by monitoring task example-task, and the number of IPv4 bytes received by the pod is 34654.

  • Example 7 (health check status):
    dolphin_health_check_status{app="nginx",pod="default/nginx-b74766f5f-7582p",task="kube-system/example-task"} 0

    In the preceding example, the namespace of the pod is default, the pod name is nginx-b74766f5f-7582p, the label is app, and the value is nginx. This metric is created by monitoring task example-task, and the network health status of the pod is 0 (healthy). If the network status is unhealthy, the value is 1.

  • Example 8 (number of successful health checks):
    dolphin_health_check_successful_counter{app="nginx",pod="default/nginx-b74766f5f-7582p",task="kube-system/example-task"} 5

    In the preceding example, the namespace of the pod is default, the pod name is nginx-b74766f5f-7582p, the label is app, and the value is nginx. This metric is created by monitoring task example-task, and the number of successful network health checks for the pod is 5.

  • Example 9 (number of failed health checks):
    dolphin_health_check_failed_counter{app="nginx",pod="default/nginx-b74766f5f-7582p",task="kube-system/example-task"} 0

    In the preceding example, the namespace of the pod is default, the pod name is nginx-b74766f5f-7582p, the label is app, and the value is nginx. This metric is created by monitoring task example-task, and the number of failed network health checks for the pod is 0.

If a pod does not have a label specified in podLabel, the value of that label in the response body is not found, as shown below:

dolphin_ip4_send_byte_internet{test="not found",pod="default/nginx-66c9c65dbf-zjg24",task="default"} 23618
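The three health check metrics can be combined into a per-pod summary. The sketch below assumes the interpretation given in Examples 7 to 9 (status 0 means healthy and 1 means unhealthy; the two counters are running totals) and keys results by task and pod, since each matching task produces its own series. summarize_health is a hypothetical helper, and its simple label pattern does not handle escaped quotes in label values.

```python
import re

# Matches a health check sample and captures name, label text, and value.
SAMPLE_RE = re.compile(r'^(dolphin_health_check_\w+)\{([^}]*)\}\s+(\S+)')
LABEL_RE = re.compile(r'([a-zA-Z_]\w*)="([^"]*)"')

STATUS = "dolphin_health_check_status"
OK = "dolphin_health_check_successful_counter"
FAILED = "dolphin_health_check_failed_counter"


def summarize_health(text):
    """Return {(task, pod): {"healthy": bool or None, "success_ratio": float or None}}."""
    raw = {}
    for line in text.splitlines():
        m = SAMPLE_RE.match(line.strip())
        if not m:
            continue
        name, label_text, value = m.groups()
        labels = dict(LABEL_RE.findall(label_text))
        key = (labels.get("task", ""), labels.get("pod", ""))
        raw.setdefault(key, {})[name] = float(value)

    summary = {}
    for key, metrics in raw.items():
        ok = metrics.get(OK, 0.0)
        failed = metrics.get(FAILED, 0.0)
        total = ok + failed
        status = metrics.get(STATUS)
        summary[key] = {
            # 0 means healthy, 1 means unhealthy; None if no status was seen.
            "healthy": None if status is None else status == 0.0,
            # Share of successful checks among all checks counted so far.
            "success_ratio": ok / total if total else None,
        }
    return summary
```

Fed the three lines from Examples 7 to 9, this reports the pod default/nginx-b74766f5f-7582p under task kube-system/example-task as healthy with a success ratio of 1.0.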