prometheus

Introduction

Prometheus is an open-source system monitoring and alerting framework. It was inspired by Google's Borgmon monitoring system and was started in 2012 by former Google employees working at SoundCloud. Prometheus was developed as an open-source community project and officially released in 2015. In 2016, Prometheus joined the Cloud Native Computing Foundation (CNCF) as its second hosted project, after Kubernetes.

CCE allows you to quickly install Prometheus as an add-on.

Official website of Prometheus: https://prometheus.io/

Open source community: https://github.com/prometheus/prometheus

Features

As a next-generation monitoring framework, Prometheus has the following features:

Powerful multi-dimensional data model
1. Each time series is identified by a metric name and a set of key-value pairs (labels).
2. Multi-dimensional labels can be set for all metrics.
3. Metric names do not need to be dot-separated character strings.
4. Data can be aggregated, cut, and sliced along its label dimensions.
5. Sample values are stored as double-precision (64-bit) floating-point numbers. Label values can contain any Unicode characters.

Flexible and powerful query language (PromQL): A single query can add, multiply, and join multiple metrics.
Easy to manage: The Prometheus server is a single standalone binary that runs locally. It does not depend on distributed storage.
Efficient: Each sample occupies only about 3.5 bytes, and one Prometheus server can process millions of metrics.
Pull-based collection: Time series data is collected in pull mode, which facilitates local tests and prevents faulty servers from pushing bad metrics.
Push support: Time series data can also be pushed through the Pushgateway, an intermediary that the Prometheus server scrapes.
Target discovery: Monitored targets can be obtained through service discovery or static configuration.
Multiple visualization GUIs are available.
Easy to scale.
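
The multi-dimensional data model can be illustrated with a minimal Python sketch. It is not how Prometheus stores data internally. It only shows the idea that a series is keyed by its metric name plus its labels, and that labels make slicing and aggregation straightforward. The metric name, label names, and values below are hypothetical.

```python
from collections import defaultdict


def series_key(name, labels):
    """A series is identified by its metric name plus its full label set."""
    return (name, frozenset(labels.items()))


# Hypothetical latest sample value for each series.
samples = {
    series_key("http_requests_total", {"method": "GET", "code": "200"}): 1027.0,
    series_key("http_requests_total", {"method": "GET", "code": "500"}): 3.0,
    series_key("http_requests_total", {"method": "POST", "code": "200"}): 214.0,
}


def select(data, name, **matchers):
    """Slice: keep series of `name` whose labels equal every matcher."""
    selected = {}
    for (metric, labels), value in data.items():
        label_map = dict(labels)
        if metric == name and all(label_map.get(k) == v for k, v in matchers.items()):
            selected[(metric, labels)] = value
    return selected


def sum_by(data, label):
    """Aggregate: sum sample values grouped by a single label."""
    totals = defaultdict(float)
    for (_, labels), value in data.items():
        totals[dict(labels).get(label, "")] += value
    return dict(totals)


print(sum_by(samples, "method"))  # {'GET': 1030.0, 'POST': 214.0}
print(len(select(samples, "http_requests_total", code="200")))  # 2
```

In PromQL, the two calls correspond roughly to `sum by (method) (http_requests_total)` and `http_requests_total{code="200"}`.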

Because collected samples may be lost, Prometheus is not suitable for scenarios that require highly accurate data. However, it offers strong query capabilities for recording time series data and is well suited to microservice architectures.

Notes and Constraints

This add-on can be installed only in CCE clusters of v1.11 or later.

Installing the Add-on

Log in to the CCE console. In the navigation pane, choose Add-ons. On the Add-on Marketplace tab page, click Install Add-on under prometheus.
On the Install Add-on page, select the cluster and the add-on version, and click Next: Configuration.

In the Configuration step, set the following parameters:

**Table 1** prometheus add-on parameters
Parameter	Description
Add-on Specifications	Select add-on specifications based on service requirements. The options are as follows:
- Demo (<= 100 containers): Suitable for trial and feature demonstration environments. With this specification, Prometheus occupies few resources but has limited processing capabilities. Recommended when the cluster has no more than 100 containers.
- Small (<= 2000 containers): Recommended when the cluster has no more than 2,000 containers.
- Medium (<= 5000 containers): Recommended when the cluster has no more than 5,000 containers.
- Large (> 5000 containers): Recommended when the cluster has more than 5,000 containers.
Instances	Number of pods that will be created to match the selected add-on specifications. The number cannot be modified.
Container	CPU and memory quotas of the container allowed for the selected add-on specifications. The quotas cannot be modified.
Remote Write	Select where the collected data is stored:
- Local: Data collected by the prometheus add-on is stored only in local data disks.
- CIE: Data collected by the prometheus add-on is stored in both local data disks and CIE.
- Custom: Data collected by the prometheus add-on is stored in both local data disks and a custom remote end. The remote end address and HTTPS authentication information must be obtained from the third-party service.
Monitoring Data Retention Period	Number of days for storing customized monitoring data. The default value is 15 days.
Storage	Set the following parameters as prompted:
- Type: EVS is supported.
- AZ: Set this parameter based on the site requirements. An AZ is a physical region where resources use independent power supply and networks. AZs are physically isolated but interconnected through an internal network.
- Disk Type: Common I/O, high I/O, and ultra-high I/O are supported. For details about the comparison among these disk types, see System Disks and Data Disks.
- Capacity: Enter the storage capacity based on service requirements. The default value is 10 GB.
NOTE: If a PVC already exists in the monitoring namespace, the configured storage will be used as the storage source.
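
The sizing choices in Table 1 can be sketched in Python. The specification thresholds come from the table, and the 3.5 bytes per sample figure comes from the Features section. The number of active series and the scrape interval are assumptions you must supply for your own cluster, and the estimate ignores index and write-ahead log overhead, so treat the result as a rough lower bound rather than a sizing guarantee.

```python
BYTES_PER_SAMPLE = 3.5  # average figure quoted in the Features section
SECONDS_PER_DAY = 86400


def pick_specification(containers):
    """Map a container count to the add-on specification named in Table 1."""
    if containers < 0:
        raise ValueError("container count cannot be negative")
    if containers <= 100:
        return "Demo"
    if containers <= 2000:
        return "Small"
    if containers <= 5000:
        return "Medium"
    return "Large"


def estimate_storage_gb(active_series, scrape_interval_s, retention_days=15):
    """Rough local disk estimate in GB (10^9 bytes) for the retention period."""
    samples_per_series = retention_days * SECONDS_PER_DAY / scrape_interval_s
    return active_series * samples_per_series * BYTES_PER_SAMPLE / 1e9


print(pick_specification(1500))  # Small
# Assumed workload: 100,000 active series scraped every 30 seconds.
print(round(estimate_storage_gb(100_000, 30), 2))  # 15.12
```

Under these assumed inputs the estimate exceeds the default 10 GB capacity, which suggests either increasing the capacity or shortening the retention period.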

Click Install.

After the add-on is installed, click Go Back to Previous Page. On the Add-on Instance tab page, select the corresponding cluster to view the running instance. A running instance indicates that the add-on has been installed on each node in the cluster.
In the navigation pane on the left, choose Add-ons. On the Add-on Instance tab page, click prometheus to view details about the add-on instance.

Providing Resource Metrics

Resource metrics of containers and nodes, such as CPU and memory usage, can be obtained through the Kubernetes Metrics API. Resource metrics can be directly accessed, for example, by using the kubectl top command, or used by HPA or CustomedHPA policies for auto scaling.
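
To show how an auto scaling policy can consume these resource metrics, the following Python sketch implements the replica calculation described in the Kubernetes documentation for the standard HPA: the desired replica count is the current count scaled by the ratio of the observed metric to the target, rounded up, with no change when the ratio is within a tolerance (0.1 by default). The function name and inputs are illustrative, and CustomedHPA policies may apply different rules.

```python
import math


def desired_replicas(current_replicas, current_value, target_value, tolerance=0.1):
    """Standard HPA formula: ceil(currentReplicas * currentValue / targetValue)."""
    if target_value <= 0:
        raise ValueError("target value must be positive")
    ratio = current_value / target_value
    # Within the tolerance band, the replica count is left unchanged.
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    return math.ceil(current_replicas * ratio)


# Average CPU utilization of 90% against a 60% target on 4 replicas.
print(desired_replicas(4, 90, 60))  # 6
# A small deviation (62% against 60%) stays within the tolerance.
print(desired_replicas(4, 62, 60))  # 4
```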

The prometheus add-on can serve the Kubernetes Metrics API, but this capability is disabled by default. To enable it, create the following APIService object:

apiVersion: apiregistration.k8s.io/v1
kind: APIService
metadata:
  labels:
    app: custom-metrics-apiserver
    release: cceaddon-prometheus
  name: v1beta1.metrics.k8s.io
spec:
  group: metrics.k8s.io
  groupPriorityMinimum: 100
  insecureSkipTLSVerify: true
  service:
    name: custom-metrics-apiserver
    namespace: monitoring
    port: 443
  version: v1beta1
  versionPriority: 100

Save the object to a file named metrics-apiservice.yaml and run the following command:

kubectl create -f metrics-apiservice.yaml
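
Kubernetes requires the name of an APIService object to be the version and the group joined by a dot, which is why the object above is named v1beta1.metrics.k8s.io. The following Python sketch mirrors the relevant fields of the manifest as a dictionary and checks that rule, along with the presence of the backing service reference. The helper names are hypothetical, and the check is only a local sanity test, not a substitute for API server validation.

```python
# Relevant fields of the APIService manifest, mirrored as a Python dict.
manifest = {
    "apiVersion": "apiregistration.k8s.io/v1",
    "kind": "APIService",
    "metadata": {"name": "v1beta1.metrics.k8s.io"},
    "spec": {
        "group": "metrics.k8s.io",
        "version": "v1beta1",
        "service": {"name": "custom-metrics-apiserver", "namespace": "monitoring", "port": 443},
    },
}


def expected_name(spec):
    """APIService names follow the pattern <version>.<group>."""
    return f"{spec['version']}.{spec['group']}"


def check_apiservice(obj):
    """Return a list of human-readable problems (empty if none are found)."""
    problems = []
    spec = obj.get("spec", {})
    name = obj.get("metadata", {}).get("name")
    if name != expected_name(spec):
        problems.append(f"name should be {expected_name(spec)!r}, got {name!r}")
    service = spec.get("service") or {}
    for field in ("name", "namespace"):
        if not service.get(field):
            problems.append(f"spec.service.{field} is missing")
    return problems


print(check_apiservice(manifest))  # []
```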

Run the kubectl top command. If the following information is displayed, the Metrics API can be accessed:

# kubectl top pod -n monitoring
NAME                                                      CPU(cores)   MEMORY(bytes)
cceaddon-prometheus-kube-state-metrics-7b77694f48-zc9pl   4m           16Mi
cceaddon-prometheus-node-exporter-4jvwv                   1m           16Mi
cceaddon-prometheus-node-exporter-85zl4                   2m           39Mi
cceaddon-prometheus-node-exporter-qbrmb                   0m           15Mi
cceaddon-prometheus-operator-659547567d-j6484             0m           48Mi
custom-metrics-apiserver-d4f556ff9-l2j2m                  38m          44Mi
grafana-78f9966c99-xprkx                                  0m           25Mi
prometheus-0                                              18m          706Mi
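
The output above can also be summarized programmatically. The following Python sketch parses the sample output shown in this section and totals CPU and memory for the monitoring namespace. It assumes CPU is reported in cores or millicores (suffix m) and memory in Ki, Mi, or Gi. Other quantity suffixes are not handled.

```python
SAMPLE = """\
NAME                                                      CPU(cores)   MEMORY(bytes)
cceaddon-prometheus-kube-state-metrics-7b77694f48-zc9pl   4m           16Mi
cceaddon-prometheus-node-exporter-4jvwv                   1m           16Mi
cceaddon-prometheus-node-exporter-85zl4                   2m           39Mi
cceaddon-prometheus-node-exporter-qbrmb                   0m           15Mi
cceaddon-prometheus-operator-659547567d-j6484             0m           48Mi
custom-metrics-apiserver-d4f556ff9-l2j2m                  38m          44Mi
grafana-78f9966c99-xprkx                                  0m           25Mi
prometheus-0                                              18m          706Mi
"""

MEMORY_FACTORS_MIB = {"Ki": 1 / 1024, "Mi": 1, "Gi": 1024}


def parse_cpu_millicores(text):
    """Convert '38m' or '2' (whole cores) to millicores."""
    if text.endswith("m"):
        return int(text[:-1])
    return round(float(text) * 1000)


def parse_memory_mib(text):
    """Convert a Ki/Mi/Gi quantity to MiB."""
    for suffix, factor in MEMORY_FACTORS_MIB.items():
        if text.endswith(suffix):
            return float(text[: -len(suffix)]) * factor
    raise ValueError(f"unsupported memory quantity: {text}")


def summarize(output):
    """Return (total millicores, total MiB, pod using the most memory)."""
    rows = [line.split() for line in output.strip().splitlines()[1:]]
    total_cpu = sum(parse_cpu_millicores(row[1]) for row in rows)
    total_mem = sum(parse_memory_mib(row[2]) for row in rows)
    top_pod = max(rows, key=lambda row: parse_memory_mib(row[2]))[0]
    return total_cpu, total_mem, top_pod


print(summarize(SAMPLE))  # (63, 909.0, 'prometheus-0')
```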

Upgrading the Add-on

Log in to the CCE console. In the navigation pane, choose Add-ons. On the Add-on Instance tab page, click Upgrade under prometheus.
- If the Upgrade button is not available, the current add-on is already up-to-date and no upgrade is required.
- During the upgrade, the original version of the prometheus add-on is removed from the cluster nodes, and the target version is installed.
On the Basic Information page, select the add-on version and click Next.
Set the parameters by referring to the parameter description in Installing the Add-on and click Upgrade.

Uninstalling the Add-on

Log in to the CCE console. In the navigation pane, choose Add-ons. On the Add-on Instance tab page, click Uninstall under prometheus.
In the dialog box displayed, click Yes to uninstall the add-on.

Reference

For details about the Prometheus concepts and configurations, see the Prometheus Official Documentation.
For details about how to install Node Exporter, see the node_exporter repository on GitHub.
For details about how to send Slack messages, see Incoming Webhooks.
