Configuring Prometheus Monitoring and Alerting Rules with PrometheusRules

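Prometheus supports PrometheusRule resources. PrometheusRules provide a rule language for monitoring and alerting, which makes it easier to query metrics in Prometheus and to configure PromQL-based alerting rules.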

Currently, the cloud native monitoring add-on supports PrometheusRules configuration only when local data storage is enabled.

How to Configure PrometheusRules

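A PrometheusRule lets you define your own recording rules (record), so that the result of an expression can be queried as a metric under the name you choose.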

apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: demo
  namespace: monitoring
  labels:
    role: operator-prometheus   # Required. Must match the ruleSelector configured on Prometheus.
spec:
  groups:
  - name: demo
    interval: 15s
    rules:
    - record: cpu_request
      expr: kube_pod_container_resource_requests{resource="cpu",unit="core"}
    - record: cpu_limit
      expr: kube_pod_container_resource_limits{resource="cpu",unit="core"}
    - record: memory_request
      expr: kube_pod_container_resource_requests{resource="memory",unit="byte"}
    - record: memory_limit
      expr: kube_pod_container_resource_limits{resource="memory",unit="byte"}

After the resource is created, open the Prometheus web UI and find the configured PrometheusRules on the “Status > Rules” page.
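
Recorded series can be used in other expressions like any other metric. The following is a minimal sketch, assuming the cpu_request record from the example above already exists. The resource name demo-aggregate and the record name namespace:cpu_request:sum are hypothetical and chosen only for illustration.

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: demo-aggregate            # hypothetical name
  namespace: monitoring
  labels:
    role: operator-prometheus     # same label as in the example above
spec:
  groups:
  - name: demo-aggregate
    interval: 15s
    rules:
    # Sum the recorded per-container CPU requests for each namespace.
    - record: namespace:cpu_request:sum   # hypothetical record name
      expr: sum by (namespace) (cpu_request)
```

Once both rules have been evaluated, querying namespace:cpu_request:sum in the Prometheus web UI should return one series per namespace.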

How to Configure Alerting Rules with PrometheusRules

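Prometheus alerting rules are created by configuring a PrometheusRule custom resource (CR). The following uses a cluster CPU usage alert as an example. For the alerting rule format, see: https://prometheus.io/docs/prometheus/latest/configuration/alerting_rules/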

Create the example alerting rule template.

kubectl apply -f PrometheusRule.yaml

The content of PrometheusRule.yaml is as follows:

apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  labels:
    role: operator-prometheus # Required. Must match the ruleSelector configured on Prometheus.
  name: demo
  namespace: monitoring
spec:
  groups:
  - name: alert-cluster-demo
    rules:
    - alert: Cluster CPU usage exceeds 50%
      expr: 100 - (avg(irate(node_cpu_seconds_total{mode="idle"}[2m])) * 100) >= 50
      for: 2m
      labels:
        severity: critical
        cce_alert_kind: resources
        alertname: Cluster CPU usage exceeds 50%
        kind: resources
        resource_kind: Cluster
        resourceType: Cluster
        source: prometheus
      annotations:
        info: "Actual cluster CPU usage exceeds 50%. Current cluster CPU usage: {{ printf \"%.2f\" $value }}%"
        description: "Actual cluster CPU usage exceeds 50%. Current cluster CPU usage: {{ printf \"%.2f\" $value }}%"

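After the rule is configured, open the Prometheus web UI and check on the “Alerts” page whether the alerting rule has taken effect or has fired.
The Prometheus add-on automatically pushes alerts to Alertmanager. To configure alert receivers, edit the Secret named alertmanager in the monitoring namespace. For details, see the Alertmanager documentation: https://prometheus.io/docs/alerting/latest/configuration/.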
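
As an illustration of a receiver configuration, the following is a minimal sketch of an Alertmanager configuration file that sends every alert to a single webhook. The receiver name default-webhook and the URL are hypothetical placeholders. The sketch also assumes the Secret stores this content under the key alertmanager.yaml, which is the usual Prometheus Operator convention but should be verified against the Secret in your cluster.

```yaml
# Alertmanager configuration (sketch). Receiver name and URL are hypothetical.
route:
  receiver: default-webhook        # default receiver for all alerts
  group_by: ['alertname']          # batch alerts that share the same alert name
  group_wait: 30s                  # wait before sending the first notification for a group
  group_interval: 5m               # wait before notifying about new alerts in a group
  repeat_interval: 4h              # resend interval for alerts that are still firing
receivers:
- name: default-webhook
  webhook_configs:
  - url: http://alert-receiver.example.com/hook   # hypothetical endpoint
    send_resolved: true            # also notify when an alert is resolved
```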

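The YAML of the alertmanager-alertmanager StatefulSet shows that alert data is stored on the Pod's disk, so the data is lost if the Pod restarts. To persist the data, plan a PVC and modify the alertmanager CR to mount it.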
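
One possible way to do this is the storage field of the Prometheus Operator Alertmanager resource, which lets the operator create a PVC for each replica. The following fragment is a sketch only. It assumes the CR is named alertmanager (inferred from the StatefulSet name alertmanager-alertmanager), and the storage class name and size are hypothetical values that must be adapted to your cluster. Whether the add-on preserves manual changes to this CR is not confirmed here.

```yaml
apiVersion: monitoring.coreos.com/v1
kind: Alertmanager
metadata:
  name: alertmanager               # assumed CR name
  namespace: monitoring
spec:
  storage:
    volumeClaimTemplate:
      spec:
        storageClassName: my-storage-class   # hypothetical storage class
        accessModes:
        - ReadWriteOnce
        resources:
          requests:
            storage: 1Gi                     # hypothetical size
```

Only the spec.storage section is shown. Merge it into the existing CR, for example with kubectl edit, rather than applying this fragment as a complete resource.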