Auto Scaling Based on ELB Monitoring Metrics

Issues

In Using HPA and CA for Auto Scaling of Workloads and Nodes, auto scaling is performed based on the usage of resources such as CPU and memory.

However, resource usage is a lagging indicator: it rises only after traffic has already increased. Scaling based on it cannot respond quickly enough for services such as flash sales and social media, which require fast, elastic scaling.

Solution

This section describes an auto scaling solution based on ELB monitoring metrics. Compared with CPU/memory usage-based auto scaling, auto scaling based on ELB QPS data is more targeted and timely.

The key to this solution is to obtain the ELB metric data and report it to Prometheus, convert the data in Prometheus into metrics that HPA can recognize, and then perform auto scaling based on the converted data.

The implementation scheme is as follows:

  1. Develop a Prometheus exporter to obtain ELB metric data, convert the data into the format required by Prometheus, and report it to Prometheus.
  2. Expose the Prometheus data through the Kubernetes metrics API for the HPA controller to use.
  3. Set an HPA rule to use ELB monitoring data as auto scaling metrics.

Figure 1 ELB traffic flows and monitoring data

Other metrics can be collected in a similar way.

Prerequisites

  • You must be familiar with Prometheus and be able to write a Prometheus exporter.
  • The prometheus and metrics-server add-ons must be installed in the cluster.

Writing the Exporter

Prometheus periodically calls the /metrics API of the exporter to obtain metric data. Applications only need to report monitoring data through /metrics. You can select a Prometheus client in a desired language and integrate it into applications to implement the /metrics API. For details about the client, see Prometheus CLIENT LIBRARIES. For details about how to write the exporter, see WRITING EXPORTERS.

The monitoring data must be in the format that Prometheus supports. Each data record provides the ELB ID, listener ID, namespace where the Service is located, Service name, and Service UID as labels, as shown in the following figure.
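As an illustration of the expected output, the following minimal Python sketch renders metric samples in the Prometheus text exposition format using only the standard library. The label names elb_id, listener_id, and service_uid are assumptions made for this example (namespace and service_name match the adapter configuration used later in this section). In practice, a Prometheus client library would normally generate this output for you.

```python
from __future__ import annotations


def escape_label_value(value: str) -> str:
    """Escape backslash, double quote, and newline, as the text format requires."""
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def render_sample(metric: str, labels: dict[str, str], value: float) -> str:
    """Render one sample line in the form: name{label="value",...} value"""
    pairs = ",".join(f'{k}="{escape_label_value(v)}"' for k, v in labels.items())
    return f"{metric}{{{pairs}}} {value}"


def render_metric(
    metric: str,
    help_text: str,
    samples: list[tuple[dict[str, str], float]],
) -> str:
    """Render a gauge with its HELP/TYPE header followed by all of its samples."""
    lines = [f"# HELP {metric} {help_text}", f"# TYPE {metric} gauge"]
    lines += [render_sample(metric, labels, value) for labels, value in samples]
    return "\n".join(lines) + "\n"


# Hypothetical label values, for illustration only.
example_labels = {
    "elb_id": "example-elb-id",
    "listener_id": "example-listener-id",
    "namespace": "default",
    "service_name": "nginx",
    "service_uid": "example-service-uid",
}
example_body = render_metric("m7_in_Bps", "ELB inbound rate", [(example_labels, 1234.0)])
```

The string in example_body is what the exporter would return from /metrics for a single Service.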

To obtain the preceding data, perform the following steps:

  1. Use the API for listing Services to query all Services.

    The annotations field in the returned information identifies the ELB load balancer associated with each Service:

    • kubernetes.io/elb.id
    • kubernetes.io/elb.class

  2. Use the listener query API to query the listener ID based on the ELB instance ID obtained in the previous step.
  3. Obtain the ELB monitoring data.

    Query the ELB monitoring data by using the CES API for querying monitoring data in batches. For details about ELB monitoring metrics, see Monitoring Metrics. Examples:

    • m1_cps: number of concurrent connections
    • m5_in_pps: number of incoming data packets
    • m6_out_pps: number of outgoing data packets
    • m7_in_Bps: incoming rate
    • m8_out_Bps: outgoing rate

  4. Aggregate data in the format that Prometheus supports and expose the data through the /metrics API.
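The preceding steps can be sketched as follows. This is a minimal outline, not a complete exporter: it assumes the Service list has already been fetched as a parsed JSON object (a Kubernetes ServiceList), and it takes the listener query and the CES metric query as injected functions. The names list_listener_ids and query_metric are hypothetical placeholders for your own wrappers around those APIs, and the label names are assumptions.

```python
from __future__ import annotations

from typing import Callable

ELB_ID_ANNOTATION = "kubernetes.io/elb.id"


def elb_services(service_list: dict) -> list[dict[str, str]]:
    """Step 1: keep only the Services that carry an ELB ID annotation."""
    found = []
    for item in service_list.get("items", []):
        meta = item.get("metadata", {})
        elb_id = (meta.get("annotations") or {}).get(ELB_ID_ANNOTATION)
        if elb_id:
            found.append({
                "elb_id": elb_id,
                "namespace": meta.get("namespace", ""),
                "service_name": meta.get("name", ""),
                "service_uid": meta.get("uid", ""),
            })
    return found


def collect_samples(
    service_list: dict,
    list_listener_ids: Callable[[str], list[str]],   # hypothetical wrapper, step 2
    query_metric: Callable[[str, str, str], float],  # hypothetical wrapper, step 3
    metric: str = "m7_in_Bps",
) -> list[tuple[dict[str, str], float]]:
    """Steps 2 to 4: resolve listeners, query one metric, attach the labels."""
    samples = []
    for labels in elb_services(service_list):
        for listener_id in list_listener_ids(labels["elb_id"]):
            value = query_metric(labels["elb_id"], listener_id, metric)
            samples.append(({**labels, "listener_id": listener_id}, value))
    return samples
```

Injecting the two API wrappers keeps the label-building logic testable without cloud credentials. The returned samples can then be rendered and served through /metrics.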

Deploying the Exporter

Prometheus can dynamically monitor pods if you add Prometheus annotations to the pods.

Some commonly used Prometheus annotations are as follows:

  • prometheus.io/scrape: If the value is true, the pod will be monitored.
  • prometheus.io/path: URL path from which the data is collected. The default value is /metrics.
  • prometheus.io/port: port number of the endpoint to collect data from.
  • prometheus.io/scheme: protocol used to collect data. The default value is http. If HTTPS is configured for security purposes, change the value to https.

Generally, you only need to add prometheus.io/scrape so that Prometheus can collect pod monitoring information via /metrics.

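apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: exporter
  name: exporter
  namespace: default
spec:
  # apps/v1 requires a selector that matches the pod template labels.
  selector:
    matchLabels:
      app: exporter
  template:
    metadata:
      annotations:
        # Tells Prometheus to scrape the pods of this Deployment.
        prometheus.io/scrape: "true"
      labels:
        app: exporter
    spec:
      ...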

Converting Prometheus Data

After Prometheus collects the monitoring data, the data must be exposed through the Kubernetes metrics API so that the HPA controller can use it for auto scaling.

The open-source community provides a Prometheus adapter to convert the data. You can obtain it from the project repository or install it as instructed in its documentation.

You need to configure the Prometheus adapter properly because its configuration determines which Prometheus data is exposed and how.

Each adapter policy contains four parts:

  1. Metrics to be exposed (seriesQuery)
  2. Relationship between these metrics and Kubernetes resources (resources)
  3. Names of the metrics exposed in the custom metrics API (name). These names can differ from the metric names in Prometheus.
  4. Query used to obtain the data from Prometheus (metricsQuery)

For example:

    - metricsQuery: sum(<<.Series>>{<<.LabelMatchers>>}) by (<<.GroupBy>>)
      resources:
        overrides:
          namespace:
            resource: namespace
          service_name:
            resource: service
      seriesQuery: '{app="exporter"}'

In this example, the exporter data is associated with Kubernetes resources through metric labels: the namespace label identifies the namespace, and the service_name label identifies the Service.
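To make the metricsQuery template more concrete, the following Python sketch shows roughly how its placeholders could be filled in when the HPA controller requests a metric for one Service. It only imitates the substitution for illustration. The function name is hypothetical, and the exact matcher order and syntax generated by the adapter may differ.

```python
def expand_metrics_query(
    template: str,
    series: str,
    namespace: str,
    resource_label: str,
    resource_name: str,
) -> str:
    """Fill in the adapter placeholders for a single namespaced object."""
    # The matchers select the namespace and the object that the metric describes.
    label_matchers = f'namespace="{namespace}",{resource_label}="{resource_name}"'
    return (
        template
        .replace("<<.Series>>", series)
        .replace("<<.LabelMatchers>>", label_matchers)
        .replace("<<.GroupBy>>", resource_label)
    )


template = "sum(<<.Series>>{<<.LabelMatchers>>}) by (<<.GroupBy>>)"
query = expand_metrics_query(template, "m7_in_Bps", "default", "service_name", "nginx")
```

With these inputs, query is sum(m7_in_Bps{namespace="default",service_name="nginx"}) by (service_name), which sums the inbound rate over all series that belong to the nginx Service.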

Creating an HPA Policy

After the Prometheus adapter exposes the data reported by the exporter through the Kubernetes metrics API, you can create an HPA policy for auto scaling.

The following is an example HPA policy. The inbound traffic of the ELB load balancer is used to trigger scale-out. When the value of m7_in_Bps (inbound traffic rate) exceeds 10000, the nginx Deployment will be scaled out.

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: nginx
  namespace: default
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: nginx
  minReplicas: 1
  maxReplicas: 10
  metrics:
  - type: Object
    object:
      describedObject:
        apiVersion: v1
        kind: Service
        name: nginx
      target:
        type: Value
        value: 10k
      metric:
        name: m7_in_Bps

After the workload and the HPA policy are created, you can run a stress test on the workload by accessing its pods through the ELB load balancer. The HPA controller then determines whether scaling is required by comparing the metric value with the configured target value.
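To estimate how the policy above reacts during a test, the following Python sketch applies the basic replica calculation documented for the Kubernetes HPA: the desired replica count is the current count multiplied by the ratio of the metric value to the target, rounded up. It assumes the default tolerance of 0.1 and ignores stabilization windows, scaling behavior settings, and pod readiness, so treat the result as an approximation.

```python
import math


def desired_replicas(
    current_replicas: int,
    metric_value: float,
    target_value: float,
    min_replicas: int = 1,
    max_replicas: int = 10,
    tolerance: float = 0.1,  # assumed default tolerance of the HPA controller
) -> int:
    """Approximate the replica count that the HPA controller would request."""
    ratio = metric_value / target_value
    # No scaling happens while the ratio stays close enough to 1.0.
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    desired = math.ceil(current_replicas * ratio)
    # The result is always kept within minReplicas and maxReplicas.
    return max(min_replicas, min(max_replicas, desired))
```

For example, with 2 replicas and an inbound rate of 25000 against the target of 10000 (10k), the function returns 5. With an inbound rate of 100000 and 4 replicas, the result is capped at maxReplicas, which is 10 in the example policy.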