Updated on 2024-04-11 GMT+08:00

Basic Metrics: Container Metrics

This section describes the types, names, and meanings of metrics reported to AOM from CCE's kube-prometheus-stack add-on or on-premises Kubernetes clusters.

Table 1 Metrics of containers running in CCE or on-premises Kubernetes clusters

Target Name

Job Name

Metric

Description

  • serviceMonitor/monitoring/coredns/0
  • serviceMonitor/monitoring/node-local-dns/0

coredns and node-local-dns

coredns_build_info

CoreDNS build information

coredns_cache_entries

Number of entries in the cache

coredns_cache_size

Cache size

coredns_cache_hits_total

Number of cache hits

coredns_cache_misses_total

Number of cache misses

coredns_cache_requests_total

Total number of DNS resolution requests handled by the cache, broken down by dimension

coredns_dns_request_duration_seconds_bucket

Histogram of DNS request duration (bucket)

coredns_dns_request_duration_seconds_count

Histogram of DNS request duration (count)

coredns_dns_request_duration_seconds_sum

Histogram of DNS request duration (sum)

coredns_dns_request_size_bytes_bucket

Histogram of the size of DNS request (bucket)

coredns_dns_request_size_bytes_count

Histogram of the size of DNS request (count)

coredns_dns_request_size_bytes_sum

Histogram of the size of DNS request (sum)

coredns_dns_requests_total

Number of DNS requests

coredns_dns_response_size_bytes_bucket

Histogram of the size of DNS response (bucket)

coredns_dns_response_size_bytes_count

Histogram of the size of DNS response (count)

coredns_dns_response_size_bytes_sum

Histogram of the size of DNS response (sum)

coredns_dns_responses_total

Number of DNS responses, by response code

coredns_forward_conn_cache_hits_total

Number of cache hits for each protocol and data flow

coredns_forward_conn_cache_misses_total

Number of cache misses for each protocol and data flow

coredns_forward_healthcheck_broken_total

Unhealthy upstream count

coredns_forward_healthcheck_failures_total

Count of failed health checks per upstream

coredns_forward_max_concurrent_rejects_total

Number of requests rejected due to excessive concurrent requests

coredns_forward_request_duration_seconds_bucket

Histogram of forward request duration (bucket)

coredns_forward_request_duration_seconds_count

Histogram of forward request duration (count)

coredns_forward_request_duration_seconds_sum

Histogram of forward request duration (sum)

coredns_forward_requests_total

Number of requests for each data flow

coredns_forward_responses_total

Number of responses to each data flow

coredns_health_request_duration_seconds_bucket

Histogram of health request duration (bucket)

coredns_health_request_duration_seconds_count

Histogram of health request duration (count)

coredns_health_request_duration_seconds_sum

Histogram of health request duration (sum)

coredns_health_request_failures_total

Number of health request failures

coredns_hosts_reload_timestamp_seconds

Timestamp of the last reload of the hosts file

coredns_kubernetes_dns_programming_duration_seconds_bucket

Histogram of DNS programming duration (bucket)

coredns_kubernetes_dns_programming_duration_seconds_count

Histogram of DNS programming duration (count)

coredns_kubernetes_dns_programming_duration_seconds_sum

Histogram of DNS programming duration (sum)

coredns_local_localhost_requests_total

Number of localhost requests

coredns_nodecache_setup_errors_total

Number of nodecache setup errors

coredns_dns_response_rcode_count_total

Number of responses for each Zone and Rcode

coredns_dns_request_count_total

Number of DNS requests

coredns_dns_request_do_count_total

Number of requests with the DNSSEC OK (DO) bit set

coredns_dns_do_requests_total

Number of requests with the DO bit set

coredns_dns_request_type_count_total

Number of requests for each Zone and Type

coredns_panics_total

Total number of panics

coredns_plugin_enabled

Whether a plugin is enabled

coredns_reload_failed_total

Total number of failed reloads
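
Several CoreDNS metrics above are histograms exposed as three series (bucket, count, and sum). As a minimal sketch of how the three relate, the following Python snippet derives an average and an estimated quantile from one scrape of coredns_dns_request_duration_seconds. The function names and sample values are hypothetical, and the interpolation only approximates what the PromQL histogram_quantile function does; it is not part of AOM or CoreDNS.

```python
import math
from typing import Dict


def histogram_average(sum_value: float, count_value: float) -> float:
    """Mean observation: the _sum series divided by the _count series."""
    return sum_value / count_value if count_value else 0.0


def histogram_quantile(q: float, buckets: Dict[float, float]) -> float:
    """Estimate the q-quantile from cumulative buckets keyed by upper bound (le).

    Assumes a non-empty, monotonically non-decreasing set of buckets that
    ends with the +Inf bucket, as a Prometheus histogram exposes them.
    """
    if not 0 < q <= 1:
        raise ValueError("q must be in (0, 1]")
    bounds = sorted(buckets)
    total = buckets[bounds[-1]]  # the +Inf bucket counts every observation
    if total == 0:
        return math.nan
    rank = q * total
    prev_bound, prev_count = 0.0, 0.0
    for bound in bounds:
        count = buckets[bound]
        if count >= rank:
            if math.isinf(bound):
                # The quantile lies past the last finite bound; report that bound.
                return prev_bound
            # Interpolate linearly inside the bucket that contains the rank.
            return prev_bound + (bound - prev_bound) * (rank - prev_count) / (count - prev_count)
        prev_bound, prev_count = bound, count
    return prev_bound


# Hypothetical scrape: cumulative counts per upper bound, in seconds.
sample_buckets = {0.001: 50, 0.01: 90, 0.1: 100, math.inf: 100}
p95 = histogram_quantile(0.95, sample_buckets)
average = histogram_average(1.5, 100)
```

The same arithmetic applies to the other bucket, count, and sum triples in this table, such as the request size and forward request duration histograms.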

serviceMonitor/monitoring/kube-apiserver/0

apiserver

aggregator_unavailable_apiservice

Number of unavailable APIServices

apiserver_admission_controller_admission_duration_seconds_bucket

Processing delay of an Admission Controller

apiserver_admission_webhook_admission_duration_seconds_bucket

Processing delay of an Admission Webhook

apiserver_admission_webhook_admission_duration_seconds_count

Number of Admission Webhook processing requests

apiserver_client_certificate_expiration_seconds_bucket

Remaining validity period of the client certificate (bucket)

apiserver_client_certificate_expiration_seconds_count

Remaining validity period of the client certificate (count)

apiserver_current_inflight_requests

Number of requests currently being processed, by request kind (read-only or mutating)

apiserver_request_duration_seconds_bucket

Latency of client requests to the APIServer

apiserver_request_total

Number of different requests to the APIServer

go_goroutines

Number of goroutines

kubernetes_build_info

Kubernetes build information

process_cpu_seconds_total

Total process CPU time

process_resident_memory_bytes

Size of the resident memory set for a process

rest_client_requests_total

Number of REST requests

workqueue_adds_total

Number of adds handled by a work queue

workqueue_depth

Depth of a work queue

workqueue_queue_duration_seconds_bucket

Time that an item waits in the work queue before it is processed

aggregator_unavailable_apiservice_total

Number of unavailable APIServices

rest_client_request_duration_seconds_bucket

Histogram of REST request duration

serviceMonitor/monitoring/kubelet/0

kubelet

kubelet_certificate_manager_client_expiration_renew_errors

Number of certificate renewal errors

kubelet_certificate_manager_client_ttl_seconds

Time-to-live (TTL) of the Kubelet client certificate

kubelet_cgroup_manager_duration_seconds_bucket

Duration of the cgroup manager operations (bucket)

kubelet_cgroup_manager_duration_seconds_count

Duration of the cgroup manager operations (count)

kubelet_node_config_error

If a configuration-related error occurs on a node, the value of this metric is true (1). If there is no configuration-related error, the value is false (0).

kubelet_node_name

Node name. The value is always 1.

kubelet_pleg_relist_duration_seconds_bucket

Duration of relisting pods in PLEG (bucket)

kubelet_pleg_relist_duration_seconds_count

Duration of relisting pods in PLEG (count)

kubelet_pleg_relist_interval_seconds_bucket

Interval between relisting operations in PLEG (bucket)

kubelet_pod_start_duration_seconds_count

Time required for starting a single pod (count)

kubelet_pod_start_duration_seconds_bucket

Time required for starting a single pod (bucket)

kubelet_pod_worker_duration_seconds_bucket

Duration for synchronizing a single pod. Operation type: create, update, or sync

kubelet_running_containers

Number of running containers

kubelet_running_pods

Number of running pods

kubelet_runtime_operations_duration_seconds_bucket

Duration of the runtime operations (bucket)

kubelet_runtime_operations_errors_total

Number of runtime operation errors listed by operation type

kubelet_runtime_operations_total

Number of runtime operations listed by operation type

kubelet_volume_stats_available_bytes

Number of available bytes in a volume

kubelet_volume_stats_capacity_bytes

Capacity of the volume in bytes

kubelet_volume_stats_inodes

Total number of inodes in a volume

kubelet_volume_stats_inodes_used

Number of used inodes in a volume

kubelet_volume_stats_used_bytes

Number of used bytes in a volume

storage_operation_duration_seconds_bucket

Duration of each storage operation (bucket)

storage_operation_duration_seconds_count

Duration of each storage operation (count)

storage_operation_errors_total

Number of storage operation errors

volume_manager_total_volumes

Number of volumes in the Volume Manager

rest_client_requests_total

Number of HTTP client requests partitioned by status code, method, and host

rest_client_request_duration_seconds_bucket

Request delay (bucket)

process_resident_memory_bytes

Size of the resident memory set for a process

process_cpu_seconds_total

Total process CPU time

go_goroutines

Number of goroutines

serviceMonitor/monitoring/kubelet/1

kubelet

container_cpu_cfs_periods_total

Number of elapsed enforcement period intervals

container_cpu_cfs_throttled_periods_total

Number of throttled period intervals

container_cpu_cfs_throttled_seconds_total

Total time duration the container has been throttled

container_cpu_load_average_10s

Value of container CPU load average over the last 10 seconds

container_cpu_usage_seconds_total

Cumulative CPU time consumed by a container in core-seconds

container_file_descriptors

Number of open file descriptors for a container

container_fs_inodes_free

Number of available inodes in a file system

container_fs_inodes_total

Number of inodes in a file system

container_fs_io_time_seconds_total

Cumulative seconds spent on doing I/Os by the disk or file system

container_fs_limit_bytes

Total disk or file system capacity that can be consumed by a container

container_fs_read_seconds_total

Cumulative number of seconds the container spent on reading disk or file system data

container_fs_reads_bytes_total

Cumulative amount of disk or file system data read by a container

container_fs_reads_total

Cumulative number of disk or file system reads completed by a container

container_fs_usage_bytes

File system usage

container_fs_write_seconds_total

Cumulative number of seconds the container spent on writing data to the disk or file system

container_fs_writes_bytes_total

Total amount of data written by a container to a disk or file system

container_fs_writes_total

Cumulative number of disk or file system writes completed by a container

container_memory_cache

Memory used for the page cache of a container

container_memory_failcnt

Number of times memory usage hit the limit

container_memory_max_usage_bytes

Maximum memory usage recorded for a container

container_memory_rss

Size of the resident memory set for a container

container_memory_swap

Container swap usage

container_memory_usage_bytes

Current memory usage of a container

container_memory_working_set_bytes

Memory usage of the working set of a container

container_network_receive_bytes_total

Total volume of data received by the container network

container_network_receive_errors_total

Cumulative number of errors encountered during reception

container_network_receive_packets_dropped_total

Cumulative number of packets dropped during reception

container_network_receive_packets_total

Cumulative number of packets received

container_network_transmit_bytes_total

Total volume of data transmitted on the container network

container_network_transmit_errors_total

Cumulative number of errors encountered during transmission

container_network_transmit_packets_dropped_total

Cumulative number of packets dropped during transmission

container_network_transmit_packets_total

Cumulative number of packets transmitted

container_spec_cpu_quota

CPU quota of the container

container_spec_memory_limit_bytes

Memory limit for the container

machine_cpu_cores

Number of logical CPU cores

machine_memory_bytes

Amount of memory installed on the node
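
Most container metrics above are cumulative counters, so they become meaningful only when two samples are compared. The following Python sketch shows one way to turn such samples into an average CPU usage and a CPU throttling ratio. The helper names and sample values are hypothetical, and the reset handling is a simplifying assumption rather than the exact behavior of any query engine.

```python
def counter_increase(earlier: float, later: float) -> float:
    """Increase of a cumulative counter between two samples.

    A drop is treated as a counter reset (for example, after a container
    restart), in which case the later value is taken as the increase.
    """
    return later - earlier if later >= earlier else later


def cpu_cores_used(usage_before: float, usage_after: float, interval_seconds: float) -> float:
    """Average cores consumed, from two samples of container_cpu_usage_seconds_total."""
    return counter_increase(usage_before, usage_after) / interval_seconds


def throttled_ratio(periods_before: float, periods_after: float,
                    throttled_before: float, throttled_after: float) -> float:
    """Share of CFS periods in which the container was throttled.

    Uses container_cpu_cfs_periods_total and
    container_cpu_cfs_throttled_periods_total.
    """
    periods = counter_increase(periods_before, periods_after)
    if periods == 0:
        return 0.0
    return counter_increase(throttled_before, throttled_after) / periods


# Hypothetical samples taken 60 seconds apart.
cores = cpu_cores_used(120.0, 150.0, 60.0)
ratio = throttled_ratio(1000, 1600, 100, 250)
```

A ratio close to 1 suggests that the container hits its CPU quota in nearly every enforcement period.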

serviceMonitor/monitoring/kube-state-metrics/0

kube-state-metrics-prom

kube_cronjob_status_active

Number of currently running jobs created by a cronjob

kube_cronjob_info

Cronjob information

kube_cronjob_labels

Label of a cronjob

kube_configmap_info

ConfigMap information

kube_daemonset_created

DaemonSet creation time

kube_daemonset_status_current_number_scheduled

Number of nodes that are running at least one DaemonSet pod and are supposed to run it

kube_daemonset_status_desired_number_scheduled

Number of nodes that should be running a DaemonSet pod

kube_daemonset_status_number_available

Number of nodes that should be running a DaemonSet pod and have at least one DaemonSet pod running and available

kube_daemonset_status_number_misscheduled

Number of nodes that are running a DaemonSet pod but are not supposed to run it

kube_daemonset_status_number_ready

Number of nodes that should be running the DaemonSet pods and have one or more DaemonSet pods running and ready

kube_daemonset_status_number_unavailable

Number of nodes that should be running the DaemonSet pods but have none of the DaemonSet pods running and available

kube_daemonset_status_updated_number_scheduled

Number of nodes that are running an updated DaemonSet pod

kube_deployment_created

Deployment creation timestamp

kube_deployment_labels

Deployment labels

kube_deployment_metadata_generation

Sequence number representing a specific generation of the desired state

kube_deployment_spec_replicas

Number of desired replicas for a Deployment

kube_deployment_spec_strategy_rollingupdate_max_unavailable

Maximum number of unavailable replicas during a rolling update of a Deployment

kube_deployment_status_observed_generation

The generation observed by the Deployment controller

kube_deployment_status_replicas

Number of current replicas of a Deployment

kube_deployment_status_replicas_available

Number of available replicas per Deployment

kube_deployment_status_replicas_ready

Number of ready replicas per Deployment

kube_deployment_status_replicas_unavailable

Number of unavailable replicas per Deployment

kube_deployment_status_replicas_updated

Number of updated replicas per Deployment

kube_job_info

Information about the job

kube_namespace_labels

Namespace labels

kube_node_labels

Node labels

kube_node_info

Information about a node

kube_node_spec_taint

Taint of a node

kube_node_spec_unschedulable

Whether new pods can be scheduled to a node

kube_node_status_allocatable

Allocatable resources on a node

kube_node_status_capacity

Capacity for different resources on a node

kube_node_status_condition

Condition of a node

kube_node_volcano_oversubscription_status

Node oversubscription status

kube_persistentvolume_status_phase

Phase of a PV status

kube_persistentvolumeclaim_status_phase

Phase of a PVC status

kube_persistentvolume_info

Information about a PV

kube_persistentvolumeclaim_info

Information about a PVC

kube_pod_container_info

Information about a container running in the pod

kube_pod_container_resource_limits

Amount of each resource limit set for a container

kube_pod_container_resource_requests

Amount of each resource requested by a container

kube_pod_container_status_last_terminated_reason

Last reason the container was in a terminated state

kube_pod_container_status_ready

Whether the container's readiness check succeeded

kube_pod_container_status_restarts_total

Number of container restarts

kube_pod_container_status_running

Whether the container is running

kube_pod_container_status_terminated

Whether the container is terminated

kube_pod_container_status_terminated_reason

The reason why the container is in a terminated state

kube_pod_container_status_waiting

Whether the container is waiting

kube_pod_container_status_waiting_reason

The reason why the container is in the waiting state

kube_pod_info

Information about a pod

kube_pod_labels

Pod labels

kube_pod_owner

Information about the pod's owner

kube_pod_status_phase

Current phase of a pod

kube_pod_status_ready

Whether the pod is ready

kube_secret_info

Information about a secret

kube_statefulset_created

StatefulSet creation timestamp

kube_statefulset_labels

Information about StatefulSet labels

kube_statefulset_metadata_generation

Sequence number representing a specific generation of the desired state for a StatefulSet

kube_statefulset_replicas

Number of desired pods for a StatefulSet

kube_statefulset_status_observed_generation

The generation observed by the StatefulSet controller

kube_statefulset_status_replicas

Number of replicas per StatefulSet

kube_statefulset_status_replicas_ready

Number of ready replicas per StatefulSet

kube_statefulset_status_replicas_updated

Number of updated replicas per StatefulSet

kube_job_spec_completions

Desired number of successfully finished pods for the job

kube_job_status_failed

Number of pods of a job that reached the Failed phase

kube_job_status_succeeded

Number of pods of a job that reached the Succeeded phase

kube_node_status_allocatable_cpu_cores

Number of allocatable CPU cores of a node

kube_node_status_allocatable_memory_bytes

Total allocatable memory of a node

kube_replicaset_owner

Information about the ReplicaSet's owner

kube_resourcequota

Information about resource quota

kube_pod_spec_volumes_persistentvolumeclaims_info

Information about the PVC associated with the pod

serviceMonitor/monitoring/prometheus-lightweight/0

prometheus-lightweight

vm_persistentqueue_blocks_dropped_total

Number of dropped blocks in a send queue

vm_persistentqueue_blocks_read_total

Number of blocks read by a send queue

vm_persistentqueue_blocks_written_total

Number of blocks written to a send queue

vm_persistentqueue_bytes_pending

Number of pending bytes in a send queue

vm_persistentqueue_bytes_read_total

Number of bytes read by a send queue

vm_persistentqueue_bytes_written_total

Number of bytes written to a send queue

vm_promscrape_active_scrapers

Number of active scrapers

vm_promscrape_conn_read_errors_total

Number of read errors during scrapes

vm_promscrape_conn_write_errors_total

Number of write errors during scrapes

vm_promscrape_max_scrape_size_exceeded_errors_total

Number of failed scrapes due to the exceeded response size

vm_promscrape_scrape_duration_seconds_sum

Duration of scrapes (sum)

vm_promscrape_scrape_duration_seconds_count

Duration of scrapes (count)

vm_promscrape_scrapes_total

Number of scrapes

vmagent_remotewrite_bytes_sent_total

Number of bytes sent via a remote write

vmagent_remotewrite_duration_seconds_sum

Time required for a remote write (sum)

vmagent_remotewrite_duration_seconds_count

Time required for a remote write (count)

vmagent_remotewrite_packets_dropped_total

Number of dropped packets during a remote write

vmagent_remotewrite_pending_data_bytes

Number of pending bytes during a remote write

vmagent_remotewrite_requests_total

Number of requests of the remote write

vmagent_remotewrite_retries_count_total

Number of retries of the remote write

go_goroutines

Number of goroutines

serviceMonitor/monitoring/node-exporter/0

node-exporter

node_boot_time_seconds

Node boot time

node_context_switches_total

Number of context switches

node_cpu_seconds_total

Seconds each CPU spent doing each type of work

node_disk_io_now

Number of I/Os in progress

node_disk_io_time_seconds_total

Total seconds spent doing I/Os

node_disk_io_time_weighted_seconds_total

The weighted number of seconds spent doing I/Os

node_disk_read_bytes_total

Number of bytes that are read

node_disk_read_time_seconds_total

Number of seconds spent by all reads

node_disk_reads_completed_total

Number of reads completed

node_disk_write_time_seconds_total

Number of seconds spent by all writes

node_disk_writes_completed_total

Number of writes completed

node_disk_written_bytes_total

Number of bytes that are written

node_docker_thinpool_data_space_available

Available data space of a docker thin pool

node_docker_thinpool_metadata_space_available

Available metadata space of a docker thin pool

node_exporter_build_info

Node exporter build information

node_filefd_allocated

Allocated file descriptors

node_filefd_maximum

Maximum number of file descriptors

node_filesystem_avail_bytes

File system space that is available for use

node_filesystem_device_error

Whether an error occurred while getting statistics for the given device

node_filesystem_free_bytes

Remaining space of a file system

node_filesystem_readonly

Whether the file system is read-only

node_filesystem_size_bytes

Total size of a file system

node_forks_total

Number of forks

node_intr_total

Total number of interrupts serviced

node_load1

1-minute average CPU load

node_load15

15-minute average CPU load

node_load5

5-minute average CPU load

node_memory_Buffers_bytes

Memory of the node buffer

node_memory_Cached_bytes

Memory for the node page cache

node_memory_MemAvailable_bytes

Available memory of a node

node_memory_MemFree_bytes

Free memory of a node

node_memory_MemTotal_bytes

Total memory of a node

node_network_receive_bytes_total

Total amount of received data

node_network_receive_drop_total

Cumulative number of packets dropped during reception

node_network_receive_errs_total

Cumulative number of errors encountered during reception

node_network_receive_packets_total

Cumulative number of packets received

node_network_transmit_bytes_total

Total amount of transmitted data

node_network_transmit_drop_total

Cumulative number of dropped packets during transmission

node_network_transmit_errs_total

Cumulative number of errors encountered during transmission

node_network_transmit_packets_total

Cumulative number of packets transmitted

node_procs_blocked

Number of processes blocked waiting for I/O to complete

node_procs_running

Number of processes in the runnable state

node_sockstat_sockets_used

Number of sockets in use

node_sockstat_TCP_alloc

Number of allocated TCP sockets

node_sockstat_TCP_inuse

Number of TCP sockets in use

node_sockstat_TCP_orphan

Number of orphaned TCP sockets

node_sockstat_TCP_tw

Number of TCP sockets in the TIME_WAIT state

node_sockstat_UDPLITE_inuse

Number of UDP-Lite sockets in use

node_sockstat_UDP_inuse

Number of UDP sockets in use

node_sockstat_UDP_mem

UDP socket buffer usage

node_timex_offset_seconds

Time offset

node_timex_sync_status

Synchronization status of node clocks

node_uname_info

Labeled system information as provided by the uname system call

node_vmstat_oom_kill

Number of OOM kills, as reported in /proc/vmstat

process_cpu_seconds_total

Total process CPU time

process_max_fds

Maximum number of file descriptors of a process

process_open_fds

Number of file descriptors opened by a process

process_resident_memory_bytes

Size of the resident memory set for a process

process_start_time_seconds

Process start time

process_virtual_memory_bytes

Virtual memory size for a process

process_virtual_memory_max_bytes

Maximum virtual memory size for a process

node_netstat_Tcp_ActiveOpens

Number of TCP connections that directly change from the CLOSED state to the SYN-SENT state

node_netstat_Tcp_PassiveOpens

Number of TCP connections that directly change from the LISTEN state to the SYN-RCVD state

node_netstat_Tcp_CurrEstab

Number of TCP connections in the ESTABLISHED or CLOSE-WAIT state

node_vmstat_pgmajfault

Cumulative number of major page faults, as reported in /proc/vmstat

node_vmstat_pgpgout

Page-out count between main memory and block devices, as reported in /proc/vmstat

node_vmstat_pgfault

Cumulative number of page faults, as reported in /proc/vmstat

node_vmstat_pgpgin

Page-in count between main memory and block devices, as reported in /proc/vmstat

node_processes_max_processes

PID limit value

node_processes_pids

Number of PIDs

node_nf_conntrack_entries

Number of currently allocated flow entries for connection tracking

node_nf_conntrack_entries_limit

Maximum size of a connection tracking table

promhttp_metric_handler_requests_in_flight

Number of metric scrape requests currently being served

go_goroutines

Number of node exporter goroutines
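
Several node exporter gauges above are most useful in pairs. As an illustration only, the following Python sketch combines them into usage percentages. It assumes that node_filesystem_size_bytes reports the total size of the file system (the usual node exporter meaning), and the function names and sample values are hypothetical.

```python
def filesystem_used_percent(size_bytes: float, avail_bytes: float) -> float:
    """Space unavailable to unprivileged users, as a percentage of the total size.

    Inputs: node_filesystem_size_bytes and node_filesystem_avail_bytes.
    """
    if size_bytes <= 0:
        return 0.0
    return (size_bytes - avail_bytes) / size_bytes * 100.0


def memory_used_percent(total_bytes: float, available_bytes: float) -> float:
    """Memory in use, as a percentage of the total memory.

    Inputs: node_memory_MemTotal_bytes and node_memory_MemAvailable_bytes.
    """
    if total_bytes <= 0:
        return 0.0
    return (1.0 - available_bytes / total_bytes) * 100.0


# Hypothetical samples: a 100 GB file system with 25 GB available,
# and a node with 16 GiB of memory of which 4 GiB is available.
fs_percent = filesystem_used_percent(100e9, 25e9)
mem_percent = memory_used_percent(16 * 2**30, 4 * 2**30)
```

The same pattern works for node_nf_conntrack_entries against node_nf_conntrack_entries_limit and for node_filefd_allocated against node_filefd_maximum.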

podMonitor/monitoring/nvidia-gpu-device-plugin/0

monitoring/nvidia-gpu-device-plugin

cce_gpu_utilization

GPU compute usage

cce_gpu_memory_utilization

GPU memory usage

cce_gpu_encoder_utilization

GPU encoding usage

cce_gpu_decoder_utilization

GPU decoding usage

cce_gpu_utilization_process

GPU compute usage of each process

cce_gpu_memory_utilization_process

GPU memory usage of each process

cce_gpu_encoder_utilization_process

GPU encoding usage of each process

cce_gpu_decoder_utilization_process

GPU decoding usage of each process

cce_gpu_memory_used

Used GPU memory

cce_gpu_memory_total

Total GPU memory

cce_gpu_memory_free

Free GPU memory

cce_gpu_bar1_memory_used

Used GPU BAR1 memory

cce_gpu_bar1_memory_total

Total GPU BAR1 memory

cce_gpu_clock

GPU clock frequency

cce_gpu_memory_clock

GPU memory frequency

cce_gpu_graphics_clock

GPU graphics clock frequency

cce_gpu_video_clock

GPU video processor frequency

cce_gpu_temperature

GPU temperature

cce_gpu_power_usage

GPU power

cce_gpu_total_energy_consumption

Total GPU energy consumption

cce_gpu_pcie_link_bandwidth

GPU PCIe bandwidth

cce_gpu_nvlink_bandwidth

GPU NVLink bandwidth

cce_gpu_pcie_throughput_rx

GPU PCIe RX bandwidth

cce_gpu_pcie_throughput_tx

GPU PCIe TX bandwidth

cce_gpu_nvlink_utilization_counter_rx

GPU NVLink RX bandwidth

cce_gpu_nvlink_utilization_counter_tx

GPU NVLink TX bandwidth

cce_gpu_retired_pages_sbe

Number of GPU memory pages retired (isolated) due to single-bit errors

cce_gpu_retired_pages_dbe

Number of GPU memory pages retired (isolated) due to double-bit errors

xgpu_memory_total

Total xGPU memory

xgpu_memory_used

Used xGPU memory

xgpu_core_percentage_total

Total xGPU compute

xgpu_core_percentage_used

Used xGPU compute

gpu_schedule_policy

GPU mode, indicated by one of three values. 0: GPU memory isolation with compute sharing. 1: GPU memory and compute isolation. 2: default mode, in which the GPU is not virtualized.

xgpu_device_health

Health status of xGPU. The value 0 indicates that the xGPU is healthy, and the value 1 indicates that the xGPU is unhealthy.
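
Because gpu_schedule_policy and xgpu_device_health encode states as numbers, a small lookup can make dashboards and alerts easier to read. The following Python sketch maps the documented values to names. The enum and function names are hypothetical and are not part of any CCE or AOM API.

```python
from enum import IntEnum


class GpuSchedulePolicy(IntEnum):
    """Values of gpu_schedule_policy as described in the table."""
    MEMORY_ISOLATION_COMPUTE_SHARING = 0
    MEMORY_AND_COMPUTE_ISOLATION = 1
    NOT_VIRTUALIZED = 2  # default mode


def decode_schedule_policy(sample_value: float) -> GpuSchedulePolicy:
    """Map a metric sample to its mode; raises ValueError for an unknown value."""
    return GpuSchedulePolicy(int(sample_value))


def xgpu_is_healthy(sample_value: float) -> bool:
    """xgpu_device_health is 0 when the xGPU is healthy and 1 when it is not."""
    return int(sample_value) == 0


# Hypothetical samples as they might arrive from a query result.
mode = decode_schedule_policy(1.0)
healthy = xgpu_is_healthy(0.0)
```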

serviceMonitor/monitoring/prometheus-server/0

prometheus-server

prometheus_build_info

Prometheus build information

prometheus_engine_query_duration_seconds

Query time

prometheus_engine_query_duration_seconds_count

Number of queries

prometheus_sd_discovered_targets

Number of targets discovered by each job

prometheus_remote_storage_bytes_total

Number of bytes sent

prometheus_remote_storage_enqueue_retries_total

Number of retries for entering a queue

prometheus_remote_storage_highest_timestamp_in_seconds

Highest timestamp that has come into the remote storage via the Appender interface, in seconds since epoch

prometheus_remote_storage_queue_highest_sent_timestamp_seconds

Highest timestamp successfully sent by a remote write

prometheus_remote_storage_samples_dropped_total

Total number of samples read from the WAL but not sent to remote storage

prometheus_remote_storage_samples_failed_total

Number of samples that failed to be sent to remote storage

prometheus_remote_storage_samples_in_total

Number of samples read into remote storage

prometheus_remote_storage_samples_pending

Number of samples pending in shards to be sent to remote storage

prometheus_remote_storage_samples_retried_total

Number of samples which failed to be sent to remote storage but were retried

prometheus_remote_storage_samples_total

Total number of samples sent to remote storage

prometheus_remote_storage_shard_capacity

Capacity of each shard of the queue used for parallel sending to the remote storage

prometheus_remote_storage_shards

Number of shards used for parallel sending to the remote storage

prometheus_remote_storage_shards_desired

Number of shards that the queue's shard calculation wants to run based on the rate of samples in vs. samples out

prometheus_remote_storage_shards_max

Maximum number of shards that the queue is allowed to run

prometheus_remote_storage_shards_min

Minimum number of shards that the queue is allowed to run

prometheus_tsdb_wal_segment_current

WAL segment index that TSDB is currently writing to

prometheus_tsdb_head_chunks

Number of chunks in the head block

prometheus_tsdb_head_series

Number of series in the head block

prometheus_tsdb_head_samples_appended_total

Number of appended samples

prometheus_wal_watcher_current_segment

Current segment the WAL watcher is reading records from

prometheus_target_interval_length_seconds

Actual intervals between scrapes

prometheus_target_interval_length_seconds_count

Actual intervals between scrapes (count)

prometheus_target_interval_length_seconds_sum

Actual intervals between scrapes (sum)

prometheus_target_scrapes_exceeded_body_size_limit_total

Number of scrapes that hit the body size limit

prometheus_target_scrapes_exceeded_sample_limit_total

Number of scrapes that hit the sample limit

prometheus_target_scrapes_sample_duplicate_timestamp_total

Number of scraped samples with duplicate timestamps

prometheus_target_scrapes_sample_out_of_bounds_total

Number of samples rejected due to timestamp falling outside of the time bounds

prometheus_target_scrapes_sample_out_of_order_total

Number of out-of-order samples

prometheus_target_sync_length_seconds

Interval for synchronizing the scrape pool

prometheus_target_sync_length_seconds_count

Interval for synchronizing the scrape pool (count)

prometheus_target_sync_length_seconds_sum

Interval for synchronizing the scrape pool (sum)

promhttp_metric_handler_requests_in_flight

Number of metric scrape requests currently being served

promhttp_metric_handler_requests_total

Total number of metric scrape requests served

go_goroutines

Number of goroutines

podMonitor/monitoring/virtual-kubelet-pods/0

monitoring/virtual-kubelet-pods

container_cpu_load_average_10s

Value of container CPU load average over the last 10 seconds

container_cpu_system_seconds_total

Cumulative container CPU system time

container_cpu_usage_seconds_total

Cumulative CPU time consumed by a container in core-seconds

container_cpu_user_seconds_total

Cumulative container CPU user time

container_cpu_cfs_periods_total

Number of elapsed enforcement period intervals

container_cpu_cfs_throttled_periods_total

Number of throttled period intervals

container_cpu_cfs_throttled_seconds_total

Total time duration the container has been throttled

container_fs_inodes_free

Number of available inodes in a file system

container_fs_usage_bytes

File system usage

container_fs_inodes_total

Number of inodes in a file system

container_fs_io_current

Number of I/Os currently in progress in a disk or file system

container_fs_io_time_seconds_total

Cumulative seconds spent on doing I/Os by the disk or file system

container_fs_io_time_weighted_seconds_total

Cumulative weighted I/O time of a disk or file system

container_fs_limit_bytes

Total disk or file system capacity that can be consumed by a container

container_fs_reads_bytes_total

Cumulative amount of disk or file system data read by a container

container_fs_read_seconds_total

Cumulative number of seconds the container spent on reading disk or file system data

container_fs_reads_merged_total

Cumulative number of merged disk or file system reads made by the container

container_fs_reads_total

Cumulative number of disk or file system reads completed by a container

container_fs_sector_reads_total

Cumulative number of disk or file system sector reads completed by a container

container_fs_sector_writes_total

Cumulative number of disk or file system sector writes completed by a container

container_fs_writes_bytes_total

Total amount of data written by a container to a disk or file system

container_fs_write_seconds_total

Cumulative number of seconds the container spent on writing data to the disk or file system

container_fs_writes_merged_total

Cumulative number of merged container writes to the disk or file system

container_fs_writes_total

Cumulative number of disk or file system writes completed by a container

container_blkio_device_usage_total

Blkio device bytes usage

container_memory_failures_total

Cumulative number of container memory allocation failures

container_memory_failcnt

Number of times memory usage hit the limit

container_memory_cache

Memory used for the page cache of a container

container_memory_mapped_file

Size of memory-mapped files in the container

container_memory_max_usage_bytes

Maximum memory usage recorded for a container

container_memory_rss

Size of the resident memory set for a container

container_memory_swap

Container swap usage

container_memory_usage_bytes

Current memory usage of a container

container_memory_working_set_bytes

Memory usage of the working set of a container

container_network_receive_bytes_total

Total volume of data received by the container network

container_network_receive_errors_total

Cumulative number of errors encountered during reception

container_network_receive_packets_dropped_total

Cumulative number of packets dropped during reception

container_network_receive_packets_total

Cumulative number of packets received

container_network_transmit_bytes_total

Total volume of data transmitted on the container network

container_network_transmit_errors_total

Cumulative number of errors encountered during transmission

container_network_transmit_packets_dropped_total

Cumulative number of packets dropped during transmission

container_network_transmit_packets_total

Cumulative number of packets transmitted

container_processes

Number of processes running inside the container

container_sockets

Number of open sockets for the container

container_file_descriptors

Number of open file descriptors for a container

container_threads

Number of threads running inside the container

container_threads_max

Maximum number of threads allowed inside the container

container_ulimits_soft

Soft ulimit values of process 1 (the root process) in the container. A value of -1 means unlimited, except for priority and nice.

container_tasks_state

Number of tasks in the specified state, such as sleeping, running, stopped, uninterruptible, or iowaiting

container_spec_cpu_period

CPU period of the container

container_spec_cpu_shares

CPU share of the container

container_spec_cpu_quota

CPU quota of the container

container_spec_memory_limit_bytes

Memory limit for the container

container_spec_memory_reservation_limit_bytes

Memory reservation limit for the container

container_spec_memory_swap_limit_bytes

Memory swap limit for the container

container_start_time_seconds

Start time of the container, in seconds since the Unix epoch

container_last_seen

Last time a container was seen by the exporter

container_accelerator_memory_used_bytes

GPU accelerator memory that is being used by the container

container_accelerator_memory_total_bytes

Total available memory of a GPU accelerator

container_accelerator_duty_cycle

Percentage of time when a GPU accelerator is actually running
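The container_spec_* and container_memory_* metrics above are often combined into derived values. The following is a minimal Python sketch, not part of AOM or CCE, that derives a CPU limit in cores from container_spec_cpu_quota and container_spec_cpu_period, and a memory utilization ratio from container_memory_working_set_bytes and container_spec_memory_limit_bytes. The function names are hypothetical. The sketch assumes that quota and period use the same time unit and that a non-positive quota or memory limit means that no limit is set.

```python
from typing import Optional


def cpu_limit_cores(quota: float, period: float) -> Optional[float]:
    """Return the CPU limit in cores, or None if no limit can be derived.

    Assumption: quota and period use the same time unit, and a
    non-positive quota or period means that no CPU limit is set.
    """
    if quota <= 0 or period <= 0:
        return None
    return quota / period


def memory_utilization(working_set_bytes: float,
                       limit_bytes: float) -> Optional[float]:
    """Return working set / memory limit as a ratio, or None if unlimited.

    Assumption: a non-positive limit means that no memory limit is set.
    """
    if limit_bytes <= 0:
        return None
    return working_set_bytes / limit_bytes


if __name__ == "__main__":
    # A quota of 50000 with a period of 100000 corresponds to half a core.
    print(cpu_limit_cores(50000, 100000))
    # 256 MiB used out of a 1 GiB limit.
    print(memory_utilization(256 * 1024**2, 1024**3))
```

Returning None instead of 0 keeps containers without limits distinguishable from containers with low usage.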

podMonitor/monitoring/everest-csi-controller/0

monitoring/everest-csi-controller

everest_action_result_total

Number of action results

everest_function_duration_seconds_bucket

Histogram of action duration (bucket)

everest_function_duration_seconds_count

Histogram of action duration (count)

everest_function_duration_seconds_sum

Histogram of action duration (sum)

everest_function_duration_quantile_seconds

Quantile of the time taken by the action

node_volume_read_completed_total

Number of completed reads

node_volume_read_merged_total

Number of merged reads

node_volume_read_bytes_total

Total number of bytes read

node_volume_read_time_milliseconds_total

Total read duration

node_volume_write_completed_total

Number of completed writes

node_volume_write_merged_total

Number of merged writes

node_volume_write_bytes_total

Total number of bytes written

node_volume_write_time_milliseconds_total

Total write duration

node_volume_io_now

Number of ongoing I/Os

node_volume_io_time_seconds_total

Total I/O operation duration

node_volume_capacity_bytes_available

Available capacity

node_volume_capacity_bytes_total

Total capacity

node_volume_capacity_bytes_used

Used capacity

node_volume_inodes_available

Available inodes

node_volume_inodes_total

Total number of inodes

node_volume_inodes_used

Used inodes

node_volume_read_transmissions_total

Number of read transmissions

node_volume_read_timeouts_total

Number of read timeouts

node_volume_read_sent_bytes_total

Number of bytes read

node_volume_read_queue_time_milliseconds_total

Read queue waiting time

node_volume_read_rtt_time_milliseconds_total

Read RTT

node_volume_write_transmissions_total

Number of write transmissions

node_volume_write_timeouts_total

Number of write timeouts

node_volume_write_queue_time_milliseconds_total

Write queue waiting time

node_volume_write_rtt_time_milliseconds_total

Write RTT

node_volume_localvolume_stats_capacity_bytes

Local storage capacity

node_volume_localvolume_stats_available_bytes

Available local storage

node_volume_localvolume_stats_used_bytes

Used local storage

node_volume_localvolume_stats_inodes

Number of inodes for a local volume

node_volume_localvolume_stats_inodes_used

Used inodes for a local volume
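Histogram metrics such as everest_function_duration_seconds are exposed as three series (bucket, count, and sum). The following is a minimal Python sketch, assuming that bucket samples have already been collected as (upper bound, cumulative count) pairs that include the +Inf bucket. It shows how a mean and an estimated quantile can be derived by linear interpolation within a bucket, similar in spirit to the PromQL histogram_quantile function. The function names are hypothetical, and the sketch is an approximation rather than the method used by AOM.

```python
import math
from typing import List, Tuple


def mean_duration(duration_sum: float, duration_count: float) -> float:
    """Mean duration derived from the _sum and _count series."""
    if duration_count == 0:
        return math.nan
    return duration_sum / duration_count


def estimate_quantile(q: float, buckets: List[Tuple[float, float]]) -> float:
    """Estimate the q-quantile from cumulative histogram buckets.

    buckets holds (upper_bound, cumulative_count) pairs and must include
    the +Inf bucket. The lowest bucket is assumed to start at 0.
    """
    if not 0 <= q <= 1:
        raise ValueError("q must be between 0 and 1")
    ordered = sorted(buckets)
    if len(ordered) < 2 or not math.isinf(ordered[-1][0]):
        raise ValueError("need at least two buckets, including +Inf")
    total = ordered[-1][1]
    if total == 0:
        return math.nan
    rank = q * total
    prev_bound, prev_count = 0.0, 0.0
    for upper, count in ordered:
        if count >= rank:
            if math.isinf(upper):
                # The quantile falls in the +Inf bucket: report the
                # highest finite bound instead of infinity.
                return prev_bound
            if count == prev_count:
                return upper
            # Interpolate linearly inside the matching bucket.
            fraction = (rank - prev_count) / (count - prev_count)
            return prev_bound + (upper - prev_bound) * fraction
        prev_bound, prev_count = upper, count
    return prev_bound


if __name__ == "__main__":
    sample = [(0.1, 10), (0.5, 30), (1.0, 40), (math.inf, 40)]
    print(estimate_quantile(0.5, sample))
    print(mean_duration(12.0, 40))
```

The accuracy of the estimate depends on the bucket boundaries, so quantiles near a wide bucket are only approximate.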

podMonitor/monitoring/nginx-ingress-controller/0

monitoring/nginx-ingress-controller

nginx_ingress_controller_bytes_sent

Number of bytes sent to the client

nginx_ingress_controller_connect_duration_seconds

Duration for connecting to the upstream server

nginx_ingress_controller_header_duration_seconds

Time required for receiving the first header from the upstream server

nginx_ingress_controller_ingress_upstream_latency_seconds

Upstream service latency

nginx_ingress_controller_request_duration_seconds

Time required for processing a request, in seconds

nginx_ingress_controller_request_size

Length of a request, including the request line, header, and body

nginx_ingress_controller_requests

Total number of HTTP requests processed by Nginx Ingress Controller since it started

nginx_ingress_controller_response_duration_seconds

Time required for receiving the response from the upstream server

nginx_ingress_controller_response_size

Length of a response, including the status line, header, and body

nginx_ingress_controller_nginx_process_connections

Number of client connections in the active, read, write, or wait state

nginx_ingress_controller_nginx_process_connections_total

Total number of client connections in the accepted or handled state

nginx_ingress_controller_nginx_process_cpu_seconds_total

Total CPU time consumed by the Nginx process (unit: second)

nginx_ingress_controller_nginx_process_num_procs

Number of processes

nginx_ingress_controller_nginx_process_oldest_start_time_seconds

Start time of the oldest Nginx process, in seconds since January 1, 1970 (Unix epoch)

nginx_ingress_controller_nginx_process_read_bytes_total

Number of bytes read

nginx_ingress_controller_nginx_process_requests_total

Total number of requests processed by Nginx since startup

nginx_ingress_controller_nginx_process_resident_memory_bytes

Resident memory usage of a process, that is, the actual physical memory usage

nginx_ingress_controller_nginx_process_virtual_memory_bytes

Virtual memory usage of a process, that is, the total memory allocated to the process, including the actual physical memory and virtual swap space

nginx_ingress_controller_nginx_process_write_bytes_total

Amount of data written by the Nginx process to disks or other devices for long-term storage

nginx_ingress_controller_build_info

Build information of Nginx Ingress Controller, including the version and compilation time

nginx_ingress_controller_check_success

Health check result of Nginx Ingress Controller. 1: Normal. 0: Abnormal

nginx_ingress_controller_config_hash

Hash value of the running configuration

nginx_ingress_controller_config_last_reload_successful

Whether the Nginx Ingress Controller configuration is successfully reloaded

nginx_ingress_controller_config_last_reload_successful_timestamp_seconds

Last timestamp when the Nginx Ingress Controller configuration was successfully reloaded

nginx_ingress_controller_ssl_certificate_info

Nginx Ingress Controller certificate information

nginx_ingress_controller_success

Cumulative number of reload operations of Nginx Ingress Controller

nginx_ingress_controller_orphan_ingress

Whether the ingress is orphaned. 1: Orphaned. 0: Not orphaned. The namespace label indicates the namespace where the ingress is located, the ingress label indicates the ingress name, and the type label indicates the orphan type (options: no-service and no-endpoint).

nginx_ingress_controller_admission_config_size

Size of the admission controller configuration

nginx_ingress_controller_admission_render_duration

Rendering duration of the admission controller

nginx_ingress_controller_admission_render_ingresses

Number of ingresses rendered by the admission controller

nginx_ingress_controller_admission_roundtrip_duration

Time spent by the admission controller to process new events

nginx_ingress_controller_admission_tested_duration

Time spent on admission controller tests

nginx_ingress_controller_admission_tested_ingresses

Number of ingresses processed by the admission controller
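Cumulative metrics such as nginx_ingress_controller_requests only become meaningful when two samples are compared. The following is a minimal Python sketch, with hypothetical function names, that derives a per-second rate from two counter samples and a 5xx error ratio from per-status increases. It assumes that a counter value lower than the previous sample indicates a reset to zero (for example, after a controller restart) and that the samples are grouped by a status label that holds the HTTP status code.

```python
from typing import Dict


def counter_rate(previous: float, current: float,
                 interval_seconds: float) -> float:
    """Per-second rate between two samples of a cumulative counter."""
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    # Assumption: a counter that went down was reset to zero, so the
    # current value is the whole increase since the reset.
    increase = current if current < previous else current - previous
    return increase / interval_seconds


def error_ratio(increase_by_status: Dict[str, float]) -> float:
    """Share of requests with a 5xx status among all requests.

    increase_by_status maps an HTTP status code (as a string) to the
    increase of the request counter over the same time window.
    """
    total = sum(increase_by_status.values())
    if total == 0:
        return 0.0
    errors = sum(value for status, value in increase_by_status.items()
                 if status.startswith("5"))
    return errors / total


if __name__ == "__main__":
    # 60 more requests over 30 seconds.
    print(counter_rate(100, 160, 30))
    # 5 server errors out of 100 requests.
    print(error_ratio({"200": 90, "404": 5, "503": 5}))
```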