Updated on 2025-08-07 GMT+08:00

Basic Metrics: Container Metrics

This section describes the categories, names, and meanings of metrics reported to AOM from CCE's kube-prometheus-stack add-on or on-premises Kubernetes clusters.
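Many metrics in the table come in groups that share a base name and differ only in suffix, such as _bucket, _count, _sum, and _total. The following is a minimal sketch, assuming the usual Prometheus naming conventions apply to these metrics, of how a metric name can be split into its base name and suffix. The helper name split_metric_name and the suffix descriptions are illustrative and are not part of AOM.

```python
from typing import Tuple

# Conventional Prometheus suffixes. The descriptions are general conventions
# (an assumption here), not definitions taken from AOM.
SUFFIX_MEANINGS = {
    "_bucket": "cumulative count of observations at or below each upper bound (le)",
    "_count": "number of observations recorded",
    "_sum": "sum of all observed values",
    "_total": "counter that only increases until the process restarts",
}


def split_metric_name(name: str) -> Tuple[str, str]:
    """Return (base_name, suffix); suffix is "" when no known suffix matches."""
    for suffix in SUFFIX_MEANINGS:
        if name.endswith(suffix):
            return name[: -len(suffix)], suffix
    return name, ""


if __name__ == "__main__":
    for metric in (
        "coredns_dns_request_duration_seconds_bucket",
        "coredns_dns_request_duration_seconds_count",
        "coredns_dns_request_duration_seconds_sum",
        "coredns_cache_hits_total",
        "workqueue_depth",
    ):
        base, suffix = split_metric_name(metric)
        meaning = SUFFIX_MEANINGS.get(suffix, "no conventional suffix")
        print(f"{base} | {suffix or '(none)'} | {meaning}")
```

Reading the three histogram series of one base name together is usually more informative than reading any one of them alone.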

Table 1 Metrics of containers running in CCE or on-premises Kubernetes clusters

Target Name

Job Name

Metric

Description

  • serviceMonitor/monitoring/coredns/0
  • serviceMonitor/monitoring/node-local-dns/0

coredns and node-local-dns

coredns_build_info

CoreDNS build information

coredns_cache_entries

Number of entries in the CoreDNS cache

coredns_cache_size

CoreDNS cache size

coredns_cache_hits_total

Number of CoreDNS cache hits

coredns_cache_misses_total

Number of CoreDNS cache misses

coredns_cache_requests_total

Total number of CoreDNS resolution requests in different dimensions

coredns_dns_request_duration_seconds_bucket

CoreDNS request latency

coredns_dns_request_duration_seconds_count

Number of CoreDNS requests whose processing time was recorded

coredns_dns_request_duration_seconds_sum

Total CoreDNS request processing time (seconds)

coredns_dns_request_size_bytes_bucket

Size of the CoreDNS request in bytes

coredns_dns_request_size_bytes_count

Number of CoreDNS requests whose size was recorded

coredns_dns_request_size_bytes_sum

Total CoreDNS request bytes

coredns_dns_requests_total

Total number of CoreDNS requests

coredns_dns_response_size_bytes_bucket

Size of the returned CoreDNS response in bytes

coredns_dns_response_size_bytes_count

Number of CoreDNS responses whose size was recorded

coredns_dns_response_size_bytes_sum

Total CoreDNS response bytes

coredns_dns_responses_total

Total number of CoreDNS responses, by response code

coredns_forward_conn_cache_hits_total

Total number of cache hits for each protocol and data flow

coredns_forward_conn_cache_misses_total

Total number of cache misses for each protocol and data flow

coredns_forward_healthcheck_broken_total

Total number of times the health checks of all upstream servers failed

coredns_forward_healthcheck_failures_total

Total number of failed health checks for each upstream server

coredns_forward_max_concurrent_rejects_total

Total number of requests rejected due to excessive concurrent requests

coredns_forward_request_duration_seconds_bucket

CoreDNS forwarding request latency

coredns_forward_request_duration_seconds_count

Number of CoreDNS forwarding requests whose duration was recorded

coredns_forward_request_duration_seconds_sum

Total CoreDNS forwarding request duration in seconds

coredns_forward_requests_total

Total number of requests for each data flow

coredns_forward_responses_total

Total number of responses to each data flow

coredns_health_request_duration_seconds_bucket

CoreDNS health check request latency

coredns_health_request_duration_seconds_count

Number of CoreDNS health check requests whose duration was recorded

coredns_health_request_duration_seconds_sum

Total CoreDNS health check request duration in seconds

coredns_health_request_failures_total

Total number of failed CoreDNS health check requests

coredns_hosts_reload_timestamp_seconds

Timestamp of CoreDNS's last reload of the host file

coredns_kubernetes_dns_programming_duration_seconds_bucket

DNS programming latency

coredns_kubernetes_dns_programming_duration_seconds_count

Number of DNS programming operations whose duration was recorded

coredns_kubernetes_dns_programming_duration_seconds_sum

Total DNS programming duration in seconds

coredns_local_localhost_requests_total

Total number of localhost requests processed by CoreDNS

coredns_nodecache_setup_errors_total

Total number of node cache plug-in setup errors

coredns_dns_response_rcode_count_total

Cumulative count of response codes

coredns_dns_request_count_total

Cumulative count of DNS requests made per zone, protocol, and family

coredns_dns_request_do_count_total

Cumulative count of requests with the DO bit set

coredns_dns_do_requests_total

Number of requests with the DO bit set

coredns_dns_request_type_count_total

Cumulative count of DNS requests per type

coredns_panics_total

Total number of CoreDNS panics

coredns_plugin_enabled

Whether a plugin is enabled in CoreDNS

coredns_reload_failed_total

Total number of failed configuration reloads

serviceMonitor/monitoring/kube-apiserver/0

apiserver

aggregator_unavailable_apiservice

Number of unavailable APIServices

apiserver_admission_controller_admission_duration_seconds_bucket

Processing delay of an admission controller

apiserver_admission_webhook_admission_duration_seconds_bucket

Processing delay of an admission webhook

apiserver_admission_webhook_admission_duration_seconds_count

Number of admission webhook processing requests

apiserver_client_certificate_expiration_seconds_bucket

Remaining validity period of the client certificate

apiserver_client_certificate_expiration_seconds_count

Number of recorded observations of the remaining validity period of client certificates

apiserver_current_inflight_requests

Number of in-flight requests being processed, by request kind (read-only or mutating)

apiserver_request_duration_seconds_bucket

Delay of the client's access to the APIServer

apiserver_request_total

Counter of API server requests broken out for code and other items

go_goroutines

Number of goroutines that exist

kubernetes_build_info

Kubernetes build information

process_cpu_seconds_total

Total process CPU time

process_resident_memory_bytes

Size of the resident memory set

rest_client_requests_total

Total number of HTTP requests, partitioned by status code and method

workqueue_adds_total

Total number of additions handled by a work queue

workqueue_depth

Current depth of a work queue

workqueue_queue_duration_seconds_bucket

Duration that a task stays in the current queue

aggregator_unavailable_apiservice_total

Cumulative count of APIServices marked as unavailable

rest_client_request_duration_seconds_bucket

Latency of HTTP requests made by the REST client

serviceMonitor/monitoring/kubelet/0

kubelet

kubelet_certificate_manager_client_expiration_renew_errors

Number of certificate renewal errors

kubelet_certificate_manager_client_ttl_seconds

Time-to-live (TTL) of the Kubelet client certificate

kubelet_cgroup_manager_duration_seconds_bucket

Duration for destruction and update operations

kubelet_cgroup_manager_duration_seconds_count

Number of destruction and update operations

kubelet_node_config_error

If a configuration-related error occurs on a node, the value of this metric is true (1). If there is no configuration-related error, the value is false (0).

kubelet_node_name

Node name. The value is always 1.

kubelet_pleg_relist_duration_seconds_bucket

Duration for relisting pods in PLEG

kubelet_pleg_relist_duration_seconds_count

Number of pod relisting operations in PLEG

kubelet_pleg_relist_interval_seconds_bucket

Interval between relisting operations in PLEG

kubelet_pod_start_duration_seconds_count

Number of pods that have been started

kubelet_pod_start_duration_seconds_bucket

Duration from the kubelet seeing a pod for the first time to the pod starting to run

kubelet_pod_worker_duration_seconds_bucket

Duration for synchronizing a single pod

kubelet_running_containers

Number of running containers

kubelet_running_pods

Number of running pods

kubelet_runtime_operations_duration_seconds_bucket

Duration of each runtime operation

kubelet_runtime_operations_errors_total

Number of errors in operations at runtime level

kubelet_runtime_operations_total

Number of runtime operations of each type

kubelet_volume_stats_available_bytes

Number of available bytes in a volume

kubelet_volume_stats_capacity_bytes

Capacity in bytes of a volume

kubelet_volume_stats_inodes

Maximum number of inodes in a volume

kubelet_volume_stats_inodes_used

Number of used inodes in a volume

kubelet_volume_stats_used_bytes

Number of used bytes in a volume

storage_operation_duration_seconds_bucket

Duration for each storage operation

storage_operation_duration_seconds_count

Number of storage operations

storage_operation_errors_total

Number of storage operation errors

volume_manager_total_volumes

Number of volumes in Volume Manager

rest_client_requests_total

Total number of HTTP requests, partitioned by status code and method

rest_client_request_duration_seconds_bucket

Latency of HTTP requests made by the REST client

process_resident_memory_bytes

Size of the resident memory set

process_cpu_seconds_total

Total process CPU time

go_goroutines

Number of goroutines that exist

serviceMonitor/monitoring/kubelet/1

kubelet

container_cpu_cfs_periods_total

Total number of elapsed enforcement periods

container_cpu_cfs_throttled_periods_total

Number of throttled periods

container_cpu_cfs_throttled_seconds_total

Total duration a container has been throttled

container_cpu_load_average_10s

Value of container CPU load average over the last 10 seconds

container_cpu_usage_seconds_total

Total CPU time consumed

container_file_descriptors

Number of open file descriptors for a container

container_fs_inodes_free

Number of available inodes in a file system

container_fs_inodes_total

Total number of inodes in a file system

container_fs_io_time_seconds_total

Cumulative time spent on doing I/Os by the disk or file system

container_fs_limit_bytes

Total disk or file system capacity that can be consumed by a container

container_fs_read_seconds_total

Total time a container spent on reading disk or file system data

container_fs_reads_bytes_total

Cumulative amount of disk or file system data read by a container

container_fs_reads_total

Cumulative number of disk or file system reads completed by a container

container_fs_usage_bytes

File system usage

container_fs_write_seconds_total

Total time a container spent on writing data to the disk or file system

container_fs_writes_bytes_total

Total amount of data written by a container to a disk or file system

container_fs_writes_total

Cumulative number of disk or file system writes completed by a container

container_memory_cache

Memory used for the page cache of a container

container_memory_failcnt

Number of times memory usage hit the limit

container_memory_max_usage_bytes

Maximum memory usage recorded for a container

container_memory_rss

Size of the resident memory set for a container

container_memory_swap

Container swap memory usage

container_memory_usage_bytes

Current memory usage of a container

container_memory_working_set_bytes

Memory usage of the working set of a container

container_network_receive_bytes_total

Total volume of data received by a container network

container_network_receive_errors_total

Cumulative number of errors encountered during reception

container_network_receive_packets_dropped_total

Cumulative number of packets dropped during reception

container_network_receive_packets_total

Cumulative number of packets received

container_network_transmit_bytes_total

Total volume of data transmitted on a container network

container_network_transmit_errors_total

Cumulative number of errors encountered during transmission

container_network_transmit_packets_dropped_total

Cumulative number of packets dropped during transmission

container_network_transmit_packets_total

Cumulative number of packets transmitted

container_spec_cpu_quota

CPU quota of a container

container_spec_memory_limit_bytes

Memory limit for a container

machine_cpu_cores

Number of CPU cores of the physical machine or VM

machine_memory_bytes

Total memory size of the physical machine or VM

serviceMonitor/monitoring/kube-state-metrics/0

kube-state-metrics-prom

kube_cronjob_status_active

Whether the cronjob is actively running jobs

kube_cronjob_info

Cronjob information

kube_cronjob_labels

Label of a cronjob

kube_configmap_info

ConfigMap information

kube_daemonset_created

DaemonSet creation time

kube_daemonset_status_current_number_scheduled

Number of nodes that are running at least one DaemonSet pod and are expected to run it

kube_daemonset_status_desired_number_scheduled

Number of nodes that are expected to run the DaemonSet pod

kube_daemonset_status_number_available

Number of nodes that should be running a DaemonSet pod and have at least one DaemonSet pod running and available

kube_daemonset_status_number_misscheduled

Number of nodes that are running a DaemonSet pod but are not expected to run it

kube_daemonset_status_number_ready

Number of nodes that should be running the DaemonSet pods and have one or more DaemonSet pods running and ready

kube_daemonset_status_number_unavailable

Number of nodes that should be running the DaemonSet pods but have none of the DaemonSet pods running and available

kube_daemonset_status_updated_number_scheduled

Number of nodes that are running an updated DaemonSet pod

kube_deployment_created

Deployment creation timestamp

kube_deployment_labels

Deployment labels

kube_deployment_metadata_generation

Sequence number representing a specific generation of the desired state for a Deployment

kube_deployment_spec_replicas

Number of desired replicas for a Deployment

kube_deployment_spec_strategy_rollingupdate_max_unavailable

Maximum number of unavailable replicas during a rolling update of a Deployment

kube_deployment_status_observed_generation

The generation observed by the Deployment controller

kube_deployment_status_replicas

Number of current replicas of a Deployment

kube_deployment_status_replicas_available

Number of available replicas per Deployment

kube_deployment_status_replicas_ready

Number of ready replicas per Deployment

kube_deployment_status_replicas_unavailable

Number of unavailable replicas per Deployment

kube_deployment_status_replicas_updated

Number of updated replicas per Deployment

kube_job_info

Job information

kube_namespace_labels

Namespace labels

kube_node_labels

Node labels

kube_node_info

Node information

kube_node_spec_taint

Taint of a node

kube_node_spec_unschedulable

Whether new pods can be scheduled to a node

kube_node_status_allocatable

Allocatable resources on a node

kube_node_status_capacity

Capacity for different resources on a node

kube_node_status_condition

Node status condition

kube_node_volcano_oversubscription_status

Node oversubscription status

kube_persistentvolume_status_phase

PV status

kube_persistentvolumeclaim_status_phase

PVC status

kube_persistentvolume_info

PV information

kube_persistentvolumeclaim_info

PVC information

kube_pod_container_info

Information about a container running in the pod

kube_pod_container_resource_limits

Container resource limits

kube_pod_container_resource_requests

Number of resources requested by a container

kube_pod_container_status_last_terminated_reason

The last reason a container was in terminated state

kube_pod_container_status_ready

Whether a container is in ready state

kube_pod_container_status_restarts_total

Number of container restarts

kube_pod_container_status_running

Whether a container is in running state

kube_pod_container_status_terminated

Whether a container is in terminated state

kube_pod_container_status_terminated_reason

The reason a container is in terminated state

kube_pod_container_status_waiting

Whether a container is in waiting state

kube_pod_container_status_waiting_reason

The reason a container is in waiting state

kube_pod_info

Pod information

kube_pod_labels

Pod labels

kube_pod_owner

Object to which the pod belongs

kube_pod_status_phase

Phase of the pod

kube_pod_status_ready

Whether the pod is in ready state

kube_secret_info

Secret information

kube_statefulset_created

StatefulSet creation timestamp

kube_statefulset_labels

Information about StatefulSet labels

kube_statefulset_metadata_generation

Sequence number representing a specific generation of the desired state for a StatefulSet

kube_statefulset_replicas

Number of desired pods for a StatefulSet

kube_statefulset_status_observed_generation

Generation observed by the StatefulSet controller

kube_statefulset_status_replicas

Number of replicas in a StatefulSet

kube_statefulset_status_replicas_ready

Number of ready replicas in a StatefulSet

kube_statefulset_status_replicas_updated

Number of updated replicas in a StatefulSet

kube_job_spec_completions

Desired number of successfully finished pods that should run with the job

kube_job_status_failed

Number of pods of a job that reached the Failed phase

kube_job_status_succeeded

Number of pods of a job that reached the Succeeded phase

kube_node_status_allocatable_cpu_cores

Number of allocatable CPU cores of a node

kube_node_status_allocatable_memory_bytes

Total allocatable memory of a node

kube_replicaset_owner

ReplicaSet owner

kube_resourcequota

Resource quota

kube_pod_spec_volumes_persistentvolumeclaims_info

Information about the PVC associated with the pod

serviceMonitor/monitoring/prometheus-lightweight/0

prometheus-lightweight

vm_persistentqueue_blocks_dropped_total

Total number of dropped blocks in a send queue

vm_persistentqueue_blocks_read_total

Total number of blocks read by a send queue

vm_persistentqueue_blocks_written_total

Total number of blocks written to a send queue

vm_persistentqueue_bytes_pending

Number of pending bytes in a send queue

vm_persistentqueue_bytes_read_total

Total number of bytes read by a send queue

vm_persistentqueue_bytes_written_total

Total number of bytes written to a send queue

vm_promscrape_active_scrapers

Number of active scrapers

vm_promscrape_conn_read_errors_total

Total number of read errors during scrapes

vm_promscrape_conn_write_errors_total

Total number of write errors during scrapes

vm_promscrape_max_scrape_size_exceeded_errors_total

Total number of scrapes failed because responses exceed the size limit

vm_promscrape_scrape_duration_seconds_sum

Total time spent on scrapes, in seconds

vm_promscrape_scrape_duration_seconds_count

Number of scrapes whose duration was recorded

vm_promscrape_scrapes_total

Number of scrapes

vmagent_remotewrite_bytes_sent_total

Total number of bytes sent through remote write

vmagent_remotewrite_duration_seconds_sum

Total time consumed by remote writes, in seconds

vmagent_remotewrite_duration_seconds_count

Number of remote writes whose duration was recorded

vmagent_remotewrite_packets_dropped_total

Total number of dropped packets during remote write

vmagent_remotewrite_pending_data_bytes

Number of pending bytes during remote write

vmagent_remotewrite_requests_total

Total number of remote write requests

vmagent_remotewrite_retries_count_total

Total number of remote write retries

go_goroutines

Number of goroutines that exist

serviceMonitor/monitoring/node-exporter/0

node-exporter

node_boot_time_seconds

Node boot time

node_context_switches_total

Number of context switches

node_cpu_seconds_total

Seconds the CPUs spent in each mode

node_disk_io_now

Number of I/Os in progress

node_disk_io_time_seconds_total

Total seconds spent doing I/Os

node_disk_io_time_weighted_seconds_total

The weighted time spent doing I/Os

node_disk_read_bytes_total

Number of bytes that are read

node_disk_read_time_seconds_total

Number of seconds spent by all reads

node_disk_reads_completed_total

Number of reads completed

node_disk_write_time_seconds_total

Number of seconds spent by all writes

node_disk_writes_completed_total

Number of writes completed

node_disk_written_bytes_total

Number of bytes that are written

node_docker_thinpool_data_space_available

Available data space of a Docker thin pool

node_docker_thinpool_metadata_space_available

Available metadata space of a Docker thin pool

node_exporter_build_info

Node Exporter build information

node_filefd_allocated

Allocated file descriptors

node_filefd_maximum

Maximum number of file descriptors

node_filesystem_avail_bytes

File system space that is available for use

node_filesystem_device_error

Error in the mounted file system device

node_filesystem_free_bytes

Remaining space of a file system

node_filesystem_readonly

Read-only file system

node_filesystem_size_bytes

Total size of a file system

node_forks_total

Number of forks

node_intr_total

Total number of interrupts serviced

node_load1

1-minute average CPU load

node_load15

15-minute average CPU load

node_load5

5-minute average CPU load

node_memory_Buffers_bytes

Memory of the node buffer

node_memory_Cached_bytes

Memory for the node page cache

node_memory_MemAvailable_bytes

Available memory of a node

node_memory_MemFree_bytes

Free memory of a node

node_memory_MemTotal_bytes

Total memory of a node

node_network_receive_bytes_total

Total amount of received data

node_network_receive_drop_total

Total number of packets dropped during reception

node_network_receive_errs_total

Total number of errors encountered during reception

node_network_receive_packets_total

Total number of packets received

node_network_transmit_bytes_total

Total number of sent bytes

node_network_transmit_drop_total

Total number of dropped packets

node_network_transmit_errs_total

Total number of errors encountered during transmission

node_network_transmit_packets_total

Total number of packets sent

node_procs_blocked

Blocked processes

node_procs_running

Running processes

node_sockstat_sockets_used

Number of sockets in use

node_sockstat_TCP_alloc

Number of allocated TCP sockets

node_sockstat_TCP_inuse

Number of TCP sockets in use

node_sockstat_TCP_orphan

Number of orphaned TCP sockets

node_sockstat_TCP_tw

Number of TCP sockets in the TIME_WAIT state

node_sockstat_UDPLITE_inuse

Number of UDP-Lite sockets in use

node_sockstat_UDP_inuse

Number of UDP sockets in use

node_sockstat_UDP_mem

UDP socket buffer usage

node_timex_offset_seconds

Time offset

node_timex_sync_status

Synchronization status of node clocks

node_uname_info

System kernel information

node_vmstat_oom_kill

Number of processes terminated due to insufficient memory

process_cpu_seconds_total

Total process CPU time

process_max_fds

Maximum number of file descriptors of a process

process_open_fds

Opened file descriptors by a process

process_resident_memory_bytes

Size of the resident memory set

process_start_time_seconds

Process start time

process_virtual_memory_bytes

Virtual memory size

process_virtual_memory_max_bytes

Maximum available virtual memory capacity

node_netstat_Tcp_ActiveOpens

Number of TCP connections that directly change from the CLOSED state to the SYN-SENT state

node_netstat_Tcp_PassiveOpens

Number of TCP connections that directly change from the LISTEN state to the SYN-RCVD state

node_netstat_Tcp_CurrEstab

Number of TCP connections in the ESTABLISHED or CLOSE-WAIT state

node_vmstat_pgmajfault

Number of major page faults in vmstat

node_vmstat_pgpgout

Page-out count reported in vmstat

node_vmstat_pgfault

Number of page faults in vmstat

node_vmstat_pgpgin

Page-in count reported in vmstat

node_processes_max_processes

Maximum number of processes

node_processes_pids

Number of PIDs

node_nf_conntrack_entries

Number of currently allocated flow entries for connection tracking

node_nf_conntrack_entries_limit

Maximum size of a connection tracking table

promhttp_metric_handler_requests_in_flight

Current number of scrapes being served

go_goroutines

Number of goroutines that exist

node_filesystem_files

Total number of inodes (file nodes) in the file system on the node

node_filesystem_files_free

Number of free inodes (file nodes) in the file system on the node

podMonitor/monitoring/nvidia-gpu-device-plugin/0

monitoring/nvidia-gpu-device-plugin

cce_gpu_utilization

GPU compute usage

cce_gpu_memory_utilization

GPU memory usage

cce_gpu_encoder_utilization

GPU encoding usage

cce_gpu_decoder_utilization

GPU decoding usage

cce_gpu_utilization_process

GPU compute usage of each process

cce_gpu_memory_utilization_process

GPU memory usage of each process

cce_gpu_encoder_utilization_process

GPU encoding usage of each process

cce_gpu_decoder_utilization_process

GPU decoding usage of each process

cce_gpu_memory_used

Used GPU memory

cce_gpu_memory_total

Total GPU memory

cce_gpu_memory_free

Free GPU memory

cce_gpu_bar1_memory_used

Used GPU BAR1 memory

cce_gpu_bar1_memory_total

Total GPU BAR1 memory

cce_gpu_clock

GPU clock frequency

cce_gpu_memory_clock

GPU memory frequency

cce_gpu_graphics_clock

GPU graphics clock frequency

cce_gpu_video_clock

GPU video processor frequency

cce_gpu_temperature

GPU temperature

cce_gpu_power_usage

GPU power

cce_gpu_total_energy_consumption

Total GPU energy consumption

cce_gpu_pcie_link_bandwidth

GPU PCIe bandwidth

cce_gpu_nvlink_bandwidth

GPU NVLink bandwidth

cce_gpu_pcie_throughput_rx

GPU PCIe RX bandwidth

cce_gpu_pcie_throughput_tx

GPU PCIe TX bandwidth

cce_gpu_nvlink_utilization_counter_rx

GPU NVLink RX bandwidth

cce_gpu_nvlink_utilization_counter_tx

GPU NVLink TX bandwidth

cce_gpu_retired_pages_sbe

Number of isolated GPU memory pages with single-bit errors

cce_gpu_retired_pages_dbe

Number of isolated GPU memory pages with double-bit errors

xgpu_memory_total

Total xGPU memory

xgpu_memory_used

Used xGPU memory

xgpu_core_percentage_total

Total xGPU compute

xgpu_core_percentage_used

Used xGPU compute

gpu_schedule_policy

GPU mode. 0: GPU memory isolation with shared compute. 1: GPU memory and compute isolation. 2: default mode, in which the GPU is not virtualized.

xgpu_device_health

Health status of xGPU. 0: xGPU is healthy. 1: xGPU is unhealthy.

serviceMonitor/monitoring/prometheus-server/0

prometheus-server

prometheus_build_info

Prometheus build information

prometheus_engine_query_duration_seconds

Time for query, in seconds

prometheus_engine_query_duration_seconds_count

Number of queries

prometheus_sd_discovered_targets

Current number of discovered targets

prometheus_remote_storage_bytes_total

Total number of bytes of data (non-metadata) sent by the queue after compression

prometheus_remote_storage_enqueue_retries_total

Number of enqueue retries caused by a full shard queue

prometheus_remote_storage_highest_timestamp_in_seconds

Latest timestamp in the remote storage

prometheus_remote_storage_queue_highest_sent_timestamp_seconds

Highest timestamp successfully sent by remote storage

prometheus_remote_storage_samples_dropped_total

Number of samples dropped before being sent to remote storage

prometheus_remote_storage_samples_failed_total

Number of samples that failed to be sent to remote storage

prometheus_remote_storage_samples_in_total

Number of samples sent to remote storage

prometheus_remote_storage_samples_pending

Number of samples pending in shards to be sent to remote storage

prometheus_remote_storage_samples_retried_total

Number of samples which failed to be sent to remote storage but were retried

prometheus_remote_storage_samples_total

Total number of samples sent to remote storage

prometheus_remote_storage_shard_capacity

Capacity of each shard of the queue used for parallel sending to the remote storage

prometheus_remote_storage_shards

Number of shards used for parallel sending to the remote storage

prometheus_remote_storage_shards_desired

Number of shards that the queue's shard calculation wants to run based on the rate of samples in vs. samples out

prometheus_remote_storage_shards_max

Maximum number of shards that the queue is allowed to run

prometheus_remote_storage_shards_min

Minimum number of shards that the queue is allowed to run

prometheus_tsdb_wal_segment_current

WAL segment index that TSDB is currently writing to

prometheus_tsdb_head_chunks

Number of chunks in the head block

prometheus_tsdb_head_series

Number of time series stored in the head

prometheus_tsdb_head_samples_appended_total

Number of appended samples

prometheus_wal_watcher_current_segment

Current segment the WAL watcher is reading records from

prometheus_target_interval_length_seconds

Metric collection interval

prometheus_target_interval_length_seconds_count

Number of metric collection intervals

prometheus_target_interval_length_seconds_sum

Sum of metric collection intervals

prometheus_target_scrapes_exceeded_body_size_limit_total

Number of scrapes that hit the body size limit

prometheus_target_scrapes_exceeded_sample_limit_total

Number of scrapes that hit the sample limit

prometheus_target_scrapes_sample_duplicate_timestamp_total

Number of scraped samples with duplicate timestamps

prometheus_target_scrapes_sample_out_of_bounds_total

Number of samples rejected due to timestamp falling outside of the time bounds

prometheus_target_scrapes_sample_out_of_order_total

Number of out-of-order samples

prometheus_target_sync_length_seconds

Target synchronization interval

prometheus_target_sync_length_seconds_count

Number of target synchronization intervals

prometheus_target_sync_length_seconds_sum

Sum of target synchronization intervals

promhttp_metric_handler_requests_in_flight

Current number of scrapes being served

promhttp_metric_handler_requests_total

Total scrapes

go_goroutines

Number of goroutines that exist

podMonitor/monitoring/virtual-kubelet-pods/0

monitoring/virtual-kubelet-pods

container_cpu_load_average_10s

Value of container CPU load average over the last 10 seconds

container_cpu_system_seconds_total

Cumulative system CPU time consumed by a container

container_cpu_usage_seconds_total

Cumulative CPU time consumed by a container in core-seconds

container_cpu_user_seconds_total

Cumulative user CPU time consumed by a container

container_cpu_cfs_periods_total

Number of elapsed enforcement period intervals

container_cpu_cfs_throttled_periods_total

Number of throttled period intervals

container_cpu_cfs_throttled_seconds_total

Total duration a container has been throttled

container_fs_inodes_free

Number of available inodes in a file system

container_fs_usage_bytes

File system usage

container_fs_inodes_total

Number of inodes in a file system

container_fs_io_current

Number of I/Os currently in progress in a disk or file system

container_fs_io_time_seconds_total

Cumulative time spent on doing I/Os by the disk or file system

container_fs_io_time_weighted_seconds_total

Cumulative weighted I/O time of a disk or file system

container_fs_limit_bytes

Total disk or file system capacity that can be consumed by a container

container_fs_reads_bytes_total

Cumulative amount of disk or file system data read by a container

container_fs_read_seconds_total

Time a container spent on reading disk or file system data

container_fs_reads_merged_total

Cumulative number of merged disk or file system reads made by a container

container_fs_reads_total

Cumulative number of disk or file system reads completed by a container

container_fs_sector_reads_total

Cumulative number of disk or file system sector reads completed by a container

container_fs_sector_writes_total

Cumulative number of disk or file system sector writes completed by a container

container_fs_writes_bytes_total

Total amount of data written by a container to a disk or file system

container_fs_write_seconds_total

Time a container spent on writing data to the disk or file system

container_fs_writes_merged_total

Cumulative number of merged container writes to the disk or file system

container_fs_writes_total

Cumulative number of disk or file system writes completed by a container

container_blkio_device_usage_total

Blkio device bytes usage

container_memory_failures_total

Cumulative number of container memory allocation failures

container_memory_failcnt

Number of times memory usage hit the limit

container_memory_cache

Memory used for the page cache of a container

container_memory_mapped_file

Size of a container memory mapped file

container_memory_max_usage_bytes

Maximum memory usage recorded for a container

container_memory_rss

Size of the resident memory set for a container

container_memory_swap

Container swap usage

container_memory_usage_bytes

Current memory usage of a container

container_memory_working_set_bytes

Memory usage of the working set of a container

container_network_receive_bytes_total

Total volume of data received by a container network

container_network_receive_errors_total

Cumulative number of errors encountered during reception

container_network_receive_packets_dropped_total

Cumulative number of packets dropped during reception

container_network_receive_packets_total

Cumulative number of packets received

container_network_transmit_bytes_total

Total volume of data transmitted on a container network

container_network_transmit_errors_total

Cumulative number of errors encountered during transmission

container_network_transmit_packets_dropped_total

Cumulative number of packets dropped during transmission

container_network_transmit_packets_total

Cumulative number of packets transmitted

container_processes

Number of processes running inside a container

container_sockets

Number of open sockets for a container

container_file_descriptors

Number of open file descriptors for a container

container_threads

Number of threads running inside a container

container_threads_max

Maximum number of threads allowed inside a container

container_ulimits_soft

Soft ulimit value of process 1 in a container. A value of -1 means unlimited, except for priority and nice.

container_tasks_state

Number of tasks in the specified state, such as sleeping, running, stopped, uninterruptible, or iowaiting

container_spec_cpu_period

CPU period of a container

container_spec_cpu_shares

CPU shares of a container

container_spec_cpu_quota

CPU quota of a container

container_spec_memory_limit_bytes

Memory limit for a container

container_spec_memory_reservation_limit_bytes

Memory reservation limit for a container

container_spec_memory_swap_limit_bytes

Memory swap limit for a container

container_start_time_seconds

Start time of a container, in seconds since the Unix epoch

container_last_seen

Last time a container was seen by the exporter

container_accelerator_memory_used_bytes

GPU accelerator memory that is being used by a container

container_accelerator_memory_total_bytes

Total available memory of a GPU accelerator

container_accelerator_duty_cycle

Percentage of time when a GPU accelerator is actually running

podMonitor/monitoring/everest-csi-controller/0

monitoring/everest-csi-controller

everest_action_result_total

Invocation results of different functions

everest_function_duration_seconds_bucket

Number of times that different functions are executed, by duration bucket

everest_function_duration_seconds_count

Number of times that different functions are invoked

everest_function_duration_seconds_sum

Total time spent invoking different functions

everest_function_duration_quantile_seconds

Quantiles of the time required for invoking different functions

node_volume_read_completed_total

Number of completed reads

node_volume_read_merged_total

Number of merged reads

node_volume_read_bytes_total

Total number of bytes read from sectors

node_volume_read_time_milliseconds_total

Total read duration

node_volume_write_completed_total

Number of completed writes

node_volume_write_merged_total

Number of merged writes

node_volume_write_bytes_total

Total number of bytes written to sectors

node_volume_write_time_milliseconds_total

Total write duration

node_volume_io_now

Number of ongoing I/Os

node_volume_io_time_seconds_total

Total duration of I/O operations

node_volume_capacity_bytes_available

Available capacity

node_volume_capacity_bytes_total

Total capacity

node_volume_capacity_bytes_used

Used capacity

node_volume_inodes_available

Available inodes

node_volume_inodes_total

Total number of inodes

node_volume_inodes_used

Used inodes

node_volume_read_transmissions_total

Total number of read transmissions

node_volume_read_timeouts_total

Number of read timeouts

node_volume_read_sent_bytes_total

Number of bytes read

node_volume_read_queue_time_milliseconds_total

Total read queue waiting time

node_volume_read_rtt_time_milliseconds_total

Total read RTT

node_volume_write_transmissions_total

Total number of write transmissions

node_volume_write_timeouts_total

Total number of write timeouts

node_volume_write_queue_time_milliseconds_total

Total write queue waiting time

node_volume_write_rtt_time_milliseconds_total

Total write RTT

node_volume_localvolume_stats_capacity_bytes

Total local volume capacity

node_volume_localvolume_stats_available_bytes

Available local volume capacity

node_volume_localvolume_stats_used_bytes

Used local volume capacity

node_volume_localvolume_stats_inodes

Number of inodes for a local volume

node_volume_localvolume_stats_inodes_used

Used inodes for a local volume

podMonitor/monitoring/nginx-ingress-controller/0

monitoring/nginx-ingress-controller

nginx_ingress_controller_connect_duration_seconds_bucket

Duration for connecting to the upstream server

nginx_ingress_controller_connect_duration_seconds_sum

Duration for connecting to the upstream server

nginx_ingress_controller_connect_duration_seconds_count

Duration for connecting to the upstream server

nginx_ingress_controller_request_duration_seconds_bucket

Time required for processing a request, in seconds

nginx_ingress_controller_request_duration_seconds_sum

Time required for processing a request, in seconds

nginx_ingress_controller_request_duration_seconds_count

Time required for processing a request, in seconds

nginx_ingress_controller_request_size_bucket

Length of a request (including the request line, header, and body)

nginx_ingress_controller_request_size_sum

Length of a request (including the request line, header, and body)

nginx_ingress_controller_request_size_count

Length of a request (including the request line, header, and body)

nginx_ingress_controller_response_duration_seconds_bucket

Time required for receiving the response from the upstream server

nginx_ingress_controller_response_duration_seconds_sum

Time required for receiving the response from the upstream server

nginx_ingress_controller_response_duration_seconds_count

Time required for receiving the response from the upstream server

nginx_ingress_controller_response_size_bucket

Length of a response (including the status line, header, and response body)

nginx_ingress_controller_response_size_sum

Length of a response (including the status line, header, and response body)

nginx_ingress_controller_response_size_count

Length of a response (including the status line, header, and response body)

nginx_ingress_controller_header_duration_seconds_bucket

Time required for receiving the first header from the upstream server

nginx_ingress_controller_header_duration_seconds_sum

Time required for receiving the first header from the upstream server

nginx_ingress_controller_header_duration_seconds_count

Time required for receiving the first header from the upstream server

nginx_ingress_controller_bytes_sent

Number of bytes sent to the client

nginx_ingress_controller_ingress_upstream_latency_seconds

Upstream service latency

nginx_ingress_controller_requests

Total number of client requests

nginx_ingress_controller_nginx_process_connections

Number of client connections in the active, read, write, or wait state

nginx_ingress_controller_nginx_process_connections_total

Total number of client connections in the accepted or handled state

nginx_ingress_controller_nginx_process_cpu_seconds_total

Total CPU time consumed by the Nginx process (unit: second)

nginx_ingress_controller_nginx_process_num_procs

Number of processes

nginx_ingress_controller_nginx_process_oldest_start_time_seconds

Start time of the oldest Nginx process, in seconds since January 1, 1970

nginx_ingress_controller_nginx_process_read_bytes_total

Total number of bytes read

nginx_ingress_controller_nginx_process_requests_total

Total number of requests processed by Nginx since startup

nginx_ingress_controller_nginx_process_resident_memory_bytes

Resident memory set usage of a process, that is, the actual physical memory usage

nginx_ingress_controller_nginx_process_virtual_memory_bytes

Virtual memory usage of a process, that is, the total memory allocated to the process, including the actual physical memory and virtual swap space

nginx_ingress_controller_nginx_process_write_bytes_total

Total amount of data written by the process to disks or other devices for long-term storage

nginx_ingress_controller_build_info

A metric with a constant '1' labeled with information about the build

nginx_ingress_controller_check_success

Cumulative count of syntax check operations of the Nginx ingress controller

nginx_ingress_controller_config_hash

Hash value of the running configuration

nginx_ingress_controller_config_last_reload_successful

Whether the last configuration reload attempt was successful

nginx_ingress_controller_config_last_reload_successful_timestamp_seconds

Timestamp of the last successful configuration reload

nginx_ingress_controller_ssl_certificate_info

All information associated with a certificate

nginx_ingress_controller_success

Cumulative number of reload operations of the Nginx ingress controller

nginx_ingress_controller_orphan_ingress

Whether an ingress is orphaned. 1: orphaned. 0: not orphaned.

namespace: namespace of the ingress

ingress: name of the ingress

type: status of the ingress. The value can be no-service or no-endpoint.

nginx_ingress_controller_admission_config_size

Size of the admission controller configuration

nginx_ingress_controller_admission_render_duration

Rendering duration of the admission controller

nginx_ingress_controller_admission_render_ingresses

Number of ingresses rendered by the admission controller

nginx_ingress_controller_admission_roundtrip_duration

Time spent by the admission controller to process new events

nginx_ingress_controller_admission_tested_duration

Time spent on admission controller tests

nginx_ingress_controller_admission_tested_ingresses

Number of ingresses processed by the admission controller

podMonitor/monitoring/cceaddon-npd/0

monitoring/cceaddon-npd

problem_counter

Number of times that the check item has been found abnormal

problem_gauge

Whether the check item has triggered an exception

  • 0: not triggered
  • 1: triggered
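
Several metrics in the table are easier to interpret when combined. The following Python sketch shows three common derivations. It is a minimal illustration, not an AOM or CCE API: the function names are hypothetical, and the input values are assumed to have already been read from the corresponding metrics. It also assumes that container_spec_cpu_quota and container_spec_cpu_period use the same time unit, and that a memory limit of 0 means that no limit is set.

```python
from typing import Optional


def cpu_limit_cores(quota: float, period: float) -> Optional[float]:
    """CPU limit in cores, from container_spec_cpu_quota and
    container_spec_cpu_period. Returns None if no positive quota or
    period is available (treated here as "no limit")."""
    if quota <= 0 or period <= 0:
        return None
    return quota / period


def memory_utilization(working_set_bytes: float, limit_bytes: float) -> Optional[float]:
    """Ratio of container_memory_working_set_bytes to
    container_spec_memory_limit_bytes. Returns None if the limit is not
    positive (assumed to mean that no limit is set)."""
    if limit_bytes <= 0:
        return None
    return working_set_bytes / limit_bytes


def mean_duration_seconds(sum_increase: float, count_increase: float) -> Optional[float]:
    """Average duration over a time window, from the increases of a
    *_duration_seconds_sum series and its *_duration_seconds_count series
    over that window. Returns None if nothing was observed."""
    if count_increase <= 0:
        return None
    return sum_increase / count_increase


if __name__ == "__main__":
    # Sample values chosen for illustration only.
    print(cpu_limit_cores(50000, 100000))             # 0.5
    print(memory_utilization(256 * 1024**2, 1024**3))  # 0.25
    print(mean_duration_seconds(12.0, 48))             # 0.25
```

The functions return None instead of raising an error when a limit or an observation count is missing, so a caller can tell "no limit configured" or "no requests in the window" apart from a computed value of 0.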