Basic Metrics: Container Metrics

Updated on 2024-09-06 GMT+08:00

This section describes the categories, names, and meanings of metrics reported to AOM from CCE's kube-prometheus-stack add-on or on-premises Kubernetes clusters.
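Many metrics in the table below come in groups that share a base name and end in _bucket, _count, or _sum. These are the three series of a Prometheus histogram. The following Python sketch illustrates how such series are typically combined: the average is the increase of _sum divided by the increase of _count, and a quantile can be estimated from the cumulative buckets by linear interpolation, similar in spirit to PromQL's histogram_quantile. The function names and all sample values are hypothetical and chosen only for illustration; they are not taken from AOM or CCE.

```python
import math
from typing import List, Tuple


def average_from_histogram(sum_delta: float, count_delta: float) -> float:
    """Mean observed value over a window.

    sum_delta is the increase of the *_sum series and count_delta is the
    increase of the *_count series over the same window.
    """
    if count_delta <= 0:
        return float("nan")
    return sum_delta / count_delta


def quantile_from_buckets(q: float, buckets: List[Tuple[float, float]]) -> float:
    """Estimate the q-quantile from cumulative (upper_bound, count) buckets.

    Each tuple corresponds to one *_bucket series, where upper_bound is the
    value of its "le" label. Counts are cumulative, as in Prometheus.
    """
    ordered = sorted(buckets)
    total = ordered[-1][1]
    if total == 0:
        return float("nan")
    rank = q * total
    prev_bound, prev_count = 0.0, 0.0
    for bound, count in ordered:
        if count >= rank:
            if math.isinf(bound):
                # The quantile falls in the +Inf bucket: report the
                # highest finite upper bound instead.
                return prev_bound
            span = count - prev_count
            if span == 0:
                return bound
            # Linear interpolation inside the bucket.
            return prev_bound + (bound - prev_bound) * (rank - prev_count) / span
        prev_bound, prev_count = bound, count
    return prev_bound


# Hypothetical request-duration histogram (seconds), cumulative counts.
sample_buckets = [(0.1, 50.0), (0.5, 90.0), (1.0, 100.0), (math.inf, 100.0)]
p95 = quantile_from_buckets(0.95, sample_buckets)
mean = average_from_histogram(sum_delta=12.0, count_delta=100.0)
print(f"p95 ~ {p95:.2f}s, mean = {mean:.2f}s")
```

The estimate is only as precise as the bucket boundaries allow, so a quantile that falls in a wide bucket should be read as an approximation.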

Table 1 Metrics of containers running in CCE or on-premises Kubernetes clusters

Target Name

Job Name

Metric

Description

  • serviceMonitor/monitoring/coredns/0
  • serviceMonitor/monitoring/node-local-dns/0

coredns and node-local-dns

coredns_build_info

CoreDNS build information

coredns_cache_entries

Number of entries in the cache

coredns_cache_size

Cache size

coredns_cache_hits_total

Number of cache hits

coredns_cache_misses_total

Number of cache misses

coredns_cache_requests_total

Total number of DNS resolution requests in different dimensions

coredns_dns_request_duration_seconds_bucket

Histogram of DNS request duration (bucket)

coredns_dns_request_duration_seconds_count

Histogram of DNS request duration (count)

coredns_dns_request_duration_seconds_sum

Histogram of DNS request duration (sum)

coredns_dns_request_size_bytes_bucket

Histogram of the size of DNS request (bucket)

coredns_dns_request_size_bytes_count

Histogram of the size of DNS request (count)

coredns_dns_request_size_bytes_sum

Histogram of the size of DNS request (sum)

coredns_dns_requests_total

Number of DNS requests

coredns_dns_response_size_bytes_bucket

Histogram of the size of DNS response (bucket)

coredns_dns_response_size_bytes_count

Histogram of the size of DNS response (count)

coredns_dns_response_size_bytes_sum

Histogram of the size of DNS response (sum)

coredns_dns_responses_total

Number of DNS responses, by response code

coredns_forward_conn_cache_hits_total

Number of cache hits for each protocol and data flow

coredns_forward_conn_cache_misses_total

Number of cache misses for each protocol and data flow

coredns_forward_healthcheck_broken_total

Number of times all upstreams were unhealthy

coredns_forward_healthcheck_failures_total

Count of failed health checks per upstream

coredns_forward_max_concurrent_rejects_total

Number of requests rejected due to excessive concurrent requests

coredns_forward_request_duration_seconds_bucket

Histogram of forward request duration (bucket)

coredns_forward_request_duration_seconds_count

Histogram of forward request duration (count)

coredns_forward_request_duration_seconds_sum

Histogram of forward request duration (sum)

coredns_forward_requests_total

Number of requests for each data flow

coredns_forward_responses_total

Number of responses to each data flow

coredns_health_request_duration_seconds_bucket

Histogram of health request duration (bucket)

coredns_health_request_duration_seconds_count

Histogram of health request duration (count)

coredns_health_request_duration_seconds_sum

Histogram of health request duration (sum)

coredns_health_request_failures_total

Number of health request failures

coredns_hosts_reload_timestamp_seconds

Timestamp of the last reload of the host file

coredns_kubernetes_dns_programming_duration_seconds_bucket

Histogram of DNS programming duration (bucket)

coredns_kubernetes_dns_programming_duration_seconds_count

Histogram of DNS programming duration (count)

coredns_kubernetes_dns_programming_duration_seconds_sum

Histogram of DNS programming duration (sum)

coredns_local_localhost_requests_total

Number of localhost requests

coredns_nodecache_setup_errors_total

Number of nodecache setup errors

coredns_dns_response_rcode_count_total

Number of responses for each Zone and Rcode

coredns_dns_request_count_total

Number of DNS requests

coredns_dns_request_do_count_total

Number of requests with the DNSSEC OK (DO) bit set

coredns_dns_do_requests_total

Number of requests with the DO bit set

coredns_dns_request_type_count_total

Number of requests for each Zone and Type

coredns_panics_total

Total number of panics

coredns_plugin_enabled

Whether a plugin is enabled

coredns_reload_failed_total

Number of failed reloads

serviceMonitor/monitoring/kube-apiserver/0

apiserver

aggregator_unavailable_apiservice

Number of unavailable APIServices

apiserver_admission_controller_admission_duration_seconds_bucket

Processing delay of an Admission Controller

apiserver_admission_webhook_admission_duration_seconds_bucket

Processing delay of an Admission Webhook

apiserver_admission_webhook_admission_duration_seconds_count

Number of Admission Webhook processing requests

apiserver_client_certificate_expiration_seconds_bucket

Remaining validity period of the client certificate (bucket)

apiserver_client_certificate_expiration_seconds_count

Remaining validity period of the client certificate (count)

apiserver_current_inflight_requests

Number of requests currently being processed, by request kind (read-only or mutating)

apiserver_request_duration_seconds_bucket

Delay of the client's access to the APIServer

apiserver_request_total

Number of different requests to the APIServer

go_goroutines

Number of goroutines

kubernetes_build_info

Kubernetes build information

process_cpu_seconds_total

Total process CPU time

process_resident_memory_bytes

Size of the resident memory set for a process

rest_client_requests_total

Number of REST requests

workqueue_adds_total

Number of adds handled by a work queue

workqueue_depth

Depth of a work queue

workqueue_queue_duration_seconds_bucket

Time a task stays in the work queue before it is processed (bucket)

aggregator_unavailable_apiservice_total

Number of unavailable APIServices

rest_client_request_duration_seconds_bucket

Histogram of REST request duration

serviceMonitor/monitoring/kubelet/0

kubelet

kubelet_certificate_manager_client_expiration_renew_errors

Number of certificate renewal errors

kubelet_certificate_manager_client_ttl_seconds

Time-to-live (TTL) of the Kubelet client certificate

kubelet_cgroup_manager_duration_seconds_bucket

Duration of the cgroup manager operations (bucket)

kubelet_cgroup_manager_duration_seconds_count

Duration of the cgroup manager operations (count)

kubelet_node_config_error

If a configuration-related error occurs on a node, the value of this metric is true (1). If there is no configuration-related error, the value is false (0).

kubelet_node_name

Node name. The value is always 1.

kubelet_pleg_relist_duration_seconds_bucket

Duration of relisting pods in PLEG (bucket)

kubelet_pleg_relist_duration_seconds_count

Duration of relisting pods in PLEG (count)

kubelet_pleg_relist_interval_seconds_bucket

Interval between relisting operations in PLEG (bucket)

kubelet_pod_start_duration_seconds_count

Time required for starting a single pod (count)

kubelet_pod_start_duration_seconds_bucket

Time required for starting a single pod (bucket)

kubelet_pod_worker_duration_seconds_bucket

Duration for synchronizing a single pod. Operation type: create, update, or sync

kubelet_running_containers

Number of running containers

kubelet_running_pods

Number of running pods

kubelet_runtime_operations_duration_seconds_bucket

Duration of the runtime operations (bucket)

kubelet_runtime_operations_errors_total

Number of runtime operation errors listed by operation type

kubelet_runtime_operations_total

Number of runtime operations listed by operation type

kubelet_volume_stats_available_bytes

Number of available bytes in a volume

kubelet_volume_stats_capacity_bytes

Capacity of the volume in bytes

kubelet_volume_stats_inodes

Total number of inodes in a volume

kubelet_volume_stats_inodes_used

Number of used inodes in a volume

kubelet_volume_stats_used_bytes

Number of used bytes in a volume

storage_operation_duration_seconds_bucket

Duration of each storage operation (bucket)

storage_operation_duration_seconds_count

Duration of each storage operation (count)

storage_operation_errors_total

Number of storage operation errors

volume_manager_total_volumes

Number of volumes in the Volume Manager

rest_client_requests_total

Number of HTTP client requests partitioned by status code, method, and host

rest_client_request_duration_seconds_bucket

Request delay (bucket)

process_resident_memory_bytes

Size of the resident memory set for a process

process_cpu_seconds_total

Total process CPU time

go_goroutines

Number of goroutines

serviceMonitor/monitoring/kubelet/1

kubelet

container_cpu_cfs_periods_total

Number of elapsed enforcement period intervals

container_cpu_cfs_throttled_periods_total

Number of throttled period intervals

container_cpu_cfs_throttled_seconds_total

Total time duration the container has been throttled

container_cpu_load_average_10s

Value of container CPU load average over the last 10 seconds

container_cpu_usage_seconds_total

Cumulative CPU time consumed by a container in core-seconds

container_file_descriptors

Number of open file descriptors for a container

container_fs_inodes_free

Number of available inodes in a file system

container_fs_inodes_total

Number of inodes in a file system

container_fs_io_time_seconds_total

Cumulative seconds spent on doing I/Os by the disk or file system

container_fs_limit_bytes

Total disk or file system capacity that can be consumed by a container

container_fs_read_seconds_total

Cumulative number of seconds the container spent on reading disk or file system data

container_fs_reads_bytes_total

Cumulative amount of disk or file system data read by a container

container_fs_reads_total

Cumulative number of disk or file system reads completed by a container

container_fs_usage_bytes

File system usage

container_fs_write_seconds_total

Cumulative number of seconds the container spent on writing data to the disk or file system

container_fs_writes_bytes_total

Total amount of data written by a container to a disk or file system

container_fs_writes_total

Cumulative number of disk or file system writes completed by a container

container_memory_cache

Memory used for the page cache of a container

container_memory_failcnt

Number of times memory usage hit the limit

container_memory_max_usage_bytes

Maximum memory usage recorded for a container

container_memory_rss

Size of the resident memory set for a container

container_memory_swap

Container swap usage

container_memory_usage_bytes

Current memory usage of a container

container_memory_working_set_bytes

Memory usage of the working set of a container

container_network_receive_bytes_total

Total volume of data received by the container network

container_network_receive_errors_total

Cumulative number of errors encountered during reception

container_network_receive_packets_dropped_total

Cumulative number of packets dropped during reception

container_network_receive_packets_total

Cumulative number of packets received

container_network_transmit_bytes_total

Total volume of data transmitted on the container network

container_network_transmit_errors_total

Cumulative number of errors encountered during transmission

container_network_transmit_packets_dropped_total

Cumulative number of packets dropped during transmission

container_network_transmit_packets_total

Cumulative number of packets transmitted

container_spec_cpu_quota

CPU quota of the container

container_spec_memory_limit_bytes

Memory limit for the container

machine_cpu_cores

Number of logical CPU cores

machine_memory_bytes

Amount of memory

serviceMonitor/monitoring/kube-state-metrics/0

kube-state-metrics-prom

kube_cronjob_status_active

Number of currently running jobs of a CronJob

kube_cronjob_info

Cronjob information

kube_cronjob_labels

Label of a cronjob

kube_configmap_info

ConfigMap information

kube_daemonset_created

DaemonSet creation time

kube_daemonset_status_current_number_scheduled

Number of nodes that are running at least one DaemonSet pod and are supposed to

kube_daemonset_status_desired_number_scheduled

Number of nodes that should be running a DaemonSet pod

kube_daemonset_status_number_available

Number of nodes that should be running a DaemonSet pod and have at least one DaemonSet pod running and available

kube_daemonset_status_number_misscheduled

Number of nodes that are running a DaemonSet pod but are not supposed to

kube_daemonset_status_number_ready

Number of nodes that should be running the DaemonSet pods and have one or more DaemonSet pods running and ready

kube_daemonset_status_number_unavailable

Number of nodes that should be running the DaemonSet pods but have none of the DaemonSet pods running and available

kube_daemonset_status_updated_number_scheduled

Number of nodes that are running an updated DaemonSet pod

kube_deployment_created

Deployment creation timestamp

kube_deployment_labels

Deployment labels

kube_deployment_metadata_generation

Sequence number representing a specific generation of the desired state

kube_deployment_spec_replicas

Number of desired replicas for a Deployment

kube_deployment_spec_strategy_rollingupdate_max_unavailable

Maximum number of unavailable replicas during a rolling update of a Deployment

kube_deployment_status_observed_generation

The generation observed by the Deployment controller

kube_deployment_status_replicas

Number of current replicas of a Deployment

kube_deployment_status_replicas_available

Number of available replicas per Deployment

kube_deployment_status_replicas_ready

Number of ready replicas per Deployment

kube_deployment_status_replicas_unavailable

Number of unavailable replicas per Deployment

kube_deployment_status_replicas_updated

Number of updated replicas per Deployment

kube_job_info

Information about the job

kube_namespace_labels

Namespace labels

kube_node_labels

Node labels

kube_node_info

Information about a node

kube_node_spec_taint

Taint of a node

kube_node_spec_unschedulable

Whether new pods can be scheduled to a node

kube_node_status_allocatable

Allocatable resources on a node

kube_node_status_capacity

Capacity for different resources on a node

kube_node_status_condition

Condition of a node

kube_node_volcano_oversubscription_status

Node oversubscription status

kube_persistentvolume_status_phase

Phase of a PV status

kube_persistentvolumeclaim_status_phase

Phase of a PVC status

kube_persistentvolume_info

Information about a PV

kube_persistentvolumeclaim_info

Information about a PVC

kube_pod_container_info

Information about a container running in the pod

kube_pod_container_resource_limits

Resource limits set for a container, by resource type

kube_pod_container_resource_requests

Resources requested by a container, by resource type

kube_pod_container_status_last_terminated_reason

Last reason the container was in a terminated state

kube_pod_container_status_ready

Whether the container's readiness check succeeded

kube_pod_container_status_restarts_total

Number of container restarts

kube_pod_container_status_running

Whether the container is running

kube_pod_container_status_terminated

Whether the container is terminated

kube_pod_container_status_terminated_reason

The reason why the container is in a terminated state

kube_pod_container_status_waiting

Whether the container is waiting

kube_pod_container_status_waiting_reason

The reason why the container is in the waiting state

kube_pod_info

Information about a pod

kube_pod_labels

Pod labels

kube_pod_owner

Information about the pod's owner

kube_pod_status_phase

Current phase of a pod

kube_pod_status_ready

Whether the pod is ready

kube_secret_info

Information about a secret

kube_statefulset_created

StatefulSet creation timestamp

kube_statefulset_labels

Information about StatefulSet labels

kube_statefulset_metadata_generation

Sequence number representing a specific generation of the desired state for a StatefulSet

kube_statefulset_replicas

Number of desired pods for a StatefulSet

kube_statefulset_status_observed_generation

The generation observed by the StatefulSet controller

kube_statefulset_status_replicas

Number of replicas per StatefulSet

kube_statefulset_status_replicas_ready

Number of ready replicas per StatefulSet

kube_statefulset_status_replicas_updated

Number of updated replicas per StatefulSet

kube_job_spec_completions

Desired number of successfully finished pods required for the job to complete

kube_job_status_failed

Number of pods of a job that reached the Failed phase

kube_job_status_succeeded

Number of pods of a job that reached the Succeeded phase

kube_node_status_allocatable_cpu_cores

Number of allocatable CPU cores of a node

kube_node_status_allocatable_memory_bytes

Total allocatable memory of a node

kube_replicaset_owner

Information about the ReplicaSet's owner

kube_resourcequota

Information about resource quota

kube_pod_spec_volumes_persistentvolumeclaims_info

Information about the PVC associated with the pod

serviceMonitor/monitoring/prometheus-lightweight/0

prometheus-lightweight

vm_persistentqueue_blocks_dropped_total

Number of dropped blocks in a send queue

vm_persistentqueue_blocks_read_total

Number of blocks read by a send queue

vm_persistentqueue_blocks_written_total

Number of blocks written to a send queue

vm_persistentqueue_bytes_pending

Number of pending bytes in a send queue

vm_persistentqueue_bytes_read_total

Number of bytes read by a send queue

vm_persistentqueue_bytes_written_total

Number of bytes written to a send queue

vm_promscrape_active_scrapers

Number of active scrapes

vm_promscrape_conn_read_errors_total

Number of read errors during scrapes

vm_promscrape_conn_write_errors_total

Number of write errors during scrapes

vm_promscrape_max_scrape_size_exceeded_errors_total

Number of failed scrapes due to the exceeded response size

vm_promscrape_scrape_duration_seconds_sum

Duration of scrapes (sum)

vm_promscrape_scrape_duration_seconds_count

Duration of scrapes (count)

vm_promscrape_scrapes_total

Number of scrapes

vmagent_remotewrite_bytes_sent_total

Number of bytes sent via a remote write

vmagent_remotewrite_duration_seconds_sum

Time required for a remote write (sum)

vmagent_remotewrite_duration_seconds_count

Time required for a remote write (count)

vmagent_remotewrite_packets_dropped_total

Number of dropped packets during a remote write

vmagent_remotewrite_pending_data_bytes

Number of pending bytes during a remote write

vmagent_remotewrite_requests_total

Number of requests of the remote write

vmagent_remotewrite_retries_count_total

Number of retries of the remote write

go_goroutines

Number of goroutines

serviceMonitor/monitoring/node-exporter/0

node-exporter

node_boot_time_seconds

Node boot time

node_context_switches_total

Number of context switches

node_cpu_seconds_total

Seconds each CPU spent doing each type of work

node_disk_io_now

Number of I/Os in progress

node_disk_io_time_seconds_total

Total seconds spent doing I/Os

node_disk_io_time_weighted_seconds_total

The weighted number of seconds spent doing I/Os

node_disk_read_bytes_total

Number of bytes that are read

node_disk_read_time_seconds_total

Number of seconds spent by all reads

node_disk_reads_completed_total

Number of reads completed

node_disk_write_time_seconds_total

Number of seconds spent by all writes

node_disk_writes_completed_total

Number of writes completed

node_disk_written_bytes_total

Number of bytes that are written

node_docker_thinpool_data_space_available

Available data space of a docker thin pool

node_docker_thinpool_metadata_space_available

Available metadata space of a docker thin pool

node_exporter_build_info

Node exporter build information

node_filefd_allocated

Allocated file descriptors

node_filefd_maximum

Maximum number of file descriptors

node_filesystem_avail_bytes

File system space that is available for use

node_filesystem_device_error

Whether an error occurred while getting statistics for the given device

node_filesystem_free_bytes

Remaining space of a file system

node_filesystem_readonly

Whether the file system is read-only

node_filesystem_size_bytes

Total size of a file system

node_forks_total

Number of forks

node_intr_total

Number of interrupts serviced

node_load1

1-minute average CPU load

node_load15

15-minute average CPU load

node_load5

5-minute average CPU load

node_memory_Buffers_bytes

Memory of the node buffer

node_memory_Cached_bytes

Memory for the node page cache

node_memory_MemAvailable_bytes

Available memory of a node

node_memory_MemFree_bytes

Free memory of a node

node_memory_MemTotal_bytes

Total memory of a node

node_network_receive_bytes_total

Total amount of received data

node_network_receive_drop_total

Cumulative number of packets dropped during reception

node_network_receive_errs_total

Cumulative number of errors encountered during reception

node_network_receive_packets_total

Cumulative number of packets received

node_network_transmit_bytes_total

Total amount of transmitted data

node_network_transmit_drop_total

Cumulative number of dropped packets during transmission

node_network_transmit_errs_total

Cumulative number of errors encountered during transmission

node_network_transmit_packets_total

Cumulative number of packets transmitted

node_procs_blocked

Number of processes blocked waiting for I/O

node_procs_running

Number of running processes

node_sockstat_sockets_used

Number of sockets in use

node_sockstat_TCP_alloc

Number of allocated TCP sockets

node_sockstat_TCP_inuse

Number of TCP sockets in use

node_sockstat_TCP_orphan

Number of orphaned TCP sockets

node_sockstat_TCP_tw

Number of TCP sockets in the TIME_WAIT state

node_sockstat_UDPLITE_inuse

Number of UDP-Lite sockets in use

node_sockstat_UDP_inuse

Number of UDP sockets in use

node_sockstat_UDP_mem

UDP socket buffer usage

node_timex_offset_seconds

Time offset

node_timex_sync_status

Synchronization status of node clocks

node_uname_info

Labeled system information as provided by the uname system call

node_vmstat_oom_kill

Number of OOM kills recorded in /proc/vmstat

process_cpu_seconds_total

Total process CPU time

process_max_fds

Maximum number of file descriptors of a process

process_open_fds

Number of file descriptors opened by a process

process_resident_memory_bytes

Size of the resident memory set for a process

process_start_time_seconds

Process start time

process_virtual_memory_bytes

Virtual memory size for a process

process_virtual_memory_max_bytes

Maximum virtual memory size for a process

node_netstat_Tcp_ActiveOpens

Number of TCP connections that directly change from the CLOSED state to the SYN-SENT state

node_netstat_Tcp_PassiveOpens

Number of TCP connections that directly change from the LISTEN state to the SYN-RCVD state

node_netstat_Tcp_CurrEstab

Number of TCP connections in the ESTABLISHED or CLOSE-WAIT state

node_vmstat_pgmajfault

Cumulative number of major page faults, as reported in /proc/vmstat

node_vmstat_pgpgout

Cumulative amount of data paged out from main memory to block devices, as reported in /proc/vmstat

node_vmstat_pgfault

Cumulative number of page faults, as reported in /proc/vmstat

node_vmstat_pgpgin

Cumulative amount of data paged in from block devices to main memory, as reported in /proc/vmstat

node_processes_max_processes

PID limit value

node_processes_pids

Number of PIDs

node_nf_conntrack_entries

Number of currently allocated flow entries for connection tracking

node_nf_conntrack_entries_limit

Maximum size of a connection tracking table

promhttp_metric_handler_requests_in_flight

Number of scrape requests currently being served

go_goroutines

Number of node exporter goroutines

podMonitor/monitoring/nvidia-gpu-device-plugin/0

monitoring/nvidia-gpu-device-plugin

cce_gpu_utilization

GPU compute usage

cce_gpu_memory_utilization

GPU memory usage

cce_gpu_encoder_utilization

GPU encoding usage

cce_gpu_decoder_utilization

GPU decoding usage

cce_gpu_utilization_process

GPU compute usage of each process

cce_gpu_memory_utilization_process

GPU memory usage of each process

cce_gpu_encoder_utilization_process

GPU encoding usage of each process

cce_gpu_decoder_utilization_process

GPU decoding usage of each process

cce_gpu_memory_used

Used GPU memory

cce_gpu_memory_total

Total GPU memory

cce_gpu_memory_free

Free GPU memory

cce_gpu_bar1_memory_used

Used GPU BAR1 memory

cce_gpu_bar1_memory_total

Total GPU BAR1 memory

cce_gpu_clock

GPU clock frequency

cce_gpu_memory_clock

GPU memory clock frequency

cce_gpu_graphics_clock

GPU graphics clock frequency

cce_gpu_video_clock

GPU video processor frequency

cce_gpu_temperature

GPU temperature

cce_gpu_power_usage

GPU power

cce_gpu_total_energy_consumption

Total GPU energy consumption

cce_gpu_pcie_link_bandwidth

GPU PCIe bandwidth

cce_gpu_nvlink_bandwidth

GPU NVLink bandwidth

cce_gpu_pcie_throughput_rx

GPU PCIe RX bandwidth

cce_gpu_pcie_throughput_tx

GPU PCIe TX bandwidth

cce_gpu_nvlink_utilization_counter_rx

GPU NVLink RX bandwidth

cce_gpu_nvlink_utilization_counter_tx

GPU NVLink TX bandwidth

cce_gpu_retired_pages_sbe

Number of GPU memory pages retired due to single-bit errors

cce_gpu_retired_pages_dbe

Number of GPU memory pages retired due to double-bit errors

xgpu_memory_total

Total xGPU memory

xgpu_memory_used

Used xGPU memory

xgpu_core_percentage_total

Total xGPU compute

xgpu_core_percentage_used

Used xGPU compute

gpu_schedule_policy

GPU mode. The value 0 indicates GPU memory isolation with compute sharing. The value 1 indicates GPU memory and compute isolation. The value 2 indicates the default mode, in which the GPU is not virtualized.

xgpu_device_health

Health status of xGPU. The value 0 indicates that the xGPU is healthy, and the value 1 indicates that the xGPU is unhealthy.

serviceMonitor/monitoring/prometheus-server/0

prometheus-server

prometheus_build_info

Prometheus build information

prometheus_engine_query_duration_seconds

Query time

prometheus_engine_query_duration_seconds_count

Number of queries

prometheus_sd_discovered_targets

Number of targets discovered by each job

prometheus_remote_storage_bytes_total

Number of bytes sent

prometheus_remote_storage_enqueue_retries_total

Number of retries for entering a queue

prometheus_remote_storage_highest_timestamp_in_seconds

Highest timestamp that has come into the remote storage via the Appender interface, in seconds since epoch

prometheus_remote_storage_queue_highest_sent_timestamp_seconds

Highest timestamp successfully sent by a remote write

prometheus_remote_storage_samples_dropped_total

Total number of samples read from the WAL but not sent to remote storage

prometheus_remote_storage_samples_failed_total

Number of samples that failed to be sent to remote storage

prometheus_remote_storage_samples_in_total

Number of samples read into remote storage

prometheus_remote_storage_samples_pending

Number of samples pending in shards to be sent to remote storage

prometheus_remote_storage_samples_retried_total

Number of samples which failed to be sent to remote storage but were retried

prometheus_remote_storage_samples_total

Total number of samples sent to remote storage

prometheus_remote_storage_shard_capacity

Capacity of each shard of the queue used for parallel sending to the remote storage

prometheus_remote_storage_shards

Number of shards used for parallel sending to the remote storage

prometheus_remote_storage_shards_desired

Number of shards that the queue's shard calculation wants to run, based on the rate of samples in vs. samples out

prometheus_remote_storage_shards_max

Maximum number of shards that the queue is allowed to run

prometheus_remote_storage_shards_min

Minimum number of shards that the queue is allowed to run

prometheus_tsdb_wal_segment_current

WAL segment index that TSDB is currently writing to

prometheus_tsdb_head_chunks

Number of chunks in the head block

prometheus_tsdb_head_series

Number of series in the head block

prometheus_tsdb_head_samples_appended_total

Number of appended samples

prometheus_wal_watcher_current_segment

Current segment the WAL watcher is reading records from

prometheus_target_interval_length_seconds

Actual intervals between scrapes

prometheus_target_interval_length_seconds_count

Actual intervals between scrapes (count)

prometheus_target_interval_length_seconds_sum

Actual intervals between scrapes (sum)

prometheus_target_scrapes_exceeded_body_size_limit_total

Number of scrapes that hit the body size limit

prometheus_target_scrapes_exceeded_sample_limit_total

Number of scrapes that hit the sample limit

prometheus_target_scrapes_sample_duplicate_timestamp_total

Number of scraped samples with duplicate timestamps

prometheus_target_scrapes_sample_out_of_bounds_total

Number of samples rejected due to timestamp falling outside of the time bounds

prometheus_target_scrapes_sample_out_of_order_total

Number of out-of-order samples

prometheus_target_sync_length_seconds

Interval for synchronizing the scrape pool

prometheus_target_sync_length_seconds_count

Interval for synchronizing the scrape pool (count)

prometheus_target_sync_length_seconds_sum

Interval for synchronizing the scrape pool (sum)

promhttp_metric_handler_requests_in_flight

Number of scrapes currently being served

promhttp_metric_handler_requests_total

Total number of scrapes served, by HTTP status code

go_goroutines

Number of goroutines

podMonitor/monitoring/virtual-kubelet-pods/0

monitoring/virtual-kubelet-pods

container_cpu_load_average_10s

Value of container CPU load average over the last 10 seconds

container_cpu_system_seconds_total

Cumulative container CPU system time

container_cpu_usage_seconds_total

Cumulative CPU time consumed by a container in core-seconds

container_cpu_user_seconds_total

Cumulative container CPU user time

container_cpu_cfs_periods_total

Number of elapsed enforcement period intervals

container_cpu_cfs_throttled_periods_total

Number of throttled period intervals

container_cpu_cfs_throttled_seconds_total

Total time duration the container has been throttled

container_fs_inodes_free

Number of available inodes in a file system

container_fs_usage_bytes

File system usage

container_fs_inodes_total

Number of inodes in a file system

container_fs_io_current

Number of I/Os currently in progress in a disk or file system

container_fs_io_time_seconds_total

Cumulative seconds spent on doing I/Os by the disk or file system

container_fs_io_time_weighted_seconds_total

Cumulative weighted I/O time of a disk or file system

container_fs_limit_bytes

Total disk or file system capacity that can be consumed by a container

container_fs_reads_bytes_total

Cumulative amount of disk or file system data read by a container

container_fs_read_seconds_total

Cumulative number of seconds the container spent on reading disk or file system data

container_fs_reads_merged_total

Cumulative number of merged disk or file system reads made by a container

container_fs_reads_total

Cumulative number of disk or file system reads completed by a container

container_fs_sector_reads_total

Cumulative number of disk or file system sector reads completed by a container

container_fs_sector_writes_total

Cumulative number of disk or file system sector writes completed by a container

container_fs_writes_bytes_total

Total amount of data written by a container to a disk or file system

container_fs_write_seconds_total

Cumulative number of seconds the container spent on writing data to the disk or file system

container_fs_writes_merged_total

Cumulative number of merged container writes to the disk or file system

container_fs_writes_total

Cumulative number of disk or file system writes completed by a container

container_blkio_device_usage_total

Blkio device bytes usage

container_memory_failures_total

Cumulative number of container memory allocation failures

container_memory_failcnt

Number of times that memory usage hit the limit

container_memory_cache

Memory used for the page cache of a container

container_memory_mapped_file

Size of the memory-mapped files of a container

container_memory_max_usage_bytes

Maximum memory usage recorded for a container

container_memory_rss

Size of the resident memory set for a container

container_memory_swap

Container swap usage

container_memory_usage_bytes

Current memory usage of a container

container_memory_working_set_bytes

Memory usage of the working set of a container

container_network_receive_bytes_total

Total volume of data received by the container network

container_network_receive_errors_total

Cumulative number of errors encountered during reception

container_network_receive_packets_dropped_total

Cumulative number of packets dropped during reception

container_network_receive_packets_total

Cumulative number of packets received

container_network_transmit_bytes_total

Total volume of data transmitted on the container network

container_network_transmit_errors_total

Cumulative number of errors encountered during transmission

container_network_transmit_packets_dropped_total

Cumulative number of packets dropped during transmission

container_network_transmit_packets_total

Cumulative number of packets transmitted

container_processes

Number of processes running inside the container

container_sockets

Number of open sockets for the container

container_file_descriptors

Number of open file descriptors for a container

container_threads

Number of threads running inside the container

container_threads_max

Maximum number of threads allowed inside the container

container_ulimits_soft

Soft ulimit value of process 1 in the container. Unlimited if the value is -1, except priority and nice.

container_tasks_state

Number of tasks in the specified state, such as sleeping, running, stopped, uninterruptible, or iowaiting

container_spec_cpu_period

CPU period of the container

container_spec_cpu_shares

CPU share of the container

container_spec_cpu_quota

CPU quota of the container

container_spec_memory_limit_bytes

Memory limit for the container

container_spec_memory_reservation_limit_bytes

Memory reservation limit for the container

container_spec_memory_swap_limit_bytes

Memory swap limit for the container

container_start_time_seconds

Start time of the container, in seconds since the Unix epoch

container_last_seen

Last time a container was seen by the exporter

container_accelerator_memory_used_bytes

GPU accelerator memory that is being used by the container

container_accelerator_memory_total_bytes

Total available memory of a GPU accelerator

container_accelerator_duty_cycle

Percentage of time when a GPU accelerator is actually running

podMonitor/monitoring/everest-csi-controller/0

monitoring/everest-csi-controller

everest_action_result_total

Number of action results

everest_function_duration_seconds_bucket

Histogram of action duration (bucket)

everest_function_duration_seconds_count

Histogram of action duration (count)

everest_function_duration_seconds_sum

Histogram of action duration (sum)

everest_function_duration_quantile_seconds

Time quantile required by the action

node_volume_read_completed_total

Number of completed reads

node_volume_read_merged_total

Number of merged reads

node_volume_read_bytes_total

Total number of bytes read

node_volume_read_time_milliseconds_total

Total read duration

node_volume_write_completed_total

Number of completed writes

node_volume_write_merged_total

Number of merged writes

node_volume_write_bytes_total

Total number of bytes written

node_volume_write_time_milliseconds_total

Total write duration

node_volume_io_now

Number of ongoing I/Os

node_volume_io_time_seconds_total

Total I/O operation duration

node_volume_capacity_bytes_available

Available capacity

node_volume_capacity_bytes_total

Total capacity

node_volume_capacity_bytes_used

Used capacity

node_volume_inodes_available

Available inodes

node_volume_inodes_total

Total number of inodes

node_volume_inodes_used

Used inodes

node_volume_read_transmissions_total

Number of read transmissions

node_volume_read_timeouts_total

Number of read timeouts

node_volume_read_sent_bytes_total

Number of bytes read

node_volume_read_queue_time_milliseconds_total

Read queue waiting time

node_volume_read_rtt_time_milliseconds_total

Read RTT

node_volume_write_transmissions_total

Number of write transmissions

node_volume_write_timeouts_total

Number of write timeouts

node_volume_write_queue_time_milliseconds_total

Write queue waiting time

node_volume_write_rtt_time_milliseconds_total

Write RTT

node_volume_localvolume_stats_capacity_bytes

Local storage capacity

node_volume_localvolume_stats_available_bytes

Available local storage

node_volume_localvolume_stats_used_bytes

Used local storage

node_volume_localvolume_stats_inodes

Number of inodes for a local volume

node_volume_localvolume_stats_inodes_used

Used inodes for a local volume

podMonitor/monitoring/nginx-ingress-controller/0

monitoring/nginx-ingress-controller

nginx_ingress_controller_bytes_sent

Number of bytes sent to the client

nginx_ingress_controller_connect_duration_seconds

Duration for connecting to the upstream server

nginx_ingress_controller_header_duration_seconds

Time required for receiving the first header from the upstream server

nginx_ingress_controller_ingress_upstream_latency_seconds

Upstream service latency

nginx_ingress_controller_request_duration_seconds

Time required for processing a request, in seconds

nginx_ingress_controller_request_size

Length of a request, including the request line, header, and body

nginx_ingress_controller_requests

Total number of HTTP requests processed by Nginx Ingress Controller since it started

nginx_ingress_controller_response_duration_seconds

Time required for receiving the response from the upstream server

nginx_ingress_controller_response_size

Length of a response, including the request line, header, and body

nginx_ingress_controller_nginx_process_connections

Number of client connections in the active, read, write, or wait state

nginx_ingress_controller_nginx_process_connections_total

Total number of client connections in the accepted or handled state

nginx_ingress_controller_nginx_process_cpu_seconds_total

Total CPU time consumed by the Nginx process (unit: second)

nginx_ingress_controller_nginx_process_num_procs

Number of processes

nginx_ingress_controller_nginx_process_oldest_start_time_seconds

Start time in seconds since January 1, 1970

nginx_ingress_controller_nginx_process_read_bytes_total

Number of bytes read

nginx_ingress_controller_nginx_process_requests_total

Total number of requests processed by Nginx since startup

nginx_ingress_controller_nginx_process_resident_memory_bytes

Resident memory usage of a process, that is, the actual physical memory usage

nginx_ingress_controller_nginx_process_virtual_memory_bytes

Virtual memory usage of a process, that is, the total memory allocated to the process, including the actual physical memory and virtual swap space

nginx_ingress_controller_nginx_process_write_bytes_total

Amount of data written by the Nginx process to disks or other devices for long-term storage

nginx_ingress_controller_build_info

Build information of Nginx Ingress Controller, including the version and compilation time

nginx_ingress_controller_check_success

Health check result of Nginx Ingress Controller. 1: Normal. 0: Abnormal

nginx_ingress_controller_config_hash

Hash value of the running configuration

nginx_ingress_controller_config_last_reload_successful

Whether the Nginx Ingress Controller configuration is successfully reloaded

nginx_ingress_controller_config_last_reload_successful_timestamp_seconds

Last timestamp when the Nginx Ingress Controller configuration was successfully reloaded

nginx_ingress_controller_ssl_certificate_info

Nginx Ingress Controller certificate information

nginx_ingress_controller_success

Cumulative number of reload operations of Nginx Ingress Controller

nginx_ingress_controller_orphan_ingress

Whether the ingress is isolated. 1: Isolated. 0: Not isolated. namespace indicates the namespace where the ingress is located, ingress indicates the ingress name, and type indicates the isolation type (options: no-service and no-endpoint).

nginx_ingress_controller_admission_config_size

Size of the admission controller configuration

nginx_ingress_controller_admission_render_duration

Rendering duration of the admission controller

nginx_ingress_controller_admission_render_ingresses

Length of ingresses rendered by the admission controller

nginx_ingress_controller_admission_roundtrip_duration

Time spent by the admission controller to process new events

nginx_ingress_controller_admission_tested_duration

Time spent on admission controller tests

nginx_ingress_controller_admission_tested_ingresses

Length of ingresses processed by the admission controller
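
Many of the metrics in the table above are cumulative counters (names ending in _total) or come as _sum and _count pairs, so a single sample is rarely useful on its own. The following Python sketch shows one way to turn two samples of such metrics into a ratio or an average. The function names and the sample values are hypothetical and chosen for illustration only, and the reset handling is a simplification: it does not reproduce the extrapolation that PromQL functions such as rate() apply.

```python
def counter_increase(first: float, last: float) -> float:
    """Increase of a cumulative counter between two samples.

    A drop in value is treated as a counter reset (for example, after a
    container restart), in which case the later value is the increase.
    """
    return last if last < first else last - first


def throttled_ratio(periods_first: float, periods_last: float,
                    throttled_first: float, throttled_last: float) -> float:
    """Share of CFS periods in which a container was throttled.

    Inputs are two samples each of container_cpu_cfs_periods_total and
    container_cpu_cfs_throttled_periods_total.
    """
    periods = counter_increase(periods_first, periods_last)
    if periods == 0:
        return 0.0
    return counter_increase(throttled_first, throttled_last) / periods


def mean_from_sum_count(sum_first: float, sum_last: float,
                        count_first: float, count_last: float) -> float:
    """Average observation over a window from a _sum and _count pair,
    such as everest_function_duration_seconds_sum and _count."""
    count = counter_increase(count_first, count_last)
    if count == 0:
        return 0.0
    return counter_increase(sum_first, sum_last) / count


if __name__ == "__main__":
    # Made-up sample values, not taken from a real cluster.
    print(throttled_ratio(100, 200, 10, 35))      # share of throttled periods
    print(mean_from_sum_count(2.0, 12.0, 4, 8))   # average duration in seconds
```

The same pattern applies to other pairs in the table, for example dividing the increase of prometheus_target_interval_length_seconds_sum by the increase of prometheus_target_interval_length_seconds_count to get the average scrape interval over a window.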
