Basic Metrics: Container Metrics
This section describes the categories, names, and meanings of metrics reported to AOM from CCE's kube-prometheus-stack add-on or on-premises Kubernetes clusters.
Target Name |
Job Name |
Metric |
Description |
---|---|---|---|
|
coredns and node-local-dns |
coredns_build_info |
CoreDNS build information |
coredns_cache_entries |
Number of entries in the CoreDNS cache |
||
coredns_cache_size |
CoreDNS cache size |
||
coredns_cache_hits_total |
Number of CoreDNS cache hits |
||
coredns_cache_misses_total |
Number of CoreDNS cache misses |
||
coredns_cache_requests_total |
Total number of requests handled by the CoreDNS cache |
||
coredns_dns_request_duration_seconds_bucket |
CoreDNS request latency |
||
coredns_dns_request_duration_seconds_count |
Number of CoreDNS requests recorded in the request latency histogram |
||
coredns_dns_request_duration_seconds_sum |
Total CoreDNS request processing time (seconds) |
||
coredns_dns_request_size_bytes_bucket |
Size of the CoreDNS request in bytes |
||
coredns_dns_request_size_bytes_count |
Number of CoreDNS requests recorded in the request size histogram |
||
coredns_dns_request_size_bytes_sum |
Total CoreDNS request bytes |
||
coredns_dns_requests_total |
Total number of CoreDNS requests |
||
coredns_dns_response_size_bytes_bucket |
Size of the returned CoreDNS response in bytes |
||
coredns_dns_response_size_bytes_count |
Number of CoreDNS responses recorded in the response size histogram |
||
coredns_dns_response_size_bytes_sum |
Total CoreDNS response bytes |
||
coredns_dns_responses_total |
Total number of CoreDNS responses, by response code |
||
coredns_forward_conn_cache_hits_total |
Total number of connection cache hits for each upstream and protocol |
||
coredns_forward_conn_cache_misses_total |
Total number of connection cache misses for each upstream and protocol |
||
coredns_forward_healthcheck_broken_total |
Number of times all upstream servers were unhealthy |
||
coredns_forward_healthcheck_failures_total |
Total number of failed health checks for each upstream |
||
coredns_forward_max_concurrent_rejects_total |
Total number of requests rejected due to excessive concurrent requests |
||
coredns_forward_request_duration_seconds_bucket |
CoreDNS forwarding request latency |
||
coredns_forward_request_duration_seconds_count |
Number of forwarded requests recorded in the forwarding latency histogram |
||
coredns_forward_request_duration_seconds_sum |
Total CoreDNS forwarding request duration in seconds |
||
coredns_forward_requests_total |
Total number of requests forwarded to each upstream |
||
coredns_forward_responses_total |
Total number of responses from each upstream |
||
coredns_health_request_duration_seconds_bucket |
CoreDNS health check request latency |
||
coredns_health_request_duration_seconds_count |
Number of health check requests recorded in the health check latency histogram |
||
coredns_health_request_duration_seconds_sum |
Total CoreDNS health check request duration in seconds |
||
coredns_health_request_failures_total |
Total number of failed CoreDNS health check requests |
||
coredns_hosts_reload_timestamp_seconds |
Timestamp of the last time CoreDNS reloaded the hosts file |
||
coredns_kubernetes_dns_programming_duration_seconds_bucket |
DNS programming latency |
||
coredns_kubernetes_dns_programming_duration_seconds_count |
Number of observations recorded in the DNS programming latency histogram |
||
coredns_kubernetes_dns_programming_duration_seconds_sum |
Total DNS programming duration in seconds |
||
coredns_local_localhost_requests_total |
Total number of localhost requests processed by CoreDNS |
||
coredns_nodecache_setup_errors_total |
Total number of node cache plugin setup errors |
||
coredns_dns_response_rcode_count_total |
Cumulative count of response codes |
||
coredns_dns_request_count_total |
Cumulative count of DNS requests made per zone, protocol, and family |
||
coredns_dns_request_do_count_total |
Cumulative count of requests with the DO bit set |
||
coredns_dns_do_requests_total |
Number of requests with the DO bit set |
||
coredns_dns_request_type_count_total |
Cumulative count of DNS requests per type |
||
coredns_panics_total |
Total number of CoreDNS panics |
||
coredns_plugin_enabled |
Whether a plugin is enabled in CoreDNS |
||
coredns_reload_failed_total |
Total number of failed CoreDNS configuration reloads |
||
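Latency metrics such as coredns_dns_request_duration_seconds are exposed as cumulative histogram buckets (_bucket), together with _sum and _count. The following is a minimal sketch of how a quantile can be estimated from one snapshot of bucket counts by linear interpolation inside the bucket that contains the requested rank, modeled on the approach PromQL's histogram_quantile() takes. The function name and the (upper_bound, cumulative_count) input format are assumptions for illustration; in practice you would normally run histogram_quantile() over bucket increases directly in PromQL.

```python
import math
from typing import List, Tuple


def estimate_quantile(q: float, buckets: List[Tuple[float, float]]) -> float:
    """Estimate the q-quantile from cumulative histogram buckets.

    buckets: (upper_bound, cumulative_count) pairs sorted by upper bound
    (the "le" label), ending with the +Inf bucket. Assumes non-negative
    observations such as latencies, so the lowest bucket starts at 0.
    """
    if not 0 < q <= 1:
        raise ValueError("q must be in (0, 1]")
    if len(buckets) < 2 or not math.isinf(buckets[-1][0]):
        raise ValueError("need at least one finite bucket plus the +Inf bucket")
    total = buckets[-1][1]
    if total == 0:
        return math.nan  # no observations in the window
    rank = q * total
    # Index of the first finite bucket whose cumulative count reaches the rank.
    b = next((i for i in range(len(buckets) - 1) if buckets[i][1] >= rank),
             len(buckets) - 1)
    if b == len(buckets) - 1:
        # The rank falls into the +Inf bucket: return the highest finite bound.
        return buckets[-2][0]
    start, end = 0.0, buckets[b][0]
    count = buckets[b][1]
    if b > 0:
        start = buckets[b - 1][0]
        count -= buckets[b - 1][1]
        rank -= buckets[b - 1][1]
    # Linear interpolation inside the selected bucket.
    return start + (end - start) * (rank / count)


# Hypothetical increases of coredns_dns_request_duration_seconds_bucket over a window.
snapshot = [(0.001, 500), (0.005, 900), (0.025, 990), (math.inf, 1000)]
print(estimate_quantile(0.99, snapshot))
```

Because the estimate assumes observations are spread evenly within a bucket, its accuracy depends on how closely the bucket boundaries match the actual latency distribution.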
serviceMonitor/monitoring/kube-apiserver/0 |
apiserver |
aggregator_unavailable_apiservice |
Number of unavailable APIServices |
apiserver_admission_controller_admission_duration_seconds_bucket |
Processing latency of an admission controller |
||
apiserver_admission_webhook_admission_duration_seconds_bucket |
Processing latency of an admission webhook |
||
apiserver_admission_webhook_admission_duration_seconds_count |
Number of admission webhook processing requests |
||
apiserver_client_certificate_expiration_seconds_bucket |
Remaining validity period of the client certificate |
||
apiserver_client_certificate_expiration_seconds_count |
Number of observations recorded in the client certificate expiration histogram |
||
apiserver_current_inflight_requests |
Number of requests currently being processed, split into read-only and mutating requests |
||
apiserver_request_duration_seconds_bucket |
Latency of client requests to the API server |
||
apiserver_request_total |
Total number of API server requests, broken down by response code and other labels |
||
go_goroutines |
Number of goroutines that exist |
||
kubernetes_build_info |
Kubernetes build information |
||
process_cpu_seconds_total |
Total process CPU time |
||
process_resident_memory_bytes |
Resident memory size of the process |
||
rest_client_requests_total |
Total number of HTTP requests, partitioned by status code and method |
||
workqueue_adds_total |
Total number of additions handled by a work queue |
||
workqueue_depth |
Current depth of a work queue |
||
workqueue_queue_duration_seconds_bucket |
Time an item waits in a work queue before it is processed |
||
aggregator_unavailable_apiservice_total |
Number of unavailable APIServices |
||
rest_client_request_duration_seconds_bucket |
Latency of HTTP requests made by the REST client |
||
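apiserver_request_total is a counter, so an error rate comes from the increase between two samples rather than from the raw value. The following minimal sketch, assuming you already have the counter values for each code label (summed over the other labels) from two scrapes, computes the share of 5xx responses. A decrease is treated as a counter reset, similar in spirit to how PromQL's rate() and increase() handle resets, but without the extrapolation PromQL applies. The function names and input shape are hypothetical.

```python
import math
from typing import Dict


def counter_increase(prev: float, cur: float) -> float:
    """Increase of a monotonic counter between two samples.

    A value lower than the previous one is assumed to mean the counter was
    reset (for example, the process restarted), so the current value itself
    is taken as the increase.
    """
    return cur - prev if cur >= prev else cur


def error_ratio(prev: Dict[str, float], cur: Dict[str, float]) -> float:
    """Fraction of requests that returned a 5xx code between two scrapes.

    prev/cur map the "code" label of apiserver_request_total to its value.
    Codes that appear only in cur are treated as starting from 0.
    """
    total = 0.0
    errors = 0.0
    for code, value in cur.items():
        inc = counter_increase(prev.get(code, 0.0), value)
        total += inc
        if code.startswith("5"):
            errors += inc
    return errors / total if total > 0 else 0.0
```

For example, error_ratio({"200": 1000, "500": 10}, {"200": 1900, "500": 110}) sees 1,000 new requests of which 100 returned 500, giving a ratio of 0.1.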
serviceMonitor/monitoring/kubelet/0 |
kubelet |
kubelet_certificate_manager_client_expiration_renew_errors |
Number of certificate renewal errors |
kubelet_certificate_manager_client_ttl_seconds |
Time-to-live (TTL) of the Kubelet client certificate |
||
kubelet_cgroup_manager_duration_seconds_bucket |
Duration of cgroup manager operations (destroy and update) |
||
kubelet_cgroup_manager_duration_seconds_count |
Number of cgroup manager operations (destroy and update) |
||
kubelet_node_config_error |
If a configuration-related error occurs on a node, the value of this metric is true (1). If there is no configuration-related error, the value is false (0). |
||
kubelet_node_name |
Node name. The value is always 1. |
||
kubelet_pleg_relist_duration_seconds_bucket |
Duration for relisting pods in PLEG |
||
kubelet_pleg_relist_duration_seconds_count |
Number of PLEG relist operations |
||
kubelet_pleg_relist_interval_seconds_bucket |
Interval between relisting operations in PLEG |
||
kubelet_pod_start_duration_seconds_count |
Number of pods that have been started |
||
kubelet_pod_start_duration_seconds_bucket |
Duration from the kubelet seeing a pod for the first time to the pod starting to run |
||
kubelet_pod_worker_duration_seconds_bucket |
Duration for synchronizing a single pod. |
||
kubelet_running_containers |
Number of running containers |
||
kubelet_running_pods |
Number of running pods |
||
kubelet_runtime_operations_duration_seconds_bucket |
Duration of container runtime operations, by operation type |
||
kubelet_runtime_operations_errors_total |
Number of failed container runtime operations, by operation type |
||
kubelet_runtime_operations_total |
Number of runtime operations of each type |
||
kubelet_volume_stats_available_bytes |
Number of available bytes in a volume |
||
kubelet_volume_stats_capacity_bytes |
Capacity in bytes of a volume |
||
kubelet_volume_stats_inodes |
Maximum number of inodes in a volume |
||
kubelet_volume_stats_inodes_used |
Number of used inodes in a volume |
||
kubelet_volume_stats_used_bytes |
Number of used bytes in a volume |
||
storage_operation_duration_seconds_bucket |
Duration for each storage operation |
||
storage_operation_duration_seconds_count |
Number of storage operations |
||
storage_operation_errors_total |
Number of storage operation errors |
||
volume_manager_total_volumes |
Number of volumes in Volume Manager |
||
rest_client_requests_total |
Total number of HTTP requests, partitioned by status code and method |
||
rest_client_request_duration_seconds_bucket |
Latency of HTTP requests made by the REST client |
||
process_resident_memory_bytes |
Resident memory size of the process |
||
process_cpu_seconds_total |
Total process CPU time |
||
go_goroutines |
Number of goroutines that exist |
||
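The kubelet volume metrics above are usually combined into usage ratios, because a volume can fill up either by running out of bytes or by running out of inodes. The following is a minimal sketch that flags volumes whose space or inode usage exceeds a threshold. It assumes the metric values have already been grouped per PVC (the kubelet_volume_stats_* series carry a persistentvolumeclaim label); the class, function name, and 90% default threshold are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class VolumeStats:
    pvc: str                # value of the persistentvolumeclaim label
    capacity_bytes: float   # kubelet_volume_stats_capacity_bytes
    available_bytes: float  # kubelet_volume_stats_available_bytes
    inodes: float           # kubelet_volume_stats_inodes
    inodes_used: float      # kubelet_volume_stats_inodes_used


def volumes_near_full(stats: List[VolumeStats], threshold: float = 0.9) -> List[str]:
    """Return the PVCs whose space or inode usage exceeds the threshold."""
    flagged = []
    for s in stats:
        # Space usage is derived from available bytes, which accounts for
        # space reserved by the file system.
        space_used = 1 - s.available_bytes / s.capacity_bytes if s.capacity_bytes else 0.0
        inode_used = s.inodes_used / s.inodes if s.inodes else 0.0
        if max(space_used, inode_used) > threshold:
            flagged.append(s.pvc)
    return flagged
```

Checking inodes separately catches volumes that hold many small files and stop accepting writes while still reporting free space.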
serviceMonitor/monitoring/kubelet/1 |
kubelet |
container_cpu_cfs_periods_total |
Total number of elapsed enforcement periods |
container_cpu_cfs_throttled_periods_total |
Number of throttled periods |
||
container_cpu_cfs_throttled_seconds_total |
Total duration a container has been throttled |
||
container_cpu_load_average_10s |
Value of container CPU load average over the last 10 seconds |
||
container_cpu_usage_seconds_total |
Cumulative CPU time consumed by a container, in seconds |
||
container_file_descriptors |
Number of open file descriptors for a container |
||
container_fs_inodes_free |
Number of available inodes in a file system |
||
container_fs_inodes_total |
Total number of inodes in a file system |
||
container_fs_io_time_seconds_total |
Cumulative time the disk or file system has spent doing I/Os |
||
container_fs_limit_bytes |
Total disk or file system capacity that can be consumed by a container |
||
container_fs_read_seconds_total |
Total time a container spent on reading disk or file system data |
||
container_fs_reads_bytes_total |
Cumulative amount of disk or file system data read by a container |
||
container_fs_reads_total |
Cumulative number of disk or file system reads completed by a container |
||
container_fs_usage_bytes |
Bytes consumed by a container on a file system |
||
container_fs_write_seconds_total |
Total time a container spent on writing data to the disk or file system |
||
container_fs_writes_bytes_total |
Total amount of data written by a container to a disk or file system |
||
container_fs_writes_total |
Cumulative number of disk or file system writes completed by a container |
||
container_memory_cache |
Memory used for the page cache of a container |
||
container_memory_failcnt |
Number of times memory usage hit the limit |
||
container_memory_max_usage_bytes |
Maximum memory usage recorded for a container |
||
container_memory_rss |
Resident set size (RSS) of a container |
||
container_memory_swap |
Container swap memory usage |
||
container_memory_usage_bytes |
Current memory usage of a container |
||
container_memory_working_set_bytes |
Memory usage of the working set of a container |
||
container_network_receive_bytes_total |
Cumulative number of bytes received by a container |
||
container_network_receive_errors_total |
Cumulative number of errors encountered during reception |
||
container_network_receive_packets_dropped_total |
Cumulative number of packets dropped during reception |
||
container_network_receive_packets_total |
Cumulative number of packets received |
||
container_network_transmit_bytes_total |
Cumulative number of bytes transmitted by a container |
||
container_network_transmit_errors_total |
Cumulative number of errors encountered during transmission |
||
container_network_transmit_packets_dropped_total |
Cumulative number of packets dropped during transmission |
||
container_network_transmit_packets_total |
Cumulative number of packets transmitted |
||
container_spec_cpu_quota |
CPU quota of a container |
||
container_spec_memory_limit_bytes |
Memory limit for a container |
||
machine_cpu_cores |
Number of CPU cores of the physical machine or VM |
||
machine_memory_bytes |
Total memory size of the physical machine or VM |
||
serviceMonitor/monitoring/kube-state-metrics/0 |
kube-state-metrics-prom |
kube_cronjob_status_active |
Number of jobs that the CronJob is currently running |
kube_cronjob_info |
CronJob information |
||
kube_cronjob_labels |
CronJob labels |
||
kube_configmap_info |
ConfigMap information |
||
kube_daemonset_created |
DaemonSet creation time |
||
kube_daemonset_status_current_number_scheduled |
Number of nodes that are running at least one DaemonSet pod and are supposed to |
||
kube_daemonset_status_desired_number_scheduled |
Number of nodes that should be running the DaemonSet pod |
||
kube_daemonset_status_number_available |
Number of nodes that should be running a DaemonSet pod and have at least one DaemonSet pod running and available |
||
kube_daemonset_status_number_misscheduled |
Number of nodes that are running the DaemonSet pod but are not supposed to |
||
kube_daemonset_status_number_ready |
Number of nodes that should be running the DaemonSet pods and have one or more DaemonSet pods running and ready |
||
kube_daemonset_status_number_unavailable |
Number of nodes that should be running the DaemonSet pods but have none of the DaemonSet pods running and available |
||
kube_daemonset_status_updated_number_scheduled |
Number of nodes that are running an updated DaemonSet pod |
||
kube_deployment_created |
Deployment creation timestamp |
||
kube_deployment_labels |
Deployment labels |
||
kube_deployment_metadata_generation |
Sequence number representing a specific generation of the desired state for a Deployment |
||
kube_deployment_spec_replicas |
Number of desired replicas for a Deployment |
||
kube_deployment_spec_strategy_rollingupdate_max_unavailable |
Maximum number of unavailable replicas during a rolling update of a Deployment |
||
kube_deployment_status_observed_generation |
The generation observed by the Deployment controller |
||
kube_deployment_status_replicas |
Number of current replicas of a Deployment |
||
kube_deployment_status_replicas_available |
Number of available replicas per Deployment |
||
kube_deployment_status_replicas_ready |
Number of ready replicas per Deployment |
||
kube_deployment_status_replicas_unavailable |
Number of unavailable replicas per Deployment |
||
kube_deployment_status_replicas_updated |
Number of updated replicas per Deployment |
||
kube_job_info |
Job information |
||
kube_namespace_labels |
Namespace labels |
||
kube_node_labels |
Node labels |
||
kube_node_info |
Node information |
||
kube_node_spec_taint |
Taint of a node |
||
kube_node_spec_unschedulable |
Whether a node is marked unschedulable, that is, whether new pods are prevented from being scheduled to it |
||
kube_node_status_allocatable |
Allocatable resources on a node |
||
kube_node_status_capacity |
Capacity for different resources on a node |
||
kube_node_status_condition |
Node status condition |
||
kube_node_volcano_oversubscription_status |
Node oversubscription status |
||
kube_persistentvolume_status_phase |
PV status |
||
kube_persistentvolumeclaim_status_phase |
PVC status |
||
kube_persistentvolume_info |
PV information |
||
kube_persistentvolumeclaim_info |
PVC information |
||
kube_pod_container_info |
Information about a container running in the pod |
||
kube_pod_container_resource_limits |
Container resource limits |
||
kube_pod_container_resource_requests |
Resources requested by a container |
||
kube_pod_container_status_last_terminated_reason |
Reason the container was last in the terminated state |
||
kube_pod_container_status_ready |
Whether a container is in ready state |
||
kube_pod_container_status_restarts_total |
Number of container restarts |
||
kube_pod_container_status_running |
Whether a container is in running state |
||
kube_pod_container_status_terminated |
Whether a container is in terminated state |
||
kube_pod_container_status_terminated_reason |
The reason a container is in terminated state |
||
kube_pod_container_status_waiting |
Whether a container is in waiting state |
||
kube_pod_container_status_waiting_reason |
The reason a container is in waiting state |
||
kube_pod_info |
Pod information |
||
kube_pod_labels |
Pod labels |
||
kube_pod_owner |
Object to which the pod belongs |
||
kube_pod_status_phase |
Phase of the pod |
||
kube_pod_status_ready |
Whether the pod is in ready state |
||
kube_secret_info |
Secret information |
||
kube_statefulset_created |
StatefulSet creation timestamp |
||
kube_statefulset_labels |
StatefulSet labels |
||
kube_statefulset_metadata_generation |
Sequence number representing a specific generation of the desired state for a StatefulSet |
||
kube_statefulset_replicas |
Number of desired pods for a StatefulSet |
||
kube_statefulset_status_observed_generation |
Generation observed by the StatefulSet controller |
||
kube_statefulset_status_replicas |
Number of replicas per StatefulSet |
||
kube_statefulset_status_replicas_ready |
Number of ready replicas in a StatefulSet |
||
kube_statefulset_status_replicas_updated |
Number of updated replicas in a StatefulSet |
||
kube_job_spec_completions |
Desired number of successfully finished pods that should run with the job |
||
kube_job_status_failed |
Number of pods of a job that reached the Failed phase |
||
kube_job_status_succeeded |
Number of pods of a job that reached the Succeeded phase |
||
kube_node_status_allocatable_cpu_cores |
Number of allocatable CPU cores of a node |
||
kube_node_status_allocatable_memory_bytes |
Total allocatable memory of a node |
||
kube_replicaset_owner |
ReplicaSet owner. |
||
kube_resourcequota |
Resource quota |
||
kube_pod_spec_volumes_persistentvolumeclaims_info |
Information about the PVC associated with the pod |
||
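Several kube-state-metrics series for a Deployment are typically read together to judge whether a rollout has finished. The following is a simplified heuristic, loosely modeled on the kind of checks kubectl rollout status performs: it compares the observed generation with the desired generation, then updated and available replicas with the desired replica count. The class, function, and status strings are hypothetical and are not part of kube-state-metrics.

```python
from dataclasses import dataclass


@dataclass
class DeploymentState:
    metadata_generation: int  # kube_deployment_metadata_generation
    observed_generation: int  # kube_deployment_status_observed_generation
    spec_replicas: int        # kube_deployment_spec_replicas
    replicas_updated: int     # kube_deployment_status_replicas_updated
    replicas_available: int   # kube_deployment_status_replicas_available


def rollout_status(d: DeploymentState) -> str:
    """Classify a Deployment rollout from kube-state-metrics values."""
    if d.observed_generation < d.metadata_generation:
        return "pending"      # controller has not processed the latest spec yet
    if d.replicas_updated < d.spec_replicas:
        return "progressing"  # replicas are still being replaced
    if d.replicas_available < d.spec_replicas:
        return "degraded"     # all replicas updated, but some are unavailable
    return "complete"
```

Checking the generation first avoids reporting a stale "complete" result right after the Deployment spec is changed.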
serviceMonitor/monitoring/prometheus-lightweight/0 |
prometheus-lightweight |
vm_persistentqueue_blocks_dropped_total |
Total number of dropped blocks in a send queue |
vm_persistentqueue_blocks_read_total |
Total number of blocks read by a send queue |
||
vm_persistentqueue_blocks_written_total |
Total number of blocks written to a send queue |
||
vm_persistentqueue_bytes_pending |
Number of pending bytes in a send queue |
||
vm_persistentqueue_bytes_read_total |
Total number of bytes read by a send queue |
||
vm_persistentqueue_bytes_written_total |
Total number of bytes written to a send queue |
||
vm_promscrape_active_scrapers |
Number of active scrapers |
||
vm_promscrape_conn_read_errors_total |
Total number of read errors during scrapes |
||
vm_promscrape_conn_write_errors_total |
Total number of write errors during scrapes |
||
vm_promscrape_max_scrape_size_exceeded_errors_total |
Total number of scrapes failed because responses exceed the size limit |
||
vm_promscrape_scrape_duration_seconds_sum |
Total time spent on scrapes, in seconds |
||
vm_promscrape_scrape_duration_seconds_count |
Number of scrapes recorded in the scrape duration metric |
||
vm_promscrape_scrapes_total |
Number of scrapes |
||
vmagent_remotewrite_bytes_sent_total |
Total number of bytes sent through remote write |
||
vmagent_remotewrite_duration_seconds_sum |
Total time spent on remote write requests, in seconds |
||
vmagent_remotewrite_duration_seconds_count |
Number of remote write requests recorded in the duration metric |
||
vmagent_remotewrite_packets_dropped_total |
Total number of dropped packets during remote write |
||
vmagent_remotewrite_pending_data_bytes |
Number of pending bytes during remote write |
||
vmagent_remotewrite_requests_total |
Total number of remote write requests |
||
vmagent_remotewrite_retries_count_total |
Total number of remote write retries |
||
go_goroutines |
Number of goroutines that exist |
||
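The _sum and _count pairs above, such as vmagent_remotewrite_duration_seconds_sum and vmagent_remotewrite_duration_seconds_count, are combined to obtain an average: the increase of _sum divided by the increase of _count over the same interval gives the mean value per observation. The following minimal sketch applies this to two scrapes; the function name is hypothetical and counter resets are not handled.

```python
import math
from typing import Optional


def average_per_observation(sum_prev: float, sum_cur: float,
                            count_prev: float, count_cur: float) -> Optional[float]:
    """Mean observed value between two scrapes from a _sum/_count pair.

    Returns None when no new observations were recorded in the interval.
    """
    observations = count_cur - count_prev
    if observations <= 0:
        return None
    return (sum_cur - sum_prev) / observations


# Hypothetical values of vmagent_remotewrite_duration_seconds_sum/_count.
print(average_per_observation(10.0, 14.0, 100, 120))
```

The same calculation applies to vm_promscrape_scrape_duration_seconds to obtain the average scrape duration.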
serviceMonitor/monitoring/node-exporter/0 |
node-exporter |
node_boot_time_seconds |
Node boot time |
node_context_switches_total |
Number of context switches |
||
node_cpu_seconds_total |
Seconds the CPUs spent in each mode |
||
node_disk_io_now |
Number of I/Os in progress |
||
node_disk_io_time_seconds_total |
Total seconds spent doing I/Os |
||
node_disk_io_time_weighted_seconds_total |
The weighted time spent doing I/Os |
||
node_disk_read_bytes_total |
Number of bytes that are read |
||
node_disk_read_time_seconds_total |
Number of seconds spent by all reads |
||
node_disk_reads_completed_total |
Number of reads completed |
||
node_disk_write_time_seconds_total |
Number of seconds spent by all writes |
||
node_disk_writes_completed_total |
Number of writes completed |
||
node_disk_written_bytes_total |
Number of bytes that are written |
||
node_docker_thinpool_data_space_available |
Available data space of a Docker thin pool |
||
node_docker_thinpool_metadata_space_available |
Available metadata space of a Docker thin pool |
||
node_exporter_build_info |
Node Exporter build information |
||
node_filefd_allocated |
Allocated file descriptors |
||
node_filefd_maximum |
Maximum number of file descriptors |
||
node_filesystem_avail_bytes |
File system space that is available for use |
||
node_filesystem_device_error |
Error in the mounted file system device |
||
node_filesystem_free_bytes |
Remaining space of a file system |
||
node_filesystem_readonly |
Whether the file system is mounted read-only |
||
node_filesystem_size_bytes |
Total size of a file system |
||
node_forks_total |
Number of forks |
||
node_intr_total |
Number of interrupts serviced |
||
node_load1 |
1-minute average CPU load |
||
node_load15 |
15-minute average CPU load |
||
node_load5 |
5-minute average CPU load |
||
node_memory_Buffers_bytes |
Memory used for buffers on the node |
||
node_memory_Cached_bytes |
Memory for the node page cache |
||
node_memory_MemAvailable_bytes |
Available memory of a node |
||
node_memory_MemFree_bytes |
Free memory of a node |
||
node_memory_MemTotal_bytes |
Total memory of a node |
||
node_network_receive_bytes_total |
Total number of bytes received |
||
node_network_receive_drop_total |
Total number of packets dropped during reception |
||
node_network_receive_errs_total |
Total number of errors encountered during reception |
||
node_network_receive_packets_total |
Total number of packets received |
||
node_network_transmit_bytes_total |
Total number of sent bytes |
||
node_network_transmit_drop_total |
Total number of packets dropped during transmission |
||
node_network_transmit_errs_total |
Total number of errors encountered during transmission |
||
node_network_transmit_packets_total |
Total number of packets sent |
||
node_procs_blocked |
Number of processes blocked waiting for I/O |
||
node_procs_running |
Number of processes in the runnable state |
||
node_sockstat_sockets_used |
Number of sockets in use |
||
node_sockstat_TCP_alloc |
Number of allocated TCP sockets |
||
node_sockstat_TCP_inuse |
Number of TCP sockets in use |
||
node_sockstat_TCP_orphan |
Number of orphaned TCP sockets |
||
node_sockstat_TCP_tw |
Number of TCP sockets in the TIME_WAIT state |
||
node_sockstat_UDPLITE_inuse |
Number of UDP-Lite sockets in use |
||
node_sockstat_UDP_inuse |
Number of UDP sockets in use |
||
node_sockstat_UDP_mem |
UDP socket buffer usage |
||
node_timex_offset_seconds |
Offset between the node clock and the reference clock |
||
node_timex_sync_status |
Synchronization status of node clocks |
||
node_uname_info |
System kernel information |
||
node_vmstat_oom_kill |
Number of processes terminated due to insufficient memory |
||
process_cpu_seconds_total |
Total process CPU time |
||
process_max_fds |
Maximum number of file descriptors of a process |
||
process_open_fds |
Number of file descriptors opened by a process |
||
process_resident_memory_bytes |
Resident memory size of the process |
||
process_start_time_seconds |
Process start time |
||
process_virtual_memory_bytes |
Virtual memory size |
||
process_virtual_memory_max_bytes |
Maximum available virtual memory capacity |
||
node_netstat_Tcp_ActiveOpens |
Number of TCP connections that directly change from the CLOSED state to the SYN-SENT state |
||
node_netstat_Tcp_PassiveOpens |
Number of TCP connections that directly change from the LISTEN state to the SYN-RCVD state |
||
node_netstat_Tcp_CurrEstab |
Number of TCP connections in the ESTABLISHED or CLOSE-WAIT state |
||
node_vmstat_pgmajfault |
Number of major page faults in vmstat |
||
node_vmstat_pgpgout |
Amount of data paged out to disk (vmstat pgpgout) |
||
node_vmstat_pgfault |
Number of page faults in vmstat |
||
node_vmstat_pgpgin |
Amount of data paged in from disk (vmstat pgpgin) |
||
node_processes_max_processes |
Maximum number of processes |
||
node_processes_pids |
Number of PIDs |
||
node_nf_conntrack_entries |
Number of currently allocated flow entries for connection tracking |
||
node_nf_conntrack_entries_limit |
Maximum size of a connection tracking table |
||
promhttp_metric_handler_requests_in_flight |
Current number of scrapes being served |
||
go_goroutines |
Number of goroutines that exist |
||
node_filesystem_files |
Total number of file nodes (inodes) in a file system on the node |
||
node_filesystem_files_free |
Number of free file nodes (inodes) in a file system on the node |
||
podMonitor/monitoring/nvidia-gpu-device-plugin/0 |
monitoring/nvidia-gpu-device-plugin |
cce_gpu_utilization |
GPU compute usage |
cce_gpu_memory_utilization |
GPU memory usage |
||
cce_gpu_encoder_utilization |
GPU encoding usage |
||
cce_gpu_decoder_utilization |
GPU decoding usage |
||
cce_gpu_utilization_process |
GPU compute usage of each process |
||
cce_gpu_memory_utilization_process |
GPU memory usage of each process |
||
cce_gpu_encoder_utilization_process |
GPU encoding usage of each process |
||
cce_gpu_decoder_utilization_process |
GPU decoding usage of each process |
||
cce_gpu_memory_used |
Used GPU memory |
||
cce_gpu_memory_total |
Total GPU memory |
||
cce_gpu_memory_free |
Free GPU memory |
||
cce_gpu_bar1_memory_used |
Used GPU BAR1 memory |
||
cce_gpu_bar1_memory_total |
Total GPU BAR1 memory |
||
cce_gpu_clock |
GPU clock frequency |
||
cce_gpu_memory_clock |
GPU memory clock frequency |
||
cce_gpu_graphics_clock |
GPU graphics clock frequency |
||
cce_gpu_video_clock |
GPU video processor frequency |
||
cce_gpu_temperature |
GPU temperature |
||
cce_gpu_power_usage |
GPU power usage |
||
cce_gpu_total_energy_consumption |
Total GPU energy consumption |
||
cce_gpu_pcie_link_bandwidth |
GPU PCIe bandwidth |
||
cce_gpu_nvlink_bandwidth |
GPU NVLink bandwidth |
||
cce_gpu_pcie_throughput_rx |
GPU PCIe RX bandwidth |
||
cce_gpu_pcie_throughput_tx |
GPU PCIe TX bandwidth |
||
cce_gpu_nvlink_utilization_counter_rx |
GPU NVLink RX bandwidth |
||
cce_gpu_nvlink_utilization_counter_tx |
GPU NVLink TX bandwidth |
||
cce_gpu_retired_pages_sbe |
Number of retired GPU memory pages with single-bit errors |
||
cce_gpu_retired_pages_dbe |
Number of retired GPU memory pages with double-bit errors |
||
xgpu_memory_total |
Total xGPU memory |
||
xgpu_memory_used |
Used xGPU memory |
||
xgpu_core_percentage_total |
Total xGPU compute |
||
xgpu_core_percentage_used |
Used xGPU compute |
||
gpu_schedule_policy |
GPU virtualization mode. 0: GPU memory isolation with compute sharing. 1: GPU memory and compute isolation. 2: default mode, in which the GPU is not virtualized. |
||
xgpu_device_health |
Health status of xGPU. 0: xGPU is healthy. 1: xGPU is unhealthy. |
||
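gpu_schedule_policy and xgpu_device_health are enumerations rather than quantities, so they need to be decoded before they are shown to users. The following minimal sketch maps the documented values to readable text and combines them with the xGPU memory usage ratio. The function name and output format are hypothetical, and it assumes xgpu_memory_used and xgpu_memory_total are reported in the same unit.

```python
# Values as documented for gpu_schedule_policy.
SCHEDULE_POLICY = {
    0: "GPU memory isolation, compute sharing",
    1: "GPU memory and compute isolation",
    2: "default (GPU not virtualized)",
}


def describe_gpu(policy: int, memory_used: float, memory_total: float,
                 health: int) -> str:
    """Summarize gpu_schedule_policy, xgpu_device_health, and xGPU memory usage."""
    mode = SCHEDULE_POLICY.get(policy, f"unknown ({policy})")
    state = "healthy" if health == 0 else "unhealthy"  # 0: healthy, 1: unhealthy
    usage = memory_used / memory_total if memory_total > 0 else 0.0
    return f"{mode}; {state}; memory {usage:.0%} used"
```

For example, describe_gpu(1, 8, 16, 0) returns "GPU memory and compute isolation; healthy; memory 50% used".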
serviceMonitor/monitoring/prometheus-server/0 |
prometheus-server |
prometheus_build_info |
Prometheus build information |
prometheus_engine_query_duration_seconds |
Query execution time, in seconds |
||
prometheus_engine_query_duration_seconds_count |
Number of queries |
||
prometheus_sd_discovered_targets |
Number of targets discovered by service discovery |
||
prometheus_remote_storage_bytes_total |
Total number of bytes of data (non-metadata) sent by the queue after compression |
||
prometheus_remote_storage_enqueue_retries_total |
Number of enqueue retries caused by a full shard queue |
||
prometheus_remote_storage_highest_timestamp_in_seconds |
Highest timestamp, in seconds since the epoch, of samples received by remote storage |
||
prometheus_remote_storage_queue_highest_sent_timestamp_seconds |
Highest timestamp successfully sent by remote storage |
||
prometheus_remote_storage_samples_dropped_total |
Number of samples dropped before being sent to remote storage |
||
prometheus_remote_storage_samples_failed_total |
Number of samples that failed to be sent to remote storage |
||
prometheus_remote_storage_samples_in_total |
Number of samples that entered remote storage, to be compared with the samples sent by the queues |
||
prometheus_remote_storage_samples_pending |
Number of samples pending in shards to be sent to remote storage |
||
prometheus_remote_storage_samples_retried_total |
Number of samples which failed to be sent to remote storage but were retried |
||
prometheus_remote_storage_samples_total |
Total number of samples sent to remote storage |
||
prometheus_remote_storage_shard_capacity |
Capacity of each shard of the queue used for parallel sending to the remote storage |
||
prometheus_remote_storage_shards |
Number of shards used for parallel sending to the remote storage |
||
prometheus_remote_storage_shards_desired |
Number of shards that the queue's shard calculation wants to run, based on the rate of incoming versus outgoing samples |
||
prometheus_remote_storage_shards_max |
Maximum number of shards that the queue is allowed to run |
||
prometheus_remote_storage_shards_min |
Minimum number of shards that the queue is allowed to run |
||
prometheus_tsdb_wal_segment_current |
WAL segment index that TSDB is currently writing to |
||
prometheus_tsdb_head_chunks |
Number of chunks in the head block |
||
prometheus_tsdb_head_series |
Number of time series stored in the head |
||
prometheus_tsdb_head_samples_appended_total |
Number of appended samples |
||
prometheus_wal_watcher_current_segment |
Current segment the WAL watcher is reading records from |
||
prometheus_target_interval_length_seconds |
Actual interval between scrapes |
||
prometheus_target_interval_length_seconds_count |
Number of recorded scrape intervals |
||
prometheus_target_interval_length_seconds_sum |
Sum of metric collection intervals |
||
prometheus_target_scrapes_exceeded_body_size_limit_total |
Number of scrapes that hit the body size limit |
||
prometheus_target_scrapes_exceeded_sample_limit_total |
Number of scrapes that hit the sample limit |
||
prometheus_target_scrapes_sample_duplicate_timestamp_total |
Number of scraped samples with duplicate timestamps |
||
prometheus_target_scrapes_sample_out_of_bounds_total |
Number of samples rejected due to timestamp falling outside of the time bounds |
||
prometheus_target_scrapes_sample_out_of_order_total |
Number of out-of-order samples |
||
prometheus_target_sync_length_seconds |
Target synchronization interval |
||
prometheus_target_sync_length_seconds_count |
Number of target synchronization intervals |
||
prometheus_target_sync_length_seconds_sum |
Sum of target synchronization intervals |
||
promhttp_metric_handler_requests_in_flight |
Current number of scrapes being served |
||
promhttp_metric_handler_requests_total |
Total number of scrapes, by HTTP status code |
||
go_goroutines |
Number of goroutines that exist |
||
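The remote storage metrics above can be combined to detect a remote write queue that is falling behind. The gap between prometheus_remote_storage_highest_timestamp_in_seconds and prometheus_remote_storage_queue_highest_sent_timestamp_seconds approximates how far sending trails ingestion, and prometheus_remote_storage_shards_desired exceeding prometheus_remote_storage_shards_max suggests the queue cannot scale out enough. The following minimal sketch is similar in spirit to commonly used remote write lag alerts; the function names and the 120-second default threshold are assumptions.

```python
def remote_write_lag(highest_ts_in: float, highest_sent_ts: float) -> float:
    """Seconds by which remote write trails the newest ingested sample."""
    return max(0.0, highest_ts_in - highest_sent_ts)


def is_behind(highest_ts_in: float, highest_sent_ts: float,
              max_lag: float = 120.0) -> bool:
    """True if the lag exceeds max_lag seconds (threshold is an assumption)."""
    return remote_write_lag(highest_ts_in, highest_sent_ts) > max_lag


def shards_saturated(shards_desired: float, shards_max: float) -> bool:
    """True if the shard calculation wants more shards than are allowed."""
    return shards_desired > shards_max
```

Evaluating the lag over a short window rather than a single sample helps avoid alerting on momentary spikes.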
podMonitor/monitoring/virtual-kubelet-pods/0 |
monitoring/virtual-kubelet-pods |
container_cpu_load_average_10s |
Value of container CPU load average over the last 10 seconds |
container_cpu_system_seconds_total |
Cumulative system CPU time consumed by a container, in seconds |
||
container_cpu_usage_seconds_total |
Cumulative CPU time consumed by a container in core-seconds |
||
container_cpu_user_seconds_total |
Cumulative user CPU time consumed by a container, in seconds |
||
container_cpu_cfs_periods_total |
Number of elapsed enforcement period intervals |
||
container_cpu_cfs_throttled_periods_total |
Number of throttled period intervals |
||
container_cpu_cfs_throttled_seconds_total |
Total duration a container has been throttled |
||
container_fs_inodes_free |
Number of available inodes in a file system |
||
container_fs_usage_bytes |
File system usage |
||
container_fs_inodes_total |
Number of inodes in a file system |
||
container_fs_io_current |
Number of I/Os currently in progress in a disk or file system |
||
container_fs_io_time_seconds_total |
Cumulative time the disk or file system has spent doing I/Os |
||
container_fs_io_time_weighted_seconds_total |
Cumulative weighted I/O time of a disk or file system |
||
container_fs_limit_bytes |
Total disk or file system capacity that can be consumed by a container |
||
container_fs_reads_bytes_total |
Cumulative amount of disk or file system data read by a container |
||
container_fs_read_seconds_total |
Time a container spent on reading disk or file system data |
||
container_fs_reads_merged_total |
Cumulative number of merged disk or file system reads made by a container |
||
container_fs_reads_total |
Cumulative number of disk or file system reads completed by a container |
||
container_fs_sector_reads_total |
Cumulative number of disk or file system sector reads completed by a container |
||
container_fs_sector_writes_total |
Cumulative number of disk or file system sector writes completed by a container |
||
container_fs_writes_bytes_total |
Total amount of data written by a container to a disk or file system |
||
container_fs_write_seconds_total |
Cumulative time a container spent writing data to a disk or file system |
||
container_fs_writes_merged_total |
Cumulative number of merged container writes to the disk or file system |
||
container_fs_writes_total |
Cumulative number of disk or file system writes completed by a container |
||
container_blkio_device_usage_total |
Cumulative block I/O (blkio) usage of a device, in bytes |
||
container_memory_failures_total |
Cumulative number of container memory allocation failures |
||
container_memory_failcnt |
Number of times a container's memory usage hit its limit |
||
container_memory_cache |
Memory used for the page cache of a container |
||
container_memory_mapped_file |
Size of memory-mapped files of a container |
||
container_memory_max_usage_bytes |
Maximum memory usage recorded for a container |
||
container_memory_rss |
Resident set size (RSS) of a container |
||
container_memory_swap |
Container swap usage |
||
container_memory_usage_bytes |
Current memory usage of a container |
||
container_memory_working_set_bytes |
Memory usage of the working set of a container |
||
container_network_receive_bytes_total |
Cumulative number of bytes received by a container over the network |
||
container_network_receive_errors_total |
Cumulative number of errors encountered during reception |
||
container_network_receive_packets_dropped_total |
Cumulative number of packets dropped during reception |
||
container_network_receive_packets_total |
Cumulative number of packets received |
||
container_network_transmit_bytes_total |
Cumulative number of bytes transmitted by a container over the network |
||
container_network_transmit_errors_total |
Cumulative number of errors encountered during transmission |
||
container_network_transmit_packets_dropped_total |
Cumulative number of packets dropped during transmission |
||
container_network_transmit_packets_total |
Cumulative number of packets transmitted |
||
container_processes |
Number of processes running inside a container |
||
container_sockets |
Number of open sockets for a container |
||
container_file_descriptors |
Number of open file descriptors for a container |
||
container_threads |
Number of threads running inside a container |
||
container_threads_max |
Maximum number of threads allowed inside a container |
||
container_ulimits_soft |
Soft ulimit values of process 1 in a container. A value of -1 means unlimited, except for priority and nice. |
||
container_tasks_state |
Number of tasks in the specified state, such as sleeping, running, stopped, uninterruptible, or iowaiting |
||
container_spec_cpu_period |
CPU period of a container |
||
container_spec_cpu_shares |
CPU share of a container |
||
container_spec_cpu_quota |
CPU quota of a container |
||
container_spec_memory_limit_bytes |
Memory limit for a container |
||
container_spec_memory_reservation_limit_bytes |
Memory reservation limit for a container |
||
container_spec_memory_swap_limit_bytes |
Memory swap limit for a container |
||
container_start_time_seconds |
Start time of a container, in seconds since the Unix epoch |
||
container_last_seen |
Last time a container was seen by the exporter |
||
container_accelerator_memory_used_bytes |
GPU accelerator memory that is being used by a container |
||
container_accelerator_memory_total_bytes |
Total available memory of a GPU accelerator |
||
container_accelerator_duty_cycle |
Percentage of time over the past sample period during which a GPU accelerator was actively processing |
||
podMonitor/monitoring/everest-csi-controller/0 |
monitoring/everest-csi-controller |
everest_action_result_total |
Number of invocations of different functions, by result |
everest_function_duration_seconds_bucket |
Cumulative number of function invocations whose duration is less than or equal to each bucket's upper bound |
||
everest_function_duration_seconds_count |
Number of times different functions were invoked |
||
everest_function_duration_seconds_sum |
Total time spent invoking different functions |
||
everest_function_duration_quantile_seconds |
Quantiles of the time required to invoke different functions |
||
node_volume_read_completed_total |
Number of completed reads |
||
node_volume_read_merged_total |
Number of merged reads |
||
node_volume_read_bytes_total |
Total number of bytes read from sectors |
||
node_volume_read_time_milliseconds_total |
Total read duration, in milliseconds |
||
node_volume_write_completed_total |
Number of completed writes |
||
node_volume_write_merged_total |
Number of merged writes |
||
node_volume_write_bytes_total |
Total number of bytes written to sectors |
||
node_volume_write_time_milliseconds_total |
Total write duration, in milliseconds |
||
node_volume_io_now |
Number of ongoing I/Os |
||
node_volume_io_time_seconds_total |
Total duration of I/O operations, in seconds |
||
node_volume_capacity_bytes_available |
Available capacity |
||
node_volume_capacity_bytes_total |
Total capacity |
||
node_volume_capacity_bytes_used |
Used capacity |
||
node_volume_inodes_available |
Available inodes |
||
node_volume_inodes_total |
Total number of inodes |
||
node_volume_inodes_used |
Used inodes |
||
node_volume_read_transmissions_total |
Number of read transmissions |
||
node_volume_read_timeouts_total |
Number of read timeouts |
||
node_volume_read_sent_bytes_total |
Number of bytes read |
||
node_volume_read_queue_time_milliseconds_total |
Total read queue waiting time, in milliseconds |
||
node_volume_read_rtt_time_milliseconds_total |
Total read round-trip time (RTT), in milliseconds |
||
node_volume_write_transmissions_total |
Total number of write transmissions |
||
node_volume_write_timeouts_total |
Total number of write timeouts |
||
node_volume_write_queue_time_milliseconds_total |
Total write queue waiting time, in milliseconds |
||
node_volume_write_rtt_time_milliseconds_total |
Total write round-trip time (RTT), in milliseconds |
||
node_volume_localvolume_stats_capacity_bytes |
Total local volume capacity |
||
node_volume_localvolume_stats_available_bytes |
Available local volume capacity |
||
node_volume_localvolume_stats_used_bytes |
Used local volume capacity |
||
node_volume_localvolume_stats_inodes |
Number of inodes for a local volume |
||
node_volume_localvolume_stats_inodes_used |
Used inodes for a local volume |
||
podMonitor/monitoring/nginx-ingress-controller/0 |
monitoring/nginx-ingress-controller |
nginx_ingress_controller_connect_duration_seconds_bucket |
Distribution of the time spent connecting to the upstream server, in seconds |
nginx_ingress_controller_connect_duration_seconds_sum |
Total time spent connecting to the upstream server, in seconds |
||
nginx_ingress_controller_connect_duration_seconds_count |
Number of observed connections to the upstream server |
||
nginx_ingress_controller_request_duration_seconds_bucket |
Distribution of request processing time, in seconds |
||
nginx_ingress_controller_request_duration_seconds_sum |
Total request processing time, in seconds |
||
nginx_ingress_controller_request_duration_seconds_count |
Number of requests whose processing time was observed |
||
nginx_ingress_controller_request_size_bucket |
Distribution of request lengths (including the request line, headers, and body), in bytes |
||
nginx_ingress_controller_request_size_sum |
Total length of requests (including the request line, headers, and body), in bytes |
||
nginx_ingress_controller_request_size_count |
Number of requests whose length was observed |
||
nginx_ingress_controller_response_duration_seconds_bucket |
Distribution of the time taken to receive the response from the upstream server, in seconds |
||
nginx_ingress_controller_response_duration_seconds_sum |
Total time taken to receive responses from the upstream server, in seconds |
||
nginx_ingress_controller_response_duration_seconds_count |
Number of upstream responses whose receive time was observed |
||
nginx_ingress_controller_response_size_bucket |
Distribution of response lengths (including the status line, headers, and body), in bytes |
||
nginx_ingress_controller_response_size_sum |
Total length of responses (including the status line, headers, and body), in bytes |
||
nginx_ingress_controller_response_size_count |
Number of responses whose length was observed |
||
nginx_ingress_controller_header_duration_seconds_bucket |
Distribution of the time taken to receive the first header from the upstream server, in seconds |
||
nginx_ingress_controller_header_duration_seconds_sum |
Total time taken to receive the first header from the upstream server, in seconds |
||
nginx_ingress_controller_header_duration_seconds_count |
Number of upstream responses whose first-header time was observed |
||
nginx_ingress_controller_bytes_sent |
Number of bytes sent to the client |
||
nginx_ingress_controller_ingress_upstream_latency_seconds |
Upstream service latency |
||
nginx_ingress_controller_requests |
Total number of client requests |
||
nginx_ingress_controller_nginx_process_connections |
Number of client connections in the active, read, write, or wait state |
||
nginx_ingress_controller_nginx_process_connections_total |
Total number of client connections in the accepted or handled state |
||
nginx_ingress_controller_nginx_process_cpu_seconds_total |
Total CPU time consumed by the Nginx process, in seconds |
||
nginx_ingress_controller_nginx_process_num_procs |
Number of processes |
||
nginx_ingress_controller_nginx_process_oldest_start_time_seconds |
Start time of the oldest Nginx process, in seconds since the Unix epoch (January 1, 1970) |
||
nginx_ingress_controller_nginx_process_read_bytes_total |
Total number of bytes read |
||
nginx_ingress_controller_nginx_process_requests_total |
Total number of requests processed by Nginx since startup |
||
nginx_ingress_controller_nginx_process_resident_memory_bytes |
Resident set size of a process, that is, the physical memory it actually uses |
||
nginx_ingress_controller_nginx_process_virtual_memory_bytes |
Virtual memory usage of a process, that is, the total memory allocated to the process, including the actual physical memory and virtual swap space |
||
nginx_ingress_controller_nginx_process_write_bytes_total |
Total amount of data written by the process to disks or other devices for long-term storage |
||
nginx_ingress_controller_build_info |
A metric with a constant '1' labeled with information about the build |
||
nginx_ingress_controller_check_success |
Cumulative count of syntax check operations of the Nginx ingress controller |
||
nginx_ingress_controller_config_hash |
Hash of the currently loaded configuration |
||
nginx_ingress_controller_config_last_reload_successful |
Whether the last configuration reload attempt was successful |
||
nginx_ingress_controller_config_last_reload_successful_timestamp_seconds |
Timestamp of the last successful configuration reload |
||
nginx_ingress_controller_ssl_certificate_info |
All information associated with a certificate |
||
nginx_ingress_controller_success |
Cumulative number of reload operations of the Nginx ingress controller |
||
nginx_ingress_controller_orphan_ingress |
Whether an ingress is orphaned: 1 indicates an orphaned ingress and 0 indicates it is not. Labels: namespace (namespace of the ingress), ingress (name of the ingress), and type (orphan status, which can be no-service or no-endpoint). |
||
nginx_ingress_controller_admission_config_size |
Size of the admission controller configuration |
||
nginx_ingress_controller_admission_render_duration |
Rendering duration of the admission controller |
||
nginx_ingress_controller_admission_render_ingresses |
Number of ingresses rendered by the admission controller |
||
nginx_ingress_controller_admission_roundtrip_duration |
Time spent by the admission controller to process new events |
||
nginx_ingress_controller_admission_tested_duration |
Time spent on admission controller tests |
||
nginx_ingress_controller_admission_tested_ingresses |
Number of ingresses processed by the admission controller |
||
podMonitor/monitoring/cceaddon-npd/0 |
monitoring/cceaddon-npd |
problem_counter |
Number of times that the check item is found abnormal |
problem_gauge |
Whether the check item has triggered an exception
|
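The container_cpu_cfs_* counters in the table are cumulative, so they are normally compared as increases over a time window rather than as raw values. The following Python is a minimal sketch of the throttling ratio that a PromQL expression such as rate(container_cpu_cfs_throttled_periods_total[5m]) / rate(container_cpu_cfs_periods_total[5m]) approximates. The Sample type and both function names are hypothetical helpers for illustration. Like Prometheus, the sketch treats any decrease in a counter as a reset to zero, but unlike Prometheus it does not extrapolate to the edges of the window.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Sample:
    """One scraped value of a counter (hypothetical helper type)."""
    timestamp: float  # Unix time in seconds
    value: float      # raw cumulative counter value


def counter_increase(samples: list[Sample]) -> float:
    """Increase of a cumulative counter across consecutive samples.

    A decrease between two samples is treated as a counter reset (for example,
    a container restart), so the post-reset value is counted from zero.
    """
    total = 0.0
    for prev, cur in zip(samples, samples[1:]):
        delta = cur.value - prev.value
        total += delta if delta >= 0 else cur.value
    return total


def throttled_ratio(periods: list[Sample], throttled: list[Sample]) -> float:
    """Fraction of elapsed CFS enforcement periods in which the container was throttled."""
    elapsed = counter_increase(periods)
    if elapsed == 0:
        return 0.0  # no periods elapsed (for example, only one sample in the window)
    return counter_increase(throttled) / elapsed


# Hypothetical samples taken 60 seconds apart.
periods = [Sample(0, 1000), Sample(60, 1600)]    # container_cpu_cfs_periods_total
throttled = [Sample(0, 100), Sample(60, 250)]    # container_cpu_cfs_throttled_periods_total
print(throttled_ratio(periods, throttled))  # 0.25
```

Here the container was throttled in 150 of 600 elapsed periods. A ratio that stays high usually suggests the CPU quota (container_spec_cpu_quota) is tight relative to the workload.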
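Several rows in the table are the _bucket, _sum, and _count series of a Prometheus histogram, for example nginx_ingress_controller_request_duration_seconds and everest_function_duration_seconds. As a sketch of how a percentile is usually estimated from such buckets, the Python function below follows the linear-interpolation approach of PromQL's histogram_quantile(). The function name, the (upper_bound, cumulative_count) input layout, and the sample bucket counts are assumptions for illustration, and the counts are assumed to already be per-window increases or rates rather than raw cumulative totals since process start.

```python
from __future__ import annotations

import math


def histogram_quantile(q: float, buckets: list[tuple[float, float]]) -> float:
    """Estimate the q-quantile (0 < q <= 1) from cumulative histogram buckets.

    buckets holds (upper_bound, cumulative_count) pairs sorted by upper bound,
    with non-negative bounds and a final +Inf bucket that carries the total count.
    """
    if not 0.0 < q <= 1.0:
        raise ValueError("q must be in (0, 1]")
    if len(buckets) < 2 or not math.isinf(buckets[-1][0]):
        raise ValueError("need at least one finite bucket plus a final +Inf bucket")
    total = buckets[-1][1]
    if total == 0:
        return math.nan  # no observations in the window
    rank = q * total
    prev_bound, prev_count = 0.0, 0.0  # the lowest bucket is assumed to start at 0
    for bound, count in buckets[:-1]:
        if count >= rank:
            # Interpolate linearly inside the bucket, assuming observations are spread evenly.
            return prev_bound + (bound - prev_bound) * (rank - prev_count) / (count - prev_count)
        prev_bound, prev_count = bound, count
    # The rank lies in the +Inf bucket: return the highest finite upper bound instead.
    return prev_bound


# Hypothetical per-window counts for nginx_ingress_controller_request_duration_seconds_bucket.
buckets = [(0.1, 50), (0.25, 80), (0.5, 95), (1.0, 100), (math.inf, 100)]
print(histogram_quantile(0.5, buckets))
print(histogram_quantile(0.9, buckets))
```

The estimate is only as precise as the bucket layout allows. The mean over the same window does not need the buckets at all: it is the increase of the _sum series divided by the increase of the _count series.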