Basic Metrics: Container Metrics
This section describes the types, names, and meanings of metrics reported to AOM from CCE's kube-prometheus-stack add-on or on-premises Kubernetes clusters.
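Many metrics in the table follow Prometheus naming conventions. Series ending in `_total` are monotonically increasing counters. Series ending in `_bucket`, `_count`, and `_sum` together form a histogram. The Python sketch below shows how such raw values are commonly turned into per-second rates and latency percentiles. It assumes samples are already available as `(timestamp, value)` pairs and cumulative `(le, count)` bucket pairs. The functions `counter_rate` and `histogram_quantile` are hypothetical helpers written for illustration. They only approximate PromQL's `rate()` and `histogram_quantile()`: there is no extrapolation to window boundaries, and non-negative bucket bounds are assumed.

```python
"""Hypothetical helpers illustrating how counter and histogram series are read."""
import math
from typing import Sequence, Tuple

Sample = Tuple[float, float]  # (timestamp in seconds, value)
Bucket = Tuple[float, float]  # (upper bound "le", cumulative count)


def counter_rate(samples: Sequence[Sample]) -> float:
    """Average per-second increase of a counter such as coredns_dns_requests_total.

    A drop between consecutive samples is treated as a counter reset (for
    example, a process restart), so the new value counts as the increase.
    Unlike PromQL rate(), no extrapolation to the window edges is done.
    """
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    increase = 0.0
    for (_, prev), (_, cur) in zip(samples, samples[1:]):
        increase += cur - prev if cur >= prev else cur
    elapsed = samples[-1][0] - samples[0][0]
    if elapsed <= 0:
        raise ValueError("samples must span a positive time range")
    return increase / elapsed


def histogram_quantile(q: float, buckets: Sequence[Bucket]) -> float:
    """Estimate the q-quantile from cumulative *_bucket counts.

    Buckets must be sorted by upper bound and end with the +Inf bucket, whose
    count equals the matching *_count series. Bucket bounds are assumed to be
    non-negative, with an implicit lower bound of 0 for the first bucket.
    """
    if not 0.0 <= q <= 1.0:
        raise ValueError("q must be in [0, 1]")
    total = buckets[-1][1]
    if total == 0:
        return math.nan
    rank = q * total
    lower_bound, lower_count = 0.0, 0.0
    for upper_bound, count in buckets:
        if count >= rank:
            if math.isinf(upper_bound):
                # Cannot interpolate into +Inf: return the highest finite bound.
                return lower_bound
            in_bucket = count - lower_count
            if in_bucket == 0:
                return upper_bound
            # Linear interpolation inside the bucket containing the target rank.
            fraction = (rank - lower_count) / in_bucket
            return lower_bound + (upper_bound - lower_bound) * fraction
        lower_bound, lower_count = upper_bound, count
    return lower_bound


if __name__ == "__main__":
    # Made-up sample values, for illustration only.
    requests = [(0.0, 100.0), (30.0, 160.0), (60.0, 220.0)]
    print("requests/s:", counter_rate(requests))
    latency_buckets = [(0.001, 50.0), (0.005, 80.0), (0.01, 95.0), (math.inf, 100.0)]
    print("p90 latency (s):", histogram_quantile(0.9, latency_buckets))
```

For example, if you use buckets taken from `coredns_dns_request_duration_seconds_bucket`, `histogram_quantile(0.9, ...)` estimates the 90th-percentile DNS request latency. In practice, a Prometheus-compatible query engine usually evaluates the equivalent PromQL functions server-side. The sketch only illustrates the underlying arithmetic.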
Target Name |
Job Name |
Metric |
Description |
---|---|---|---|
|
coredns and node-local-dns |
coredns_build_info |
CoreDNS build information |
coredns_cache_entries |
Number of entries in the cache |
||
coredns_cache_size |
Cache size |
||
coredns_cache_hits_total |
Number of cache hits |
||
coredns_cache_misses_total |
Number of cache misses |
||
coredns_cache_requests_total |
Number of requests handled by the cache |
||
coredns_dns_request_duration_seconds_bucket |
Histogram of DNS request duration (bucket) |
||
coredns_dns_request_duration_seconds_count |
Histogram of DNS request duration (count) |
||
coredns_dns_request_duration_seconds_sum |
Histogram of DNS request duration (sum) |
||
coredns_dns_request_size_bytes_bucket |
Histogram of DNS request sizes (bucket) |
||
coredns_dns_request_size_bytes_count |
Histogram of DNS request sizes (count) |
||
coredns_dns_request_size_bytes_sum |
Histogram of DNS request sizes (sum) |
||
coredns_dns_requests_total |
Number of DNS requests |
||
coredns_dns_response_size_bytes_bucket |
Histogram of DNS response sizes (bucket) |
||
coredns_dns_response_size_bytes_count |
Histogram of DNS response sizes (count) |
||
coredns_dns_response_size_bytes_sum |
Histogram of DNS response sizes (sum) |
||
coredns_dns_responses_total |
Number of DNS responses per response code (Rcode) |
||
coredns_forward_conn_cache_hits_total |
Number of connection cache hits per upstream and protocol |
||
coredns_forward_conn_cache_misses_total |
Number of connection cache misses per upstream and protocol |
||
coredns_forward_healthcheck_broken_total |
Number of times all upstreams were marked unhealthy |
||
coredns_forward_healthcheck_failures_total |
Number of failed health checks per upstream |
||
coredns_forward_max_concurrent_rejects_total |
Number of requests rejected due to excessive concurrent requests |
||
coredns_forward_request_duration_seconds_bucket |
Histogram of forward request duration (bucket) |
||
coredns_forward_request_duration_seconds_count |
Histogram of forward request duration (count) |
||
coredns_forward_request_duration_seconds_sum |
Histogram of forward request duration (sum) |
||
coredns_forward_requests_total |
Number of requests forwarded to each upstream |
||
coredns_forward_responses_total |
Number of responses received from each upstream |
||
coredns_health_request_duration_seconds_bucket |
Histogram of health request duration (bucket) |
||
coredns_health_request_duration_seconds_count |
Histogram of health request duration (count) |
||
coredns_health_request_duration_seconds_sum |
Histogram of health request duration (sum) |
||
coredns_health_request_failures_total |
Number of health request failures |
||
coredns_hosts_reload_timestamp_seconds |
Timestamp of the last reload of the hosts file |
||
coredns_kubernetes_dns_programming_duration_seconds_bucket |
Histogram of DNS programming duration (bucket) |
||
coredns_kubernetes_dns_programming_duration_seconds_count |
Histogram of DNS programming duration (count) |
||
coredns_kubernetes_dns_programming_duration_seconds_sum |
Histogram of DNS programming duration (sum) |
||
coredns_local_localhost_requests_total |
Number of localhost requests |
||
coredns_nodecache_setup_errors_total |
Number of nodecache setup errors |
||
coredns_dns_response_rcode_count_total |
Number of responses per zone and response code (Rcode) |
||
coredns_dns_request_count_total |
Number of DNS requests |
||
coredns_dns_request_do_count_total |
Number of requests with the DNSSEC OK (DO) bit set |
||
coredns_dns_do_requests_total |
Number of requests with the DO bit set |
||
coredns_dns_request_type_count_total |
Number of requests per zone and query type |
||
coredns_panics_total |
Total number of panics |
||
coredns_plugin_enabled |
Whether a plugin is enabled |
||
coredns_reload_failed_total |
Number of failed configuration reload attempts |
||
serviceMonitor/monitoring/kube-apiserver/0 |
apiserver |
aggregator_unavailable_apiservice |
Number of unavailable APIServices |
apiserver_admission_controller_admission_duration_seconds_bucket |
Processing latency of admission controllers (bucket) |
||
apiserver_admission_webhook_admission_duration_seconds_bucket |
Processing latency of admission webhooks (bucket) |
||
apiserver_admission_webhook_admission_duration_seconds_count |
Number of requests processed by admission webhooks (count) |
||
apiserver_client_certificate_expiration_seconds_bucket |
Remaining validity period of client certificates used to authenticate requests (bucket) |
||
apiserver_client_certificate_expiration_seconds_count |
Remaining validity period of client certificates used to authenticate requests (count) |
||
apiserver_current_inflight_requests |
Number of requests currently being processed, by request kind (read-only or mutating) |
||
apiserver_request_duration_seconds_bucket |
Latency of requests to the API server (bucket) |
||
apiserver_request_total |
Number of requests to the API server, by verb, resource, and response code |
||
go_goroutines |
Number of goroutines |
||
kubernetes_build_info |
Kubernetes build information |
||
process_cpu_seconds_total |
Total process CPU time |
||
process_resident_memory_bytes |
Resident set size (RSS) of the process, in bytes |
||
rest_client_requests_total |
Number of HTTP requests made by the REST client |
||
workqueue_adds_total |
Number of adds handled by a work queue |
||
workqueue_depth |
Depth of a work queue |
||
workqueue_queue_duration_seconds_bucket |
Time an item waits in the work queue before being processed (bucket) |
||
aggregator_unavailable_apiservice_total |
Number of times APIServices were marked unavailable, by APIService name and reason |
||
rest_client_request_duration_seconds_bucket |
Histogram of REST client request duration (bucket) |
||
serviceMonitor/monitoring/kubelet/0 |
kubelet |
kubelet_certificate_manager_client_expiration_renew_errors |
Number of certificate renewal errors |
kubelet_certificate_manager_client_ttl_seconds |
Time-to-live (TTL) of the Kubelet client certificate |
||
kubelet_cgroup_manager_duration_seconds_bucket |
Duration of the cgroup manager operations (bucket) |
||
kubelet_cgroup_manager_duration_seconds_count |
Duration of the cgroup manager operations (count) |
||
kubelet_node_config_error |
Whether a configuration-related error has occurred on the node: 1 (true) if an error occurred, 0 (false) otherwise |
||
kubelet_node_name |
Node name. The value is always 1. |
||
kubelet_pleg_relist_duration_seconds_bucket |
Duration of relisting pods in PLEG (bucket) |
||
kubelet_pleg_relist_duration_seconds_count |
Duration of relisting pods in PLEG (count) |
||
kubelet_pleg_relist_interval_seconds_bucket |
Interval between relisting operations in PLEG (bucket) |
||
kubelet_pod_start_duration_seconds_count |
Time required for starting a single pod (count) |
||
kubelet_pod_start_duration_seconds_bucket |
Time required for starting a single pod (bucket) |
||
kubelet_pod_worker_duration_seconds_bucket |
Time taken to synchronize a single pod, by operation type (create, update, or sync) (bucket) |
||
kubelet_running_containers |
Number of running containers |
||
kubelet_running_pods |
Number of running pods |
||
kubelet_runtime_operations_duration_seconds_bucket |
Duration of the runtime operations (bucket) |
||
kubelet_runtime_operations_errors_total |
Number of runtime operation errors, by operation type |
||
kubelet_runtime_operations_total |
Number of runtime operations, by operation type |
||
kubelet_volume_stats_available_bytes |
Number of available bytes in a volume |
||
kubelet_volume_stats_capacity_bytes |
Capacity of the volume in bytes |
||
kubelet_volume_stats_inodes |
Total number of inodes in a volume |
||
kubelet_volume_stats_inodes_used |
Number of used inodes in a volume |
||
kubelet_volume_stats_used_bytes |
Number of used bytes in a volume |
||
storage_operation_duration_seconds_bucket |
Duration of each storage operation (bucket) |
||
storage_operation_duration_seconds_count |
Duration of each storage operation (count) |
||
storage_operation_errors_total |
Number of storage operation errors |
||
volume_manager_total_volumes |
Number of volumes in the Volume Manager |
||
rest_client_requests_total |
Number of HTTP client requests partitioned by status code, method, and host |
||
rest_client_request_duration_seconds_bucket |
Latency of REST client requests (bucket) |
||
process_resident_memory_bytes |
Resident set size (RSS) of the process, in bytes |
||
process_cpu_seconds_total |
Total process CPU time |
||
go_goroutines |
Number of goroutines |
||
serviceMonitor/monitoring/kubelet/1 |
kubelet |
container_cpu_cfs_periods_total |
Number of elapsed enforcement period intervals |
container_cpu_cfs_throttled_periods_total |
Number of throttled period intervals |
||
container_cpu_cfs_throttled_seconds_total |
Total time for which the container has been throttled |
||
container_cpu_load_average_10s |
Value of container CPU load average over the last 10 seconds |
||
container_cpu_usage_seconds_total |
Cumulative CPU time consumed by a container in core-seconds |
||
container_file_descriptors |
Number of open file descriptors for a container |
||
container_fs_inodes_free |
Number of available inodes in a file system |
||
container_fs_inodes_total |
Number of inodes in a file system |
||
container_fs_io_time_seconds_total |
Cumulative time, in seconds, spent on I/O by the disk or file system |
||
container_fs_limit_bytes |
Total disk or file system capacity that can be consumed by a container |
||
container_fs_read_seconds_total |
Cumulative number of seconds the container spent on reading disk or file system data |
||
container_fs_reads_bytes_total |
Cumulative amount of disk or file system data read by a container |
||
container_fs_reads_total |
Cumulative number of disk or file system reads completed by a container |
||
container_fs_usage_bytes |
Number of bytes of the file system consumed by the container |
||
container_fs_write_seconds_total |
Cumulative number of seconds the container spent on writing data to the disk or file system |
||
container_fs_writes_bytes_total |
Total amount of data written by a container to a disk or file system |
||
container_fs_writes_total |
Cumulative number of disk or file system writes completed by a container |
||
container_memory_cache |
Memory used for the page cache of a container |
||
container_memory_failcnt |
Number of times the container's memory usage hit its limit |
||
container_memory_max_usage_bytes |
Maximum memory usage recorded for a container |
||
container_memory_rss |
Resident set size (RSS) of a container, in bytes |
||
container_memory_swap |
Container swap usage |
||
container_memory_usage_bytes |
Current memory usage of a container |
||
container_memory_working_set_bytes |
Memory usage of the working set of a container |
||
container_network_receive_bytes_total |
Total volume of data received by the container network |
||
container_network_receive_errors_total |
Cumulative number of errors encountered during reception |
||
container_network_receive_packets_dropped_total |
Cumulative number of packets dropped during reception |
||
container_network_receive_packets_total |
Cumulative number of packets received |
||
container_network_transmit_bytes_total |
Total volume of data transmitted on the container network |
||
container_network_transmit_errors_total |
Cumulative number of errors encountered during transmission |
||
container_network_transmit_packets_dropped_total |
Cumulative number of packets dropped during transmission |
||
container_network_transmit_packets_total |
Cumulative number of packets transmitted |
||
container_spec_cpu_quota |
CPU quota of the container |
||
container_spec_memory_limit_bytes |
Memory limit for the container |
||
machine_cpu_cores |
Number of logical CPU cores |
||
machine_memory_bytes |
Total amount of memory on the machine |
||
serviceMonitor/monitoring/kube-state-metrics/0 |
kube-state-metrics-prom |
kube_cronjob_status_active |
Number of active jobs run by a CronJob |
kube_cronjob_info |
CronJob information |
||
kube_cronjob_labels |
CronJob labels |
||
kube_configmap_info |
ConfigMap information |
||
kube_daemonset_created |
DaemonSet creation time |
||
kube_daemonset_status_current_number_scheduled |
Number of nodes that run at least one DaemonSet pod and are supposed to run it |
||
kube_daemonset_status_desired_number_scheduled |
Number of nodes that should be running the DaemonSet pod |
||
kube_daemonset_status_number_available |
Number of nodes that should be running a DaemonSet pod and have at least one DaemonSet pod running and available |
||
kube_daemonset_status_number_misscheduled |
Number of nodes that are running a DaemonSet pod but are not supposed to |
||
kube_daemonset_status_number_ready |
Number of nodes that should be running the DaemonSet pods and have one or more DaemonSet pods running and ready |
||
kube_daemonset_status_number_unavailable |
Number of nodes that should be running the DaemonSet pods but have none of the DaemonSet pods running and available |
||
kube_daemonset_status_updated_number_scheduled |
Number of nodes that are running an updated DaemonSet pod |
||
kube_deployment_created |
Deployment creation timestamp |
||
kube_deployment_labels |
Deployment labels |
||
kube_deployment_metadata_generation |
Sequence number representing a specific generation of the desired state |
||
kube_deployment_spec_replicas |
Number of desired replicas for a Deployment |
||
kube_deployment_spec_strategy_rollingupdate_max_unavailable |
Maximum number of unavailable replicas during a rolling update of a Deployment |
||
kube_deployment_status_observed_generation |
The generation observed by the Deployment controller |
||
kube_deployment_status_replicas |
Number of current replicas of a Deployment |
||
kube_deployment_status_replicas_available |
Number of available replicas per Deployment |
||
kube_deployment_status_replicas_ready |
Number of ready replicas per Deployment |
||
kube_deployment_status_replicas_unavailable |
Number of unavailable replicas per Deployment |
||
kube_deployment_status_replicas_updated |
Number of updated replicas per Deployment |
||
kube_job_info |
Information about the job |
||
kube_namespace_labels |
Namespace labels |
||
kube_node_labels |
Node labels |
||
kube_node_info |
Information about a node |
||
kube_node_spec_taint |
Taint of a node |
||
kube_node_spec_unschedulable |
Whether a node is marked unschedulable, which prevents new pods from being scheduled to it |
||
kube_node_status_allocatable |
Allocatable resources on a node |
||
kube_node_status_capacity |
Capacity for different resources on a node |
||
kube_node_status_condition |
Condition of a node |
||
kube_node_volcano_oversubscription_status |
Node oversubscription status |
||
kube_persistentvolume_status_phase |
Status phase of a PV |
||
kube_persistentvolumeclaim_status_phase |
Status phase of a PVC |
||
kube_persistentvolume_info |
Information about a PV |
||
kube_persistentvolumeclaim_info |
Information about a PVC |
||
kube_pod_container_info |
Information about a container running in the pod |
||
kube_pod_container_resource_limits |
Resource limits of a container, by resource type |
||
kube_pod_container_resource_requests |
Resource requests of a container, by resource type |
||
kube_pod_container_status_last_terminated_reason |
Reason the container was last terminated |
||
kube_pod_container_status_ready |
Whether the container's readiness check succeeded |
||
kube_pod_container_status_restarts_total |
Number of container restarts |
||
kube_pod_container_status_running |
Whether the container is running |
||
kube_pod_container_status_terminated |
Whether the container is terminated |
||
kube_pod_container_status_terminated_reason |
Reason the container is currently in the terminated state |
||
kube_pod_container_status_waiting |
Whether the container is waiting |
||
kube_pod_container_status_waiting_reason |
Reason the container is currently in the waiting state |
||
kube_pod_info |
Information about a pod |
||
kube_pod_labels |
Pod labels |
||
kube_pod_owner |
Information about the pod's owner |
||
kube_pod_status_phase |
Current phase of a pod |
||
kube_pod_status_ready |
Whether the pod is ready |
||
kube_secret_info |
Information about a secret |
||
kube_statefulset_created |
StatefulSet creation timestamp |
||
kube_statefulset_labels |
StatefulSet labels |
||
kube_statefulset_metadata_generation |
Sequence number representing a specific generation of the desired state for a StatefulSet |
||
kube_statefulset_replicas |
Number of desired pods for a StatefulSet |
||
kube_statefulset_status_observed_generation |
The generation observed by the StatefulSet controller |
||
kube_statefulset_status_replicas |
Number of replicas per StatefulSet |
||
kube_statefulset_status_replicas_ready |
Number of ready replicas per StatefulSet |
||
kube_statefulset_status_replicas_updated |
Number of updated replicas per StatefulSet |
||
kube_job_spec_completions |
Desired number of successfully finished pods that should run with the job |
||
kube_job_status_failed |
Number of pods of the Job that reached the Failed phase |
||
kube_job_status_succeeded |
Number of pods of the Job that reached the Succeeded phase |
||
kube_node_status_allocatable_cpu_cores |
Number of allocatable CPU cores of a node |
||
kube_node_status_allocatable_memory_bytes |
Total allocatable memory of a node |
||
kube_replicaset_owner |
Information about the ReplicaSet's owner |
||
kube_resourcequota |
Information about resource quota |
||
kube_pod_spec_volumes_persistentvolumeclaims_info |
Information about the PVC associated with the pod |
||
serviceMonitor/monitoring/prometheus-lightweight/0 |
prometheus-lightweight |
vm_persistentqueue_blocks_dropped_total |
Number of dropped blocks in a send queue |
vm_persistentqueue_blocks_read_total |
Number of blocks read by a send queue |
||
vm_persistentqueue_blocks_written_total |
Number of blocks written to a send queue |
||
vm_persistentqueue_bytes_pending |
Number of pending bytes in a send queue |
||
vm_persistentqueue_bytes_read_total |
Number of bytes read by a send queue |
||
vm_persistentqueue_bytes_written_total |
Number of bytes written to a send queue |
||
vm_promscrape_active_scrapers |
Number of active scrapers |
||
vm_promscrape_conn_read_errors_total |
Number of read errors during scrapes |
||
vm_promscrape_conn_write_errors_total |
Number of write errors during scrapes |
||
vm_promscrape_max_scrape_size_exceeded_errors_total |
Number of failed scrapes due to the exceeded response size |
||
vm_promscrape_scrape_duration_seconds_sum |
Duration of scrapes (sum) |
||
vm_promscrape_scrape_duration_seconds_count |
Duration of scrapes (count) |
||
vm_promscrape_scrapes_total |
Number of scrapes |
||
vmagent_remotewrite_bytes_sent_total |
Number of bytes sent via remote write |
||
vmagent_remotewrite_duration_seconds_sum |
Time required for a remote write (sum) |
||
vmagent_remotewrite_duration_seconds_count |
Time required for a remote write (count) |
||
vmagent_remotewrite_packets_dropped_total |
Number of packets dropped during remote write |
||
vmagent_remotewrite_pending_data_bytes |
Number of bytes pending to be sent via remote write |
||
vmagent_remotewrite_requests_total |
Number of remote write requests |
||
vmagent_remotewrite_retries_count_total |
Number of remote write retries |
||
go_goroutines |
Number of goroutines |
||
serviceMonitor/monitoring/node-exporter/0 |
node-exporter |
node_boot_time_seconds |
Node boot time |
node_context_switches_total |
Number of context switches |
||
node_cpu_seconds_total |
Seconds each CPU spent doing each type of work |
||
node_disk_io_now |
Number of I/Os in progress |
||
node_disk_io_time_seconds_total |
Total seconds spent doing I/Os |
||
node_disk_io_time_weighted_seconds_total |
The weighted number of seconds spent doing I/Os |
||
node_disk_read_bytes_total |
Number of bytes that are read |
||
node_disk_read_time_seconds_total |
Number of seconds spent by all reads |
||
node_disk_reads_completed_total |
Number of reads completed |
||
node_disk_write_time_seconds_total |
Number of seconds spent by all writes |
||
node_disk_writes_completed_total |
Number of writes completed |
||
node_disk_written_bytes_total |
Number of bytes that are written |
||
node_docker_thinpool_data_space_available |
Available data space of a Docker thin pool |
||
node_docker_thinpool_metadata_space_available |
Available metadata space of a Docker thin pool |
||
node_exporter_build_info |
Node exporter build information |
||
node_filefd_allocated |
Number of allocated file descriptors |
||
node_filefd_maximum |
Maximum number of file descriptors |
||
node_filesystem_avail_bytes |
File system space that is available for use |
||
node_filesystem_device_error |
Whether an error occurred while getting statistics for the given device |
||
node_filesystem_free_bytes |
Remaining space of a file system |
||
node_filesystem_readonly |
Whether the file system is mounted read-only |
||
node_filesystem_size_bytes |
Total size of a file system |
||
node_forks_total |
Number of forks |
||
node_intr_total |
Number of interrupts serviced |
||
node_load1 |
1-minute average CPU load |
||
node_load15 |
15-minute average CPU load |
||
node_load5 |
5-minute average CPU load |
||
node_memory_Buffers_bytes |
Memory used for buffers on the node |
||
node_memory_Cached_bytes |
Memory for the node page cache |
||
node_memory_MemAvailable_bytes |
Available memory of a node |
||
node_memory_MemFree_bytes |
Free memory of a node |
||
node_memory_MemTotal_bytes |
Total memory of a node |
||
node_network_receive_bytes_total |
Total amount of received data |
||
node_network_receive_drop_total |
Cumulative number of packets dropped during reception |
||
node_network_receive_errs_total |
Cumulative number of errors encountered during reception |
||
node_network_receive_packets_total |
Cumulative number of packets received |
||
node_network_transmit_bytes_total |
Total amount of transmitted data |
||
node_network_transmit_drop_total |
Cumulative number of packets dropped during transmission |
||
node_network_transmit_errs_total |
Cumulative number of errors encountered during transmission |
||
node_network_transmit_packets_total |
Cumulative number of packets transmitted |
||
node_procs_blocked |
Number of processes blocked waiting for I/O |
||
node_procs_running |
Number of processes in the runnable state |
||
node_sockstat_sockets_used |
Number of sockets in use |
||
node_sockstat_TCP_alloc |
Number of allocated TCP sockets |
||
node_sockstat_TCP_inuse |
Number of TCP sockets in use |
||
node_sockstat_TCP_orphan |
Number of orphaned TCP sockets |
||
node_sockstat_TCP_tw |
Number of TCP sockets in the TIME_WAIT state |
||
node_sockstat_UDPLITE_inuse |
Number of UDP-Lite sockets in use |
||
node_sockstat_UDP_inuse |
Number of UDP sockets in use |
||
node_sockstat_UDP_mem |
Memory used by UDP sockets, in pages |
||
node_timex_offset_seconds |
Offset between the local system clock and the reference clock, in seconds |
||
node_timex_sync_status |
Whether the node clock is synchronized (1) or not (0) |
||
node_uname_info |
Labeled system information as provided by the uname system call |
||
node_vmstat_oom_kill |
Number of OOM kills, from /proc/vmstat |
||
process_cpu_seconds_total |
Total process CPU time |
||
process_max_fds |
Maximum number of file descriptors of a process |
||
process_open_fds |
Number of file descriptors opened by a process |
||
process_resident_memory_bytes |
Resident set size (RSS) of the process, in bytes |
||
process_start_time_seconds |
Process start time |
||
process_virtual_memory_bytes |
Virtual memory size for a process |
||
process_virtual_memory_max_bytes |
Maximum virtual memory size for a process |
||
node_netstat_Tcp_ActiveOpens |
Number of TCP connections that directly change from the CLOSED state to the SYN-SENT state |
||
node_netstat_Tcp_PassiveOpens |
Number of TCP connections that directly change from the LISTEN state to the SYN-RCVD state |
||
node_netstat_Tcp_CurrEstab |
Number of TCP connections in the ESTABLISHED or CLOSE-WAIT state |
||
node_vmstat_pgmajfault |
Cumulative number of major page faults, from /proc/vmstat |
||
node_vmstat_pgpgout |
Cumulative amount of data paged out from main memory to block devices, from /proc/vmstat |
||
node_vmstat_pgfault |
Cumulative number of page faults, from /proc/vmstat |
||
node_vmstat_pgpgin |
Cumulative amount of data paged in from block devices to main memory, from /proc/vmstat |
||
node_processes_max_processes |
Maximum PID value allowed by the kernel (pid_max) |
||
node_processes_pids |
Number of PIDs currently in use |
||
node_nf_conntrack_entries |
Number of currently allocated flow entries for connection tracking |
||
node_nf_conntrack_entries_limit |
Maximum size of a connection tracking table |
||
promhttp_metric_handler_requests_in_flight |
Number of scrapes currently being served |
||
go_goroutines |
Number of node exporter goroutines |
||
podMonitor/monitoring/nvidia-gpu-device-plugin/0 |
monitoring/nvidia-gpu-device-plugin |
cce_gpu_utilization |
GPU compute usage |
cce_gpu_memory_utilization |
GPU memory usage |
||
cce_gpu_encoder_utilization |
GPU encoding usage |
||
cce_gpu_decoder_utilization |
GPU decoding usage |
||
cce_gpu_utilization_process |
GPU compute usage of each process |
||
cce_gpu_memory_utilization_process |
GPU memory usage of each process |
||
cce_gpu_encoder_utilization_process |
GPU encoding usage of each process |
||
cce_gpu_decoder_utilization_process |
GPU decoding usage of each process |
||
cce_gpu_memory_used |
Used GPU memory |
||
cce_gpu_memory_total |
Total GPU memory |
||
cce_gpu_memory_free |
Free GPU memory |
||
cce_gpu_bar1_memory_used |
Used GPU BAR1 memory |
||
cce_gpu_bar1_memory_total |
Total GPU BAR1 memory |
||
cce_gpu_clock |
GPU clock frequency |
||
cce_gpu_memory_clock |
GPU memory frequency |
||
cce_gpu_graphics_clock |
GPU graphics clock frequency |
||
cce_gpu_video_clock |
GPU video processor frequency |
||
cce_gpu_temperature |
GPU temperature |
||
cce_gpu_power_usage |
GPU power draw |
||
cce_gpu_total_energy_consumption |
Total GPU energy consumption |
||
cce_gpu_pcie_link_bandwidth |
GPU PCIe bandwidth |
||
cce_gpu_nvlink_bandwidth |
GPU NVLink bandwidth |
||
cce_gpu_pcie_throughput_rx |
GPU PCIe RX bandwidth |
||
cce_gpu_pcie_throughput_tx |
GPU PCIe TX bandwidth |
||
cce_gpu_nvlink_utilization_counter_rx |
GPU NVLink RX bandwidth |
||
cce_gpu_nvlink_utilization_counter_tx |
GPU NVLink TX bandwidth |
||
cce_gpu_retired_pages_sbe |
Number of GPU memory pages retired due to single-bit ECC errors |
||
cce_gpu_retired_pages_dbe |
Number of GPU memory pages retired due to double-bit ECC errors |
||
xgpu_memory_total |
Total xGPU memory |
||
xgpu_memory_used |
Used xGPU memory |
||
xgpu_core_percentage_total |
Total xGPU compute |
||
xgpu_core_percentage_used |
Used xGPU compute |
||
gpu_schedule_policy |
GPU mode. 0 indicates GPU memory isolation with compute sharing. 1 indicates GPU memory and compute isolation. 2 indicates the default mode, in which the GPU is not virtualized. |
||
xgpu_device_health |
Health status of the xGPU: 0 indicates healthy and 1 indicates unhealthy |
||
serviceMonitor/monitoring/prometheus-server/0 |
prometheus-server |
prometheus_build_info |
Prometheus build information |
prometheus_engine_query_duration_seconds |
Query execution time (summary) |
||
prometheus_engine_query_duration_seconds_count |
Number of queries |
||
prometheus_sd_discovered_targets |
Number of targets discovered by each job |
||
prometheus_remote_storage_bytes_total |
Number of bytes sent to remote storage |
||
prometheus_remote_storage_enqueue_retries_total |
Number of times enqueuing samples was retried because a shard queue was full |
||
prometheus_remote_storage_highest_timestamp_in_seconds |
Highest timestamp that has come into the remote storage via the Appender interface, in seconds since epoch |
||
prometheus_remote_storage_queue_highest_sent_timestamp_seconds |
Highest timestamp successfully sent by a remote write |
||
prometheus_remote_storage_samples_dropped_total |
Total number of samples read from the WAL but not sent to remote storage |
||
prometheus_remote_storage_samples_failed_total |
Number of samples that failed to be sent to remote storage |
||
prometheus_remote_storage_samples_in_total |
Number of samples read into remote storage |
||
prometheus_remote_storage_samples_pending |
Number of samples pending in shards to be sent to remote storage |
||
prometheus_remote_storage_samples_retried_total |
Number of samples which failed to be sent to remote storage but were retried |
||
prometheus_remote_storage_samples_total |
Total number of samples sent to remote storage |
||
prometheus_remote_storage_shard_capacity |
Capacity of each shard of the queue used for parallel sending to the remote storage |
||
prometheus_remote_storage_shards |
Number of shards used for parallel sending to the remote storage |
||
prometheus_remote_storage_shards_desired |
Number of shards that the queue's shard calculation wants to run, based on the ratio of incoming to outgoing samples |
||
prometheus_remote_storage_shards_max |
Maximum number of shards that the queue is allowed to run |
||
prometheus_remote_storage_shards_min |
Minimum number of shards that the queue is allowed to run |
||
prometheus_tsdb_wal_segment_current |
WAL segment index that TSDB is currently writing to |
||
prometheus_tsdb_head_chunks |
Number of chunks in the head block |
||
prometheus_tsdb_head_series |
Number of series in the head block |
||
prometheus_tsdb_head_samples_appended_total |
Number of appended samples |
||
prometheus_wal_watcher_current_segment |
Current segment the WAL watcher is reading records from |
||
prometheus_target_interval_length_seconds |
Actual intervals between scrapes |
||
prometheus_target_interval_length_seconds_count |
Actual intervals between scrapes (count) |
||
prometheus_target_interval_length_seconds_sum |
Actual intervals between scrapes (sum) |
||
prometheus_target_scrapes_exceeded_body_size_limit_total |
Number of scrapes that hit the body size limit |
||
prometheus_target_scrapes_exceeded_sample_limit_total |
Number of scrapes that hit the sample limit |
||
prometheus_target_scrapes_sample_duplicate_timestamp_total |
Number of scraped samples with duplicate timestamps |
||
prometheus_target_scrapes_sample_out_of_bounds_total |
Number of samples rejected due to timestamp falling outside of the time bounds |
||
prometheus_target_scrapes_sample_out_of_order_total |
Number of out-of-order samples |
||
prometheus_target_sync_length_seconds |
Interval for synchronizing the scrape pool |
||
prometheus_target_sync_length_seconds_count |
Interval for synchronizing the scrape pool (count) |
||
prometheus_target_sync_length_seconds_sum |
Interval for synchronizing the scrape pool (sum) |
||
promhttp_metric_handler_requests_in_flight |
Number of scrapes currently being served |
||
promhttp_metric_handler_requests_total |
Total number of scrapes, by HTTP status code |
||
go_goroutines |
Number of goroutines |
||
podMonitor/monitoring/virtual-kubelet-pods/0 |
monitoring/virtual-kubelet-pods |
container_cpu_load_average_10s |
Value of container CPU load average over the last 10 seconds |
container_cpu_system_seconds_total |
Cumulative container CPU system time |
||
container_cpu_usage_seconds_total |
Cumulative CPU time consumed by a container in core-seconds |
||
container_cpu_user_seconds_total |
Cumulative user CPU time consumed by a container |
||
container_cpu_cfs_periods_total |
Number of elapsed enforcement period intervals |
||
container_cpu_cfs_throttled_periods_total |
Number of throttled period intervals |
||
container_cpu_cfs_throttled_seconds_total |
Total time for which the container has been throttled |
||
container_fs_inodes_free |
Number of available inodes in a file system |
||
container_fs_usage_bytes |
Number of bytes of the file system consumed by the container |
||
container_fs_inodes_total |
Number of inodes in a file system |
||
container_fs_io_current |
Number of I/Os currently in progress in a disk or file system |
||
container_fs_io_time_seconds_total |
Cumulative time, in seconds, spent on I/O by the disk or file system |
||
container_fs_io_time_weighted_seconds_total |
Cumulative weighted I/O time of a disk or file system |
||
container_fs_limit_bytes |
Total disk or file system capacity that can be consumed by a container |
||
container_fs_reads_bytes_total |
Cumulative amount of disk or file system data read by a container |
||
container_fs_read_seconds_total |
Cumulative number of seconds the container spent on reading disk or file system data |
||
container_fs_reads_merged_total |
Cumulative number of merged disk or file system reads made by a container |
||
container_fs_reads_total |
Cumulative number of disk or file system reads completed by a container |
||
container_fs_sector_reads_total |
Cumulative number of disk or file system sector reads completed by a container |
||
container_fs_sector_writes_total |
Cumulative number of disk or file system sector writes completed by a container |
||
container_fs_writes_bytes_total |
Total amount of data written by a container to a disk or file system |
||
container_fs_write_seconds_total |
Cumulative number of seconds the container spent on writing data to the disk or file system |
||
container_fs_writes_merged_total |
Cumulative number of merged container writes to the disk or file system |
||
container_fs_writes_total |
Cumulative number of disk or file system writes completed by a container |
||
container_blkio_device_usage_total |
Block I/O (blkio) device usage, in bytes |
||
container_memory_failures_total |
Cumulative number of container memory allocation failures |
||
container_memory_failcnt |
Number of times the container's memory usage hit the limit |
||
container_memory_cache |
Memory used for the page cache of a container |
||
container_memory_mapped_file |
Size of the memory-mapped files of a container |
||
container_memory_max_usage_bytes |
Maximum memory usage recorded for a container |
||
container_memory_rss |
Resident set size (RSS) of a container |
||
container_memory_swap |
Container swap usage |
||
container_memory_usage_bytes |
Current memory usage of a container |
||
container_memory_working_set_bytes |
Memory usage of the working set of a container |
||
container_network_receive_bytes_total |
Cumulative number of bytes received over the container network |
||
container_network_receive_errors_total |
Cumulative number of errors encountered during reception |
||
container_network_receive_packets_dropped_total |
Cumulative number of packets dropped during reception |
||
container_network_receive_packets_total |
Cumulative number of packets received |
||
container_network_transmit_bytes_total |
Cumulative number of bytes transmitted over the container network |
||
container_network_transmit_errors_total |
Cumulative number of errors encountered during transmission |
||
container_network_transmit_packets_dropped_total |
Cumulative number of packets dropped during transmission |
||
container_network_transmit_packets_total |
Cumulative number of packets transmitted |
||
container_processes |
Number of processes running inside the container |
||
container_sockets |
Number of open sockets for the container |
||
container_file_descriptors |
Number of open file descriptors for a container |
||
container_threads |
Number of threads running inside the container |
||
container_threads_max |
Maximum number of threads allowed inside the container |
||
container_ulimits_soft |
Soft ulimit values of process 1 in the container. A value of -1 means unlimited (except for priority and nice). |
||
container_tasks_state |
Number of tasks in the specified state, such as sleeping, running, stopped, uninterruptible, or iowaiting |
||
container_spec_cpu_period |
CPU period of the container |
||
container_spec_cpu_shares |
CPU share of the container |
||
container_spec_cpu_quota |
CPU quota of the container |
||
container_spec_memory_limit_bytes |
Memory limit for the container |
||
container_spec_memory_reservation_limit_bytes |
Memory reservation limit for the container |
||
container_spec_memory_swap_limit_bytes |
Memory swap limit for the container |
||
container_start_time_seconds |
Start time of the container, in seconds since the Unix epoch |
||
container_last_seen |
Last time a container was seen by the exporter |
||
container_accelerator_memory_used_bytes |
GPU accelerator memory that is being used by the container |
||
container_accelerator_memory_total_bytes |
Total available memory of a GPU accelerator |
||
container_accelerator_duty_cycle |
Percentage of time when a GPU accelerator is actually running |
||
podMonitor/monitoring/everest-csi-controller/0 |
monitoring/everest-csi-controller |
everest_action_result_total |
Number of action results |
everest_function_duration_seconds_bucket |
Histogram of action duration (bucket) |
||
everest_function_duration_seconds_count |
Histogram of action duration (count) |
||
everest_function_duration_seconds_sum |
Histogram of action duration (sum) |
||
everest_function_duration_quantile_seconds |
Quantiles of the time required by the action |
||
node_volume_read_completed_total |
Number of completed reads |
||
node_volume_read_merged_total |
Number of merged reads |
||
node_volume_read_bytes_total |
Total number of bytes read |
||
node_volume_read_time_milliseconds_total |
Total time spent on reads, in milliseconds |
||
node_volume_write_completed_total |
Number of completed writes |
||
node_volume_write_merged_total |
Number of merged writes |
||
node_volume_write_bytes_total |
Total number of bytes written |
||
node_volume_write_time_milliseconds_total |
Total time spent on writes, in milliseconds |
||
node_volume_io_now |
Number of ongoing I/Os |
||
node_volume_io_time_seconds_total |
Total time spent on I/O operations, in seconds |
||
node_volume_capacity_bytes_available |
Available capacity |
||
node_volume_capacity_bytes_total |
Total capacity |
||
node_volume_capacity_bytes_used |
Used capacity |
||
node_volume_inodes_available |
Available inodes |
||
node_volume_inodes_total |
Total number of inodes |
||
node_volume_inodes_used |
Used inodes |
||
node_volume_read_transmissions_total |
Number of read transmissions |
||
node_volume_read_timeouts_total |
Number of read timeouts |
||
node_volume_read_sent_bytes_total |
Number of bytes read |
||
node_volume_read_queue_time_milliseconds_total |
Read queue waiting time, in milliseconds |
||
node_volume_read_rtt_time_milliseconds_total |
Read round-trip time (RTT), in milliseconds |
||
node_volume_write_transmissions_total |
Number of write transmissions |
||
node_volume_write_timeouts_total |
Number of write timeouts |
||
node_volume_write_queue_time_milliseconds_total |
Write queue waiting time, in milliseconds |
||
node_volume_write_rtt_time_milliseconds_total |
Write round-trip time (RTT), in milliseconds |
||
node_volume_localvolume_stats_capacity_bytes |
Local storage capacity |
||
node_volume_localvolume_stats_available_bytes |
Available local storage |
||
node_volume_localvolume_stats_used_bytes |
Used local storage |
||
node_volume_localvolume_stats_inodes |
Number of inodes for a local volume |
||
node_volume_localvolume_stats_inodes_used |
Used inodes for a local volume |
||
podMonitor/monitoring/nginx-ingress-controller/0 |
monitoring/nginx-ingress-controller |
nginx_ingress_controller_bytes_sent |
Number of bytes sent to the client |
nginx_ingress_controller_connect_duration_seconds |
Duration for connecting to the upstream server |
||
nginx_ingress_controller_header_duration_seconds |
Time required for receiving the first header from the upstream server |
||
nginx_ingress_controller_ingress_upstream_latency_seconds |
Upstream service latency |
||
nginx_ingress_controller_request_duration_seconds |
Time required for processing a request, in seconds |
||
nginx_ingress_controller_request_size |
Length of a request, including the request line, header, and body |
||
nginx_ingress_controller_requests |
Total number of HTTP requests processed by Nginx Ingress Controller since it started |
||
nginx_ingress_controller_response_duration_seconds |
Time required for receiving the response from the upstream server |
||
nginx_ingress_controller_response_size |
Length of a response, including the status line, headers, and body |
||
nginx_ingress_controller_nginx_process_connections |
Number of client connections in the active, read, write, or wait state |
||
nginx_ingress_controller_nginx_process_connections_total |
Total number of client connections in the accepted or handled state |
||
nginx_ingress_controller_nginx_process_cpu_seconds_total |
Total CPU time consumed by the Nginx process, in seconds |
||
nginx_ingress_controller_nginx_process_num_procs |
Number of processes |
||
nginx_ingress_controller_nginx_process_oldest_start_time_seconds |
Start time of the oldest Nginx process, in seconds since the Unix epoch (January 1, 1970) |
||
nginx_ingress_controller_nginx_process_read_bytes_total |
Number of bytes read |
||
nginx_ingress_controller_nginx_process_requests_total |
Total number of requests processed by Nginx since startup |
||
nginx_ingress_controller_nginx_process_resident_memory_bytes |
Resident memory usage of a process, that is, the actual physical memory usage |
||
nginx_ingress_controller_nginx_process_virtual_memory_bytes |
Virtual memory usage of a process, that is, the total memory allocated to the process, including the actual physical memory and virtual swap space |
||
nginx_ingress_controller_nginx_process_write_bytes_total |
Amount of data written by the Nginx process to disks or other devices for long-term storage |
||
nginx_ingress_controller_build_info |
Build information of Nginx Ingress Controller, including the version and compilation time |
||
nginx_ingress_controller_check_success |
Health check result of Nginx Ingress Controller. 1: Normal. 0: Abnormal |
||
nginx_ingress_controller_config_hash |
Hash value of the current configuration |
||
nginx_ingress_controller_config_last_reload_successful |
Whether the last Nginx Ingress Controller configuration reload succeeded |
||
nginx_ingress_controller_config_last_reload_successful_timestamp_seconds |
Timestamp of the last successful Nginx Ingress Controller configuration reload |
||
nginx_ingress_controller_ssl_certificate_info |
Nginx Ingress Controller certificate information |
||
nginx_ingress_controller_success |
Cumulative number of reload operations of Nginx Ingress Controller |
||
nginx_ingress_controller_orphan_ingress |
Whether the ingress is isolated. 1: Isolated. 0: Not isolated. The namespace label indicates the namespace where the ingress is located, the ingress label indicates the ingress name, and the type label indicates the isolation type (no-service or no-endpoint). |
||
nginx_ingress_controller_admission_config_size |
Size of the admission controller configuration |
||
nginx_ingress_controller_admission_render_duration |
Rendering duration of the admission controller |
||
nginx_ingress_controller_admission_render_ingresses |
Number of ingresses rendered by the admission controller |
||
nginx_ingress_controller_admission_roundtrip_duration |
Time spent by the admission controller to process new events |
||
nginx_ingress_controller_admission_tested_duration |
Time spent on admission controller tests |
||
nginx_ingress_controller_admission_tested_ingresses |
Number of ingresses processed by the admission controller
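Several metrics in this table are Prometheus histograms (for example, everest_function_duration_seconds_bucket, _count, and _sum) or cumulative counters (names ending in _total, such as container_cpu_usage_seconds_total). Such series are typically queried with PromQL functions like histogram_quantile() and increase() rather than read raw. The following Python sketch illustrates, under simplifying assumptions, how these values can be derived: it interpolates a quantile linearly inside cumulative buckets (assuming non-negative bucket bounds and a final +Inf bucket), computes the mean as _sum / _count, and treats any drop in a counter as a reset. It does not reproduce Prometheus exactly (for example, increase() also extrapolates to the edges of the query window), and the function names and sample numbers are hypothetical.

```python
import math
from typing import Sequence, Tuple


def histogram_quantile(q: float, buckets: Sequence[Tuple[float, float]]) -> float:
    """Estimate the q-quantile from cumulative (le, count) bucket samples.

    Simplified sketch of the linear interpolation used by PromQL's
    histogram_quantile(); assumes non-negative bucket bounds.
    """
    if not 0.0 <= q <= 1.0:
        raise ValueError("q must be within [0, 1]")
    ordered = sorted(buckets, key=lambda b: b[0])
    if not ordered or not math.isinf(ordered[-1][0]):
        raise ValueError("the last bucket must have le=+Inf")
    total = ordered[-1][1]  # the +Inf bucket equals the _count series
    if total == 0:
        return math.nan
    rank = q * total
    prev_le, prev_count = 0.0, 0.0
    for le, count in ordered:
        if count >= rank:
            if math.isinf(le):
                # Rank falls in the +Inf bucket: fall back to the highest finite bound.
                return ordered[-2][0] if len(ordered) > 1 else math.nan
            if count == prev_count:
                return le  # empty bucket; avoid dividing by zero
            # Assume observations are spread evenly inside the bucket.
            return prev_le + (le - prev_le) * (rank - prev_count) / (count - prev_count)
        prev_le, prev_count = le, count
    return math.nan


def histogram_mean(total_sum: float, total_count: float) -> float:
    """Average observation, computed from the _sum and _count series."""
    return total_sum / total_count if total_count else math.nan


def counter_increase(samples: Sequence[float]) -> float:
    """Increase of a _total counter over raw samples, treating any drop as a reset.

    Unlike PromQL's increase(), no extrapolation to the window edges is applied.
    """
    increase = 0.0
    for prev, cur in zip(samples, samples[1:]):
        # After a reset (for example, a container restart) the counter starts
        # again from 0, so the whole new value counts as increase.
        increase += cur - prev if cur >= prev else cur
    return increase


if __name__ == "__main__":
    # Illustrative numbers only, not real measurements.
    buckets = [(0.1, 50), (0.5, 90), (1.0, 100), (math.inf, 100)]
    print(histogram_quantile(0.95, buckets))  # 0.75
    print(histogram_mean(30.0, 100))          # 0.3
    print(counter_increase([10, 15, 3, 8]))   # 13.0
```

The quantile is only an estimate: its precision depends on how the bucket boundaries are chosen, because every observation inside a bucket is assumed to be evenly distributed between its bounds.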