Basic Metrics: Container Metrics
This section describes the categories, names, and meanings of metrics reported to AOM from CCE's kube-prometheus-stack add-on or on-premises Kubernetes clusters.
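Many metrics in this section follow the Prometheus histogram convention: a `_bucket` series holds the distribution, a `_count` series holds the number of observations, and a `_sum` series holds the total of the observed values. Counter pairs, such as cache hits and cache misses, can be combined into ratios. The following is a minimal sketch of both calculations. The function names and sample values are hypothetical and are not part of AOM or any add-on. In practice, these values are usually computed from the increase over a time window rather than from raw cumulative totals.

```python
from typing import Optional


def histogram_average(sum_value: float, count_value: float) -> Optional[float]:
    """Average observed value of a histogram: the _sum series divided by the _count series.

    Returns None when there are no observations, to avoid dividing by zero.
    """
    if count_value == 0:
        return None
    return sum_value / count_value


def hit_ratio(hits: float, misses: float) -> Optional[float]:
    """Share of lookups served from a cache: hits / (hits + misses).

    Returns None when there were no lookups at all.
    """
    total = hits + misses
    if total == 0:
        return None
    return hits / total


# Hypothetical sample values, for illustration only.
# coredns_dns_request_duration_seconds_sum and ..._count over the same window:
average_latency = histogram_average(sum_value=12.5, count_value=50)

# coredns_cache_hits_total and coredns_cache_misses_total over the same window:
cache_hit_ratio = hit_ratio(hits=90, misses=10)

print(average_latency, cache_hit_ratio)
```

Returning `None` for an empty window keeps "no data" distinct from a real value of zero.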
Target Name |
Job Name |
Metric |
Description |
---|---|---|---|
|
coredns and node-local-dns |
coredns_build_info |
CoreDNS build information |
coredns_cache_entries |
Number of entries in the CoreDNS cache |
||
coredns_cache_size |
CoreDNS cache size |
||
coredns_cache_hits_total |
Number of CoreDNS cache hits |
||
coredns_cache_misses_total |
Number of CoreDNS cache misses |
||
coredns_cache_requests_total |
Total number of CoreDNS resolution requests in different dimensions |
||
coredns_dns_request_duration_seconds_bucket |
CoreDNS request latency |
||
coredns_dns_request_duration_seconds_count |
Number of CoreDNS requests observed for request processing time |
||
coredns_dns_request_duration_seconds_sum |
Total CoreDNS request processing time (seconds) |
||
coredns_dns_request_size_bytes_bucket |
Size of the CoreDNS request in bytes |
||
coredns_dns_request_size_bytes_count |
Number of CoreDNS requests observed for request size |
||
coredns_dns_request_size_bytes_sum |
Total CoreDNS request bytes |
||
coredns_dns_requests_total |
Total number of CoreDNS requests |
||
coredns_dns_response_size_bytes_bucket |
Size of the returned CoreDNS response in bytes |
||
coredns_dns_response_size_bytes_count |
Number of CoreDNS responses observed for response size |
||
coredns_dns_response_size_bytes_sum |
Total CoreDNS response bytes |
||
coredns_dns_responses_total |
Total number of CoreDNS responses, by response code |
||
coredns_forward_conn_cache_hits_total |
Total number of cache hits for each protocol and data flow |
||
coredns_forward_conn_cache_misses_total |
Total number of cache misses for each protocol and data flow |
||
coredns_forward_healthcheck_broken_total |
Total number of times health checks failed for all upstreams |
||
coredns_forward_healthcheck_failures_total |
Total number of failed health checks per upstream |
||
coredns_forward_max_concurrent_rejects_total |
Total number of requests rejected due to excessive concurrent requests |
||
coredns_forward_request_duration_seconds_bucket |
CoreDNS forwarding request latency |
||
coredns_forward_request_duration_seconds_count |
Number of CoreDNS forwarding requests observed for request duration |
||
coredns_forward_request_duration_seconds_sum |
Total CoreDNS forwarding request duration in seconds |
||
coredns_forward_requests_total |
Total number of requests for each data flow |
||
coredns_forward_responses_total |
Total number of responses to each data flow |
||
coredns_health_request_duration_seconds_bucket |
CoreDNS health check request latency |
||
coredns_health_request_duration_seconds_count |
Number of CoreDNS health check requests observed for request duration |
||
coredns_health_request_duration_seconds_sum |
Total CoreDNS health check request duration in seconds |
||
coredns_health_request_failures_total |
Total number of failed CoreDNS health check requests |
||
coredns_hosts_reload_timestamp_seconds |
Timestamp of the last reload of the hosts file by CoreDNS |
||
coredns_kubernetes_dns_programming_duration_seconds_bucket |
DNS programming latency |
||
coredns_kubernetes_dns_programming_duration_seconds_count |
Number of DNS programming operations observed for programming duration |
||
coredns_kubernetes_dns_programming_duration_seconds_sum |
Total DNS programming duration in seconds |
||
coredns_local_localhost_requests_total |
Total number of localhost requests processed by CoreDNS |
||
coredns_nodecache_setup_errors_total |
Total number of node cache plug-in setup errors |
||
coredns_dns_response_rcode_count_total |
Cumulative count of response codes |
||
coredns_dns_request_count_total |
Cumulative count of DNS requests made per zone, protocol, and family |
||
coredns_dns_request_do_count_total |
Cumulative count of requests with the DO bit set |
||
coredns_dns_do_requests_total |
Number of requests with the DO bit set |
||
coredns_dns_request_type_count_total |
Cumulative count of DNS requests per type |
||
coredns_panics_total |
Total number of CoreDNS panics |
||
coredns_plugin_enabled |
Whether a plugin is enabled in CoreDNS |
||
coredns_reload_failed_total |
Total number of failed configuration reloads |
||
serviceMonitor/monitoring/kube-apiserver/0 |
apiserver |
aggregator_unavailable_apiservice |
Number of unavailable APIServices |
apiserver_admission_controller_admission_duration_seconds_bucket |
Processing delay of an admission controller |
||
apiserver_admission_webhook_admission_duration_seconds_bucket |
Processing delay of an admission webhook |
||
apiserver_admission_webhook_admission_duration_seconds_count |
Number of admission webhook processing requests |
||
apiserver_client_certificate_expiration_seconds_bucket |
Remaining validity period of the client certificate |
||
apiserver_client_certificate_expiration_seconds_count |
Number of observations of the remaining validity period of client certificates |
||
apiserver_current_inflight_requests |
Number of in-flight requests being processed by the API server, by request kind (read-only or mutating) |
||
apiserver_request_duration_seconds_bucket |
Delay of the client's access to the APIServer |
||
apiserver_request_total |
Counter of API server requests broken out for code and other items |
||
go_goroutines |
Number of goroutines that exist |
||
kubernetes_build_info |
Kubernetes build information |
||
process_cpu_seconds_total |
Total process CPU time |
||
process_resident_memory_bytes |
Size of the resident memory set |
||
rest_client_requests_total |
Total number of HTTP requests, partitioned by status code and method |
||
workqueue_adds_total |
Total number of additions handled by a work queue |
||
workqueue_depth |
Current depth of a work queue |
||
workqueue_queue_duration_seconds_bucket |
Duration that a task stays in the current queue |
||
aggregator_unavailable_apiservice_total |
Total number of times APIServices were marked as unavailable |
||
rest_client_request_duration_seconds_bucket |
Latency of HTTP requests made by the client, in seconds |
||
serviceMonitor/monitoring/kubelet/0 |
kubelet |
kubelet_certificate_manager_client_expiration_renew_errors |
Number of certificate renewal errors |
kubelet_certificate_manager_client_ttl_seconds |
Time-to-live (TTL) of the Kubelet client certificate |
||
kubelet_cgroup_manager_duration_seconds_bucket |
Duration for destruction and update operations |
||
kubelet_cgroup_manager_duration_seconds_count |
Number of destruction and update operations |
||
kubelet_node_config_error |
If a configuration-related error occurs on a node, the value of this metric is true (1). If there is no configuration-related error, the value is false (0). |
||
kubelet_node_name |
Node name. The value is always 1. |
||
kubelet_pleg_relist_duration_seconds_bucket |
Duration for relisting pods in PLEG |
||
kubelet_pleg_relist_duration_seconds_count |
Number of pod relisting operations in PLEG |
||
kubelet_pleg_relist_interval_seconds_bucket |
Interval between relisting operations in PLEG |
||
kubelet_pod_start_duration_seconds_count |
Number of pods that have been started |
||
kubelet_pod_start_duration_seconds_bucket |
Duration from the kubelet seeing a pod for the first time to the pod starting to run |
||
kubelet_pod_worker_duration_seconds_bucket |
Duration for synchronizing a single pod |
||
kubelet_running_containers |
Number of running containers |
||
kubelet_running_pods |
Number of running pods |
||
kubelet_runtime_operations_duration_seconds_bucket |
Duration of runtime operations, by operation type |
||
kubelet_runtime_operations_errors_total |
Number of errors in operations at runtime level |
||
kubelet_runtime_operations_total |
Number of runtime operations of each type |
||
kubelet_volume_stats_available_bytes |
Number of available bytes in a volume |
||
kubelet_volume_stats_capacity_bytes |
Capacity in bytes of a volume |
||
kubelet_volume_stats_inodes |
Maximum number of inodes in a volume |
||
kubelet_volume_stats_inodes_used |
Number of used inodes in a volume |
||
kubelet_volume_stats_used_bytes |
Number of used bytes in a volume |
||
storage_operation_duration_seconds_bucket |
Duration for each storage operation |
||
storage_operation_duration_seconds_count |
Number of storage operations |
||
storage_operation_errors_total |
Number of storage operation errors |
||
volume_manager_total_volumes |
Number of volumes in Volume Manager |
||
rest_client_requests_total |
Total number of HTTP requests, partitioned by status code and method |
||
rest_client_request_duration_seconds_bucket |
Latency of HTTP requests made by the client, in seconds |
||
process_resident_memory_bytes |
Size of the resident memory set |
||
process_cpu_seconds_total |
Total process CPU time |
||
go_goroutines |
Number of goroutines that exist |
||
serviceMonitor/monitoring/kubelet/1 |
kubelet |
container_cpu_cfs_periods_total |
Total number of elapsed enforcement periods |
container_cpu_cfs_throttled_periods_total |
Number of throttled periods |
||
container_cpu_cfs_throttled_seconds_total |
Total duration a container has been throttled |
||
container_cpu_load_average_10s |
Value of container CPU load average over the last 10 seconds |
||
container_cpu_usage_seconds_total |
Total CPU time consumed |
||
container_file_descriptors |
Number of open file descriptors for a container |
||
container_fs_inodes_free |
Number of available inodes in a file system |
||
container_fs_inodes_total |
Total number of inodes in a file system |
||
container_fs_io_time_seconds_total |
Cumulative time spent on doing I/Os by the disk or file system |
||
container_fs_limit_bytes |
Total disk or file system capacity that can be consumed by a container |
||
container_fs_read_seconds_total |
Total time a container spent on reading disk or file system data |
||
container_fs_reads_bytes_total |
Cumulative amount of disk or file system data read by a container |
||
container_fs_reads_total |
Cumulative number of disk or file system reads completed by a container |
||
container_fs_usage_bytes |
File system usage |
||
container_fs_write_seconds_total |
Total time a container spent on writing data to the disk or file system |
||
container_fs_writes_bytes_total |
Total amount of data written by a container to a disk or file system |
||
container_fs_writes_total |
Cumulative number of disk or file system writes completed by a container |
||
container_memory_cache |
Memory used for the page cache of a container |
||
container_memory_failcnt |
Number of times memory usage hit the limit |
||
container_memory_max_usage_bytes |
Maximum memory usage recorded for a container |
||
container_memory_rss |
Size of the resident memory set for a container |
||
container_memory_swap |
Container swap memory usage |
||
container_memory_usage_bytes |
Current memory usage of a container |
||
container_memory_working_set_bytes |
Memory usage of the working set of a container |
||
container_network_receive_bytes_total |
Total volume of data received by a container network |
||
container_network_receive_errors_total |
Cumulative number of errors encountered during reception |
||
container_network_receive_packets_dropped_total |
Cumulative number of packets dropped during reception |
||
container_network_receive_packets_total |
Cumulative number of packets received |
||
container_network_transmit_bytes_total |
Total volume of data transmitted on a container network |
||
container_network_transmit_errors_total |
Cumulative number of errors encountered during transmission |
||
container_network_transmit_packets_dropped_total |
Cumulative number of packets dropped during transmission |
||
container_network_transmit_packets_total |
Cumulative number of packets transmitted |
||
container_spec_cpu_quota |
CPU quota of a container |
||
container_spec_memory_limit_bytes |
Memory limit for a container |
||
machine_cpu_cores |
Number of CPU cores of the physical machine or VM |
||
machine_memory_bytes |
Total memory size of the physical machine or VM |
||
serviceMonitor/monitoring/kube-state-metrics/0 |
kube-state-metrics-prom |
kube_cronjob_status_active |
Whether the cronjob is actively running jobs |
kube_cronjob_info |
Cronjob information |
||
kube_cronjob_labels |
Label of a cronjob |
||
kube_configmap_info |
ConfigMap information |
||
kube_daemonset_created |
DaemonSet creation time |
||
kube_daemonset_status_current_number_scheduled |
Number of nodes that are running at least one DaemonSet pod and are supposed to |
||
kube_daemonset_status_desired_number_scheduled |
Number of nodes that should be running a DaemonSet pod |
||
kube_daemonset_status_number_available |
Number of nodes that should be running a DaemonSet pod and have at least one DaemonSet pod running and available |
||
kube_daemonset_status_number_misscheduled |
Number of nodes that are running a DaemonSet pod but are not supposed to |
||
kube_daemonset_status_number_ready |
Number of nodes that should be running the DaemonSet pods and have one or more DaemonSet pods running and ready |
||
kube_daemonset_status_number_unavailable |
Number of nodes that should be running the DaemonSet pods but have none of the DaemonSet pods running and available |
||
kube_daemonset_status_updated_number_scheduled |
Number of nodes that are running an updated DaemonSet pod |
||
kube_deployment_created |
Deployment creation timestamp |
||
kube_deployment_labels |
Deployment labels |
||
kube_deployment_metadata_generation |
Sequence number representing a specific generation of the desired state for a Deployment |
||
kube_deployment_spec_replicas |
Number of desired replicas for a Deployment |
||
kube_deployment_spec_strategy_rollingupdate_max_unavailable |
Maximum number of unavailable replicas during a rolling update of a Deployment |
||
kube_deployment_status_observed_generation |
The generation observed by the Deployment controller |
||
kube_deployment_status_replicas |
Number of current replicas of a Deployment |
||
kube_deployment_status_replicas_available |
Number of available replicas per Deployment |
||
kube_deployment_status_replicas_ready |
Number of ready replicas per Deployment |
||
kube_deployment_status_replicas_unavailable |
Number of unavailable replicas per Deployment |
||
kube_deployment_status_replicas_updated |
Number of updated replicas per Deployment |
||
kube_job_info |
Job information |
||
kube_namespace_labels |
Namespace labels |
||
kube_node_labels |
Node labels |
||
kube_node_info |
Node information |
||
kube_node_spec_taint |
Taint of a node |
||
kube_node_spec_unschedulable |
Whether new pods can be scheduled to a node |
||
kube_node_status_allocatable |
Allocatable resources on a node |
||
kube_node_status_capacity |
Capacity for different resources on a node |
||
kube_node_status_condition |
Node status condition |
||
kube_node_volcano_oversubscription_status |
Node oversubscription status |
||
kube_persistentvolume_status_phase |
PV status |
||
kube_persistentvolumeclaim_status_phase |
PVC status |
||
kube_persistentvolume_info |
PV information |
||
kube_persistentvolumeclaim_info |
PVC information |
||
kube_pod_container_info |
Information about a container running in the pod |
||
kube_pod_container_resource_limits |
Container resource limits |
||
kube_pod_container_resource_requests |
Number of resources requested by a container |
||
kube_pod_container_status_last_terminated_reason |
The last reason a container was in terminated state |
||
kube_pod_container_status_ready |
Whether a container is in ready state |
||
kube_pod_container_status_restarts_total |
Number of container restarts |
||
kube_pod_container_status_running |
Whether a container is in running state |
||
kube_pod_container_status_terminated |
Whether a container is in terminated state |
||
kube_pod_container_status_terminated_reason |
The reason a container is in terminated state |
||
kube_pod_container_status_waiting |
Whether a container is in waiting state |
||
kube_pod_container_status_waiting_reason |
The reason a container is in waiting state |
||
kube_pod_info |
Pod information |
||
kube_pod_labels |
Pod labels |
||
kube_pod_owner |
Object to which the pod belongs |
||
kube_pod_status_phase |
Phase of the pod |
||
kube_pod_status_ready |
Whether the pod is in ready state |
||
kube_secret_info |
Secret information |
||
kube_statefulset_created |
StatefulSet creation timestamp |
||
kube_statefulset_labels |
Information about StatefulSet labels |
||
kube_statefulset_metadata_generation |
Sequence number representing a specific generation of the desired state for a StatefulSet |
||
kube_statefulset_replicas |
Number of desired pods for a StatefulSet |
||
kube_statefulset_status_observed_generation |
Generation observed by the StatefulSet controller |
||
kube_statefulset_status_replicas |
Number of current replicas in a StatefulSet |
||
kube_statefulset_status_replicas_ready |
Number of ready replicas in a StatefulSet |
||
kube_statefulset_status_replicas_updated |
Number of updated replicas in a StatefulSet |
||
kube_job_spec_completions |
Desired number of successfully finished pods that should run with the job |
||
kube_job_status_failed |
Number of pods of a job that reached the Failed phase |
||
kube_job_status_succeeded |
Number of pods of a job that reached the Succeeded phase |
||
kube_node_status_allocatable_cpu_cores |
Number of allocatable CPU cores of a node |
||
kube_node_status_allocatable_memory_bytes |
Total allocatable memory of a node |
||
kube_replicaset_owner |
ReplicaSet owner |
||
kube_resourcequota |
Resource quota |
||
kube_pod_spec_volumes_persistentvolumeclaims_info |
Information about the PVC associated with the pod |
||
serviceMonitor/monitoring/prometheus-lightweight/0 |
prometheus-lightweight |
vm_persistentqueue_blocks_dropped_total |
Total number of dropped blocks in a send queue |
vm_persistentqueue_blocks_read_total |
Total number of blocks read by a send queue |
||
vm_persistentqueue_blocks_written_total |
Total number of blocks written to a send queue |
||
vm_persistentqueue_bytes_pending |
Number of pending bytes in a send queue |
||
vm_persistentqueue_bytes_read_total |
Total number of bytes read by a send queue |
||
vm_persistentqueue_bytes_written_total |
Total number of bytes written to a send queue |
||
vm_promscrape_active_scrapers |
Number of active scrapers |
||
vm_promscrape_conn_read_errors_total |
Total number of read errors during scrapes |
||
vm_promscrape_conn_write_errors_total |
Total number of write errors during scrapes |
||
vm_promscrape_max_scrape_size_exceeded_errors_total |
Total number of scrapes failed because responses exceed the size limit |
||
vm_promscrape_scrape_duration_seconds_sum |
Total time spent on scrapes, in seconds |
||
vm_promscrape_scrape_duration_seconds_count |
Number of scrapes observed for scrape duration |
||
vm_promscrape_scrapes_total |
Number of scrapes |
||
vmagent_remotewrite_bytes_sent_total |
Total number of bytes sent through remote write |
||
vmagent_remotewrite_duration_seconds_sum |
Total time consumed by remote writes, in seconds |
||
vmagent_remotewrite_duration_seconds_count |
Number of remote writes observed for remote write duration |
||
vmagent_remotewrite_packets_dropped_total |
Total number of dropped packets during remote write |
||
vmagent_remotewrite_pending_data_bytes |
Number of pending bytes during remote write |
||
vmagent_remotewrite_requests_total |
Total number of remote write requests |
||
vmagent_remotewrite_retries_count_total |
Total number of remote write retries |
||
go_goroutines |
Number of goroutines that exist |
||
serviceMonitor/monitoring/node-exporter/0 |
node-exporter |
node_boot_time_seconds |
Node boot time |
node_context_switches_total |
Number of context switches |
||
node_cpu_seconds_total |
Seconds the CPUs spent in each mode |
||
node_disk_io_now |
Number of I/Os in progress |
||
node_disk_io_time_seconds_total |
Total seconds spent doing I/Os |
||
node_disk_io_time_weighted_seconds_total |
The weighted time spent doing I/Os |
||
node_disk_read_bytes_total |
Number of bytes that are read |
||
node_disk_read_time_seconds_total |
Number of seconds spent by all reads |
||
node_disk_reads_completed_total |
Number of reads completed |
||
node_disk_write_time_seconds_total |
Number of seconds spent by all writes |
||
node_disk_writes_completed_total |
Number of writes completed |
||
node_disk_written_bytes_total |
Number of bytes that are written |
||
node_docker_thinpool_data_space_available |
Available data space of a Docker thin pool |
||
node_docker_thinpool_metadata_space_available |
Available metadata space of a Docker thin pool |
||
node_exporter_build_info |
Node Exporter build information |
||
node_filefd_allocated |
Allocated file descriptors |
||
node_filefd_maximum |
Maximum number of file descriptors |
||
node_filesystem_avail_bytes |
File system space that is available for use |
||
node_filesystem_device_error |
Whether an error occurred while collecting statistics for the mounted file system device |
||
node_filesystem_free_bytes |
Remaining space of a file system |
||
node_filesystem_readonly |
Whether the file system is read-only |
||
node_filesystem_size_bytes |
Total size of a file system |
||
node_forks_total |
Number of forks |
||
node_intr_total |
Total number of interrupts serviced |
||
node_load1 |
1-minute average CPU load |
||
node_load15 |
15-minute average CPU load |
||
node_load5 |
5-minute average CPU load |
||
node_memory_Buffers_bytes |
Memory of the node buffer |
||
node_memory_Cached_bytes |
Memory for the node page cache |
||
node_memory_MemAvailable_bytes |
Available memory of a node |
||
node_memory_MemFree_bytes |
Free memory of a node |
||
node_memory_MemTotal_bytes |
Total memory of a node |
||
node_network_receive_bytes_total |
Total amount of received data |
||
node_network_receive_drop_total |
Total number of packets dropped during reception |
||
node_network_receive_errs_total |
Total number of errors encountered during reception |
||
node_network_receive_packets_total |
Total number of packets received |
||
node_network_transmit_bytes_total |
Total number of sent bytes |
||
node_network_transmit_drop_total |
Total number of dropped packets |
||
node_network_transmit_errs_total |
Total number of errors encountered during transmission |
||
node_network_transmit_packets_total |
Total number of packets sent |
||
node_procs_blocked |
Blocked processes |
||
node_procs_running |
Running processes |
||
node_sockstat_sockets_used |
Number of sockets in use |
||
node_sockstat_TCP_alloc |
Number of allocated TCP sockets |
||
node_sockstat_TCP_inuse |
Number of TCP sockets in use |
||
node_sockstat_TCP_orphan |
Number of orphaned TCP sockets |
||
node_sockstat_TCP_tw |
Number of TCP sockets in the TIME_WAIT state |
||
node_sockstat_UDPLITE_inuse |
Number of UDP-Lite sockets in use |
||
node_sockstat_UDP_inuse |
Number of UDP sockets in use |
||
node_sockstat_UDP_mem |
UDP socket buffer usage |
||
node_timex_offset_seconds |
Time offset |
||
node_timex_sync_status |
Synchronization status of node clocks |
||
node_uname_info |
System kernel information |
||
node_vmstat_oom_kill |
Number of processes terminated due to insufficient memory |
||
process_cpu_seconds_total |
Total process CPU time |
||
process_max_fds |
Maximum number of file descriptors of a process |
||
process_open_fds |
Number of open file descriptors of a process |
||
process_resident_memory_bytes |
Size of the resident memory set |
||
process_start_time_seconds |
Process start time |
||
process_virtual_memory_bytes |
Virtual memory size |
||
process_virtual_memory_max_bytes |
Maximum available virtual memory capacity |
||
node_netstat_Tcp_ActiveOpens |
Number of TCP connections that directly change from the CLOSED state to the SYN-SENT state |
||
node_netstat_Tcp_PassiveOpens |
Number of TCP connections that directly change from the LISTEN state to the SYN-RCVD state |
||
node_netstat_Tcp_CurrEstab |
Number of TCP connections in the ESTABLISHED or CLOSE-WAIT state |
||
node_vmstat_pgmajfault |
Number of major page faults in vmstat |
||
node_vmstat_pgpgout |
Number of page-outs in vmstat |
||
node_vmstat_pgfault |
Number of page faults in vmstat |
||
node_vmstat_pgpgin |
Number of page-ins in vmstat |
||
node_processes_max_processes |
Maximum number of processes |
||
node_processes_pids |
Number of PIDs |
||
node_nf_conntrack_entries |
Number of currently allocated flow entries for connection tracking |
||
node_nf_conntrack_entries_limit |
Maximum size of a connection tracking table |
||
promhttp_metric_handler_requests_in_flight |
Current number of scrapes being served |
||
go_goroutines |
Number of goroutines that exist |
||
node_filesystem_files |
Total number of file nodes (inodes) in the file system on the node |
||
node_filesystem_files_free |
Number of free file nodes (inodes) in the file system on the node |
||
podMonitor/monitoring/nvidia-gpu-device-plugin/0 |
monitoring/nvidia-gpu-device-plugin |
cce_gpu_utilization |
GPU compute usage |
cce_gpu_memory_utilization |
GPU memory usage |
||
cce_gpu_encoder_utilization |
GPU encoding usage |
||
cce_gpu_decoder_utilization |
GPU decoding usage |
||
cce_gpu_utilization_process |
GPU compute usage of each process |
||
cce_gpu_memory_utilization_process |
GPU memory usage of each process |
||
cce_gpu_encoder_utilization_process |
GPU encoding usage of each process |
||
cce_gpu_decoder_utilization_process |
GPU decoding usage of each process |
||
cce_gpu_memory_used |
Used GPU memory |
||
cce_gpu_memory_total |
Total GPU memory |
||
cce_gpu_memory_free |
Free GPU memory |
||
cce_gpu_bar1_memory_used |
Used GPU BAR1 memory |
||
cce_gpu_bar1_memory_total |
Total GPU BAR1 memory |
||
cce_gpu_clock |
GPU clock frequency |
||
cce_gpu_memory_clock |
GPU memory clock frequency |
||
cce_gpu_graphics_clock |
GPU graphics clock frequency |
||
cce_gpu_video_clock |
GPU video processor frequency |
||
cce_gpu_temperature |
GPU temperature |
||
cce_gpu_power_usage |
GPU power usage |
||
cce_gpu_total_energy_consumption |
Total GPU energy consumption |
||
cce_gpu_pcie_link_bandwidth |
GPU PCIe bandwidth |
||
cce_gpu_nvlink_bandwidth |
GPU NVLink bandwidth |
||
cce_gpu_pcie_throughput_rx |
GPU PCIe RX bandwidth |
||
cce_gpu_pcie_throughput_tx |
GPU PCIe TX bandwidth |
||
cce_gpu_nvlink_utilization_counter_rx |
GPU NVLink RX bandwidth |
||
cce_gpu_nvlink_utilization_counter_tx |
GPU NVLink TX bandwidth |
||
cce_gpu_retired_pages_sbe |
Number of retired GPU memory pages with single-bit errors |
||
cce_gpu_retired_pages_dbe |
Number of retired GPU memory pages with double-bit errors |
||
xgpu_memory_total |
Total xGPU memory |
||
xgpu_memory_used |
Used xGPU memory |
||
xgpu_core_percentage_total |
Total xGPU compute |
||
xgpu_core_percentage_used |
Used xGPU compute |
||
gpu_schedule_policy |
There are three GPU modes. 0: GPU memory isolation, compute sharing mode. 1: GPU memory and compute isolation mode. 2: default mode, indicating that the GPU is not virtualized. |
||
xgpu_device_health |
Health status of xGPU. 0: xGPU is healthy. 1: xGPU is unhealthy. |
||
serviceMonitor/monitoring/prometheus-server/0 |
prometheus-server |
prometheus_build_info |
Prometheus build information |
prometheus_engine_query_duration_seconds |
Time for query, in seconds |
||
prometheus_engine_query_duration_seconds_count |
Number of queries |
||
prometheus_sd_discovered_targets |
Current number of discovered targets |
||
prometheus_remote_storage_bytes_total |
Total number of bytes of data (non-metadata) sent by the queue after compression |
||
prometheus_remote_storage_enqueue_retries_total |
Number of retries upon enqueuing failed due to full shard queue |
||
prometheus_remote_storage_highest_timestamp_in_seconds |
Highest timestamp that has come into the remote storage, in seconds since epoch |
||
prometheus_remote_storage_queue_highest_sent_timestamp_seconds |
Highest timestamp successfully sent by remote storage |
||
prometheus_remote_storage_samples_dropped_total |
Number of samples dropped before being sent to remote storage |
||
prometheus_remote_storage_samples_failed_total |
Number of samples that failed to be sent to remote storage |
||
prometheus_remote_storage_samples_in_total |
Number of samples that have come into the remote storage |
||
prometheus_remote_storage_samples_pending |
Number of samples pending in shards to be sent to remote storage |
||
prometheus_remote_storage_samples_retried_total |
Number of samples which failed to be sent to remote storage but were retried |
||
prometheus_remote_storage_samples_total |
Total number of samples sent to remote storage |
||
prometheus_remote_storage_shard_capacity |
Capacity of each shard of the queue used for parallel sending to the remote storage |
||
prometheus_remote_storage_shards |
Number of shards used for parallel sending to the remote storage |
||
prometheus_remote_storage_shards_desired |
Number of shards that the queue's shard calculation wants to run based on the rate of samples in vs. samples out |
||
prometheus_remote_storage_shards_max |
Maximum number of shards that the queue is allowed to run |
||
prometheus_remote_storage_shards_min |
Minimum number of shards that the queue is allowed to run |
||
prometheus_tsdb_wal_segment_current |
WAL segment index that TSDB is currently writing to |
||
prometheus_tsdb_head_chunks |
Number of chunks in the head block |
||
prometheus_tsdb_head_series |
Number of time series stored in the head |
||
prometheus_tsdb_head_samples_appended_total |
Number of appended samples |
||
prometheus_wal_watcher_current_segment |
Current segment the WAL watcher is reading records from |
||
prometheus_target_interval_length_seconds |
Metric collection interval |
||
prometheus_target_interval_length_seconds_count |
Number of metric collection intervals |
||
prometheus_target_interval_length_seconds_sum |
Sum of metric collection intervals |
||
prometheus_target_scrapes_exceeded_body_size_limit_total |
Number of scrapes that hit the body size limit |
||
prometheus_target_scrapes_exceeded_sample_limit_total |
Number of scrapes that hit the sample limit |
||
prometheus_target_scrapes_sample_duplicate_timestamp_total |
Number of scraped samples with duplicate timestamps |
||
prometheus_target_scrapes_sample_out_of_bounds_total |
Number of samples rejected due to timestamp falling outside of the time bounds |
||
prometheus_target_scrapes_sample_out_of_order_total |
Number of out-of-order samples |
||
prometheus_target_sync_length_seconds |
Target synchronization interval |
||
prometheus_target_sync_length_seconds_count |
Number of target synchronization intervals |
||
prometheus_target_sync_length_seconds_sum |
Sum of target synchronization intervals |
||
promhttp_metric_handler_requests_in_flight |
Current number of scrapes being served |
||
promhttp_metric_handler_requests_total |
Total scrapes |
||
go_goroutines |
Number of goroutines that exist |
||
podMonitor/monitoring/virtual-kubelet-pods/0 |
monitoring/virtual-kubelet-pods |
container_cpu_load_average_10s |
Value of container CPU load average over the last 10 seconds |
container_cpu_system_seconds_total |
Cumulative system CPU time consumed by a container |
||
container_cpu_usage_seconds_total |
Cumulative CPU time consumed by a container in core-seconds |
||
container_cpu_user_seconds_total |
Cumulative user CPU time consumed by a container |
||
container_cpu_cfs_periods_total |
Number of elapsed enforcement period intervals |
||
container_cpu_cfs_throttled_periods_total |
Number of throttled period intervals |
||
container_cpu_cfs_throttled_seconds_total |
Total duration a container has been throttled |
||
container_fs_inodes_free |
Number of available inodes in a file system |
||
container_fs_usage_bytes |
File system usage |
||
container_fs_inodes_total |
Number of inodes in a file system |
||
container_fs_io_current |
Number of I/Os currently in progress in a disk or file system |
||
container_fs_io_time_seconds_total |
Cumulative time spent on doing I/Os by the disk or file system |
||
container_fs_io_time_weighted_seconds_total |
Cumulative weighted I/O time of a disk or file system |
||
container_fs_limit_bytes |
Total disk or file system capacity that can be consumed by a container |
||
container_fs_reads_bytes_total |
Cumulative amount of disk or file system data read by a container |
||
container_fs_read_seconds_total |
Time a container spent on reading disk or file system data |
||
container_fs_reads_merged_total |
Cumulative number of merged disk or file system reads made by a container |
||
container_fs_reads_total |
Cumulative number of disk or file system reads completed by a container |
||
container_fs_sector_reads_total |
Cumulative number of disk or file system sector reads completed by a container |
||
container_fs_sector_writes_total |
Cumulative number of disk or file system sector writes completed by a container |
||
container_fs_writes_bytes_total |
Total amount of data written by a container to a disk or file system |
||
container_fs_write_seconds_total |
Time a container spent on writing data to the disk or file system |
||
container_fs_writes_merged_total |
Cumulative number of merged container writes to the disk or file system |
||
container_fs_writes_total |
Cumulative number of disk or file system writes completed by a container |
||
container_blkio_device_usage_total |
Blkio device usage of a container, in bytes |
||
container_memory_failures_total |
Cumulative number of container memory allocation failures |
||
container_memory_failcnt |
Number of times memory usage hit the limit |
||
container_memory_cache |
Memory used for the page cache of a container |
||
container_memory_mapped_file |
Size of the memory-mapped files of a container |
||
container_memory_max_usage_bytes |
Maximum memory usage recorded for a container |
||
container_memory_rss |
Size of the resident memory set for a container |
||
container_memory_swap |
Container swap usage |
||
container_memory_usage_bytes |
Current memory usage of a container |
||
container_memory_working_set_bytes |
Memory usage of the working set of a container |
||
container_network_receive_bytes_total |
Total volume of data received by a container network |
||
container_network_receive_errors_total |
Cumulative number of errors encountered during reception |
||
container_network_receive_packets_dropped_total |
Cumulative number of packets dropped during reception |
||
container_network_receive_packets_total |
Cumulative number of packets received |
||
container_network_transmit_bytes_total |
Total volume of data transmitted on a container network |
||
container_network_transmit_errors_total |
Cumulative number of errors encountered during transmission |
||
container_network_transmit_packets_dropped_total |
Cumulative number of packets dropped during transmission |
||
container_network_transmit_packets_total |
Cumulative number of packets transmitted |
||
container_processes |
Number of processes running inside a container |
||
container_sockets |
Number of open sockets for a container |
||
container_file_descriptors |
Number of open file descriptors for a container |
||
container_threads |
Number of threads running inside a container |
||
container_threads_max |
Maximum number of threads allowed inside a container |
||
container_ulimits_soft |
Soft ulimit value of process 1 in a container. A value of -1 means unlimited, except for priority and nice. |
||
container_tasks_state |
Number of tasks in the specified state, such as sleeping, running, stopped, uninterruptible, or iowaiting |
||
container_spec_cpu_period |
CPU period of a container |
||
container_spec_cpu_shares |
CPU share of a container |
||
container_spec_cpu_quota |
CPU quota of a container |
||
container_spec_memory_limit_bytes |
Memory limit for a container |
||
container_spec_memory_reservation_limit_bytes |
Memory reservation limit for a container |
||
container_spec_memory_swap_limit_bytes |
Memory swap limit for a container |
||
container_start_time_seconds |
Start time of a container, in seconds since the Unix epoch |
||
container_last_seen |
Last time a container was seen by the exporter |
||
container_accelerator_memory_used_bytes |
GPU accelerator memory that is being used by a container |
||
container_accelerator_memory_total_bytes |
Total available memory of a GPU accelerator |
||
container_accelerator_duty_cycle |
Percentage of time when a GPU accelerator is actually running |
||
podMonitor/monitoring/everest-csi-controller/0 |
monitoring/everest-csi-controller |
everest_action_result_total |
Results of invoking different functions |
everest_function_duration_seconds_bucket |
Number of invocations of different functions, counted per duration bucket |
||
everest_function_duration_seconds_count |
Number of times different functions are invoked |
||
everest_function_duration_seconds_sum |
Total time spent invoking different functions, in seconds |
||
everest_function_duration_quantile_seconds |
Duration quantiles for invoking different functions, in seconds |
||
node_volume_read_completed_total |
Number of completed reads |
||
node_volume_read_merged_total |
Number of merged reads |
||
node_volume_read_bytes_total |
Total number of bytes read |
||
node_volume_read_time_milliseconds_total |
Total read duration |
||
node_volume_write_completed_total |
Number of completed writes |
||
node_volume_write_merged_total |
Number of merged writes |
||
node_volume_write_bytes_total |
Total number of bytes written |
||
node_volume_write_time_milliseconds_total |
Total write duration |
||
node_volume_io_now |
Number of ongoing I/Os |
||
node_volume_io_time_seconds_total |
Total duration of I/O operations |
||
node_volume_capacity_bytes_available |
Available capacity |
||
node_volume_capacity_bytes_total |
Total capacity |
||
node_volume_capacity_bytes_used |
Used capacity |
||
node_volume_inodes_available |
Available inodes |
||
node_volume_inodes_total |
Total number of inodes |
||
node_volume_inodes_used |
Used inodes |
||
node_volume_read_transmissions_total |
Total number of read transmissions |
||
node_volume_read_timeouts_total |
Number of read timeouts |
||
node_volume_read_sent_bytes_total |
Number of bytes read |
||
node_volume_read_queue_time_milliseconds_total |
Total read queue waiting time |
||
node_volume_read_rtt_time_milliseconds_total |
Total read RTT |
||
node_volume_write_transmissions_total |
Total number of write transmissions |
||
node_volume_write_timeouts_total |
Total number of write timeouts |
||
node_volume_write_queue_time_milliseconds_total |
Total write queue waiting time |
||
node_volume_write_rtt_time_milliseconds_total |
Total write RTT |
||
node_volume_localvolume_stats_capacity_bytes |
Total local volume capacity |
||
node_volume_localvolume_stats_available_bytes |
Available local volume capacity |
||
node_volume_localvolume_stats_used_bytes |
Used local volume capacity |
||
node_volume_localvolume_stats_inodes |
Number of inodes for a local volume |
||
node_volume_localvolume_stats_inodes_used |
Used inodes for a local volume |
||
podMonitor/monitoring/nginx-ingress-controller/0 |
monitoring/nginx-ingress-controller |
nginx_ingress_controller_connect_duration_seconds_bucket |
Duration for connecting to the upstream server |
nginx_ingress_controller_connect_duration_seconds_sum |
Duration for connecting to the upstream server |
||
nginx_ingress_controller_connect_duration_seconds_count |
Duration for connecting to the upstream server |
||
nginx_ingress_controller_request_duration_seconds_bucket |
Time required for processing a request, in seconds |
||
nginx_ingress_controller_request_duration_seconds_sum |
Time required for processing a request, in seconds |
||
nginx_ingress_controller_request_duration_seconds_count |
Time required for processing a request, in seconds |
||
nginx_ingress_controller_request_size_bucket |
Length of a request (including the request line, header, and body) |
||
nginx_ingress_controller_request_size_sum |
Length of a request (including the request line, header, and body) |
||
nginx_ingress_controller_request_size_count |
Length of a request (including the request line, header, and body) |
||
nginx_ingress_controller_response_duration_seconds_bucket |
Time required for receiving the response from the upstream server |
||
nginx_ingress_controller_response_duration_seconds_sum |
Time required for receiving the response from the upstream server |
||
nginx_ingress_controller_response_duration_seconds_count |
Time required for receiving the response from the upstream server |
||
nginx_ingress_controller_response_size_bucket |
Length of a response (including the status line, header, and body) |
||
nginx_ingress_controller_response_size_sum |
Length of a response (including the status line, header, and body) |
||
nginx_ingress_controller_response_size_count |
Length of a response (including the status line, header, and body) |
||
nginx_ingress_controller_header_duration_seconds_bucket |
Time required for receiving the first header from the upstream server |
||
nginx_ingress_controller_header_duration_seconds_sum |
Time required for receiving the first header from the upstream server |
||
nginx_ingress_controller_header_duration_seconds_count |
Time required for receiving the first header from the upstream server |
||
nginx_ingress_controller_bytes_sent |
Number of bytes sent to the client |
||
nginx_ingress_controller_ingress_upstream_latency_seconds |
Upstream service latency |
||
nginx_ingress_controller_requests |
Total number of client requests |
||
nginx_ingress_controller_nginx_process_connections |
Number of client connections in the active, read, write, or wait state |
||
nginx_ingress_controller_nginx_process_connections_total |
Total number of client connections in the accepted or handled state |
||
nginx_ingress_controller_nginx_process_cpu_seconds_total |
Total CPU time consumed by the Nginx process (unit: second) |
||
nginx_ingress_controller_nginx_process_num_procs |
Number of processes |
||
nginx_ingress_controller_nginx_process_oldest_start_time_seconds |
Start time in seconds since January 1, 1970 |
||
nginx_ingress_controller_nginx_process_read_bytes_total |
Total number of bytes read |
||
nginx_ingress_controller_nginx_process_requests_total |
Total number of requests processed by Nginx since startup |
||
nginx_ingress_controller_nginx_process_resident_memory_bytes |
Resident memory set usage of a process, that is, the actual physical memory usage |
||
nginx_ingress_controller_nginx_process_virtual_memory_bytes |
Virtual memory usage of a process, that is, the total memory allocated to the process, including the actual physical memory and virtual swap space |
||
nginx_ingress_controller_nginx_process_write_bytes_total |
Total amount of data written by the process to disks or other devices for long-term storage |
||
nginx_ingress_controller_build_info |
A metric with a constant '1' labeled with information about the build |
||
nginx_ingress_controller_check_success |
Cumulative count of syntax check operations of the Nginx ingress controller |
||
nginx_ingress_controller_config_hash |
Configured hash value |
||
nginx_ingress_controller_config_last_reload_successful |
Whether the last configuration reload attempt was successful |
||
nginx_ingress_controller_config_last_reload_successful_timestamp_seconds |
Timestamp of the last successful configuration reload |
||
nginx_ingress_controller_ssl_certificate_info |
All information associated with a certificate |
||
nginx_ingress_controller_success |
Cumulative number of reload operations of the Nginx ingress controller |
||
nginx_ingress_controller_orphan_ingress |
Whether an ingress is orphaned. 1: orphaned. 0: not orphaned. Labels: namespace (namespace of the ingress), ingress (name of the ingress), and type (reason the ingress is orphaned, either no-service or no-endpoint). |
||
nginx_ingress_controller_admission_config_size |
Size of the admission controller configuration |
||
nginx_ingress_controller_admission_render_duration |
Rendering duration of the admission controller |
||
nginx_ingress_controller_admission_render_ingresses |
Length of ingresses rendered by the admission controller |
||
nginx_ingress_controller_admission_roundtrip_duration |
Time spent by the admission controller to process new events |
||
nginx_ingress_controller_admission_tested_duration |
Time spent on admission controller tests |
||
nginx_ingress_controller_admission_tested_ingresses |
Length of ingresses processed by the admission controller |
||
podMonitor/monitoring/cceaddon-npd/0 |
monitoring/cceaddon-npd |
problem_counter |
Number of times that the check item is found abnormal |
problem_gauge |
Whether the check item has triggered an exception
|
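Many of the metrics in the table are cumulative counters (names ending in _total), so they are usually read as the increase between two samples rather than as raw values. The following Python sketch illustrates this for two container metrics pairs from the table: the CPU throttling ratio derived from container_cpu_cfs_throttled_periods_total and container_cpu_cfs_periods_total, and memory utilization derived from container_memory_working_set_bytes and container_spec_memory_limit_bytes. The function names and sample values are hypothetical and are not part of AOM or any add-on. The sketch also assumes that a limit of 0 means no limit is set and that a decreasing counter indicates a reset.

```python
from typing import Optional, Tuple


def counter_increase(earlier: float, later: float) -> float:
    """Return the increase of a cumulative counter between two samples.

    Assumption: if the later sample is smaller, the counter was reset
    (for example, the container restarted), so counting starts from zero.
    """
    return later if later < earlier else later - earlier


def throttled_ratio(periods: Tuple[float, float],
                    throttled: Tuple[float, float]) -> float:
    """Fraction of CFS periods in which the container was throttled.

    periods:   (earlier, later) samples of container_cpu_cfs_periods_total
    throttled: (earlier, later) samples of
               container_cpu_cfs_throttled_periods_total
    """
    elapsed = counter_increase(*periods)
    if elapsed == 0:
        return 0.0  # no enforcement periods elapsed between the samples
    return counter_increase(*throttled) / elapsed


def memory_utilization(working_set_bytes: float,
                       limit_bytes: float) -> Optional[float]:
    """Working-set memory as a fraction of the memory limit.

    Assumption: a limit of 0 means that no limit is configured, in which
    case utilization is undefined and None is returned.
    """
    if limit_bytes <= 0:
        return None
    return working_set_bytes / limit_bytes


if __name__ == "__main__":
    # Illustrative sample values only, not real measurements.
    print(throttled_ratio(periods=(1000, 1600), throttled=(50, 200)))
    print(memory_utilization(256 * 1024**2, 512 * 1024**2))
```

Gauges such as container_memory_working_set_bytes can be used directly, whereas counters need two samples taken at different times. A persistently high throttling ratio suggests that the CPU limit of the container may be too low for its workload.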