Updated on 2024-06-17 GMT+08:00

Dashboard

A dashboard displays different types of graphs, such as line graphs and digit graphs, on a single screen so that you can view monitoring data comprehensively.

Checking and Switching Views

  1. Select a fleet or a cluster that has not been added to a fleet.

    Figure 1 Selecting a fleet or a cluster not in a fleet

  2. Click the Dashboard tab. The cluster view is displayed by default.
  3. Set the parameters for the view. The available parameters vary by view. See Table 1 for details.
  4. Specify the time window for the view.

    Select a preset time range or customize one in the upper right corner of the page, and then click the refresh icon to refresh the page.

  5. Switch views as needed. The CIA dashboard provides preset views. Click the Switch View button next to the view name to select the monitoring data to view. Table 1 describes the preset views.

    Table 1 Preset views

    View Name: Cluster View (Default View)
    Parameter: Cluster
    Monitoring Metric Included:
    • Number of Nodes/Nodes with Unavailable Disks/Nodes Unavailable
    • CPU/Memory Usage
    • CPU/Memory Requests Commitment
    • CPU/Memory Limits Commitment
    • Num of Pods/Containers
    • CPU/Memory Usage
    • Network Receive/Transmit Rate
    • Average Network Receive/Transmit Rate
    • Rate of Received/Transmitted Packets
    • Packet Loss Rate (Receive/Transmit)
    • Disk IOPS (Read+Write)
    • Throughput (Read+Write)

    View Name: APIServer View
    Parameter: Cluster, Instance
    Monitoring Metric Included:
    • Alived
    • QPS
    • Request Success Rate (Read)
    • Requests Being Processed
    • Request Rate (Read/Write)
    • Request Error Rate (Read/Write)
    • P99 Request Latency (Read/Write)
    • Work Queue Growth Rate/Work Queue Depth
    • Work Queue Latency (P99)
    • Memory/CPU Usage
    • Goroutines

    View Name: Pod View
    Parameter: Cluster, Namespace, Pod
    Monitoring Metric Included:
    • Total Containers/Running Containers
    • Pod Status
    • Container Restarts
    • CPU/Memory Usage
    • CPU Throttling
    • Network Receive/Transmit Rate
    • Rate of Received/Transmitted Packets
    • Packet Loss Rate (Receive/Transmit)
    • Disk IOPS (Read+Write)
    • Throughput (Read+Write)
    • File System Usage/Used

    View Name: Host View
    Parameter: Cluster, Node
    Monitoring Metric Included:
    • CPU/Memory Usage
    • Load Average
    • Memory Usage
    • Disk Written/Read
    • Disk Space Usage
    • Disk I/O

    View Name: k8s-node
    Parameter: Cluster, Node
    Monitoring Metric Included:
    • CPU/Memory Usage
    • CPU/Memory Requests Commitment
    • CPU/Memory Limits Commitment
    • Memory Usage
    • Network Receive/Transmit Rate
    • Rate of Received/Transmitted Packets (Pod)
    • Rate of Received/Transmitted Packets
    • Packet Loss Rate (Receive/Transmit)
    • Disk IOPS (Read+Write)
    • Throughput (Read+Write)

    View Name: CoreDNS
    Parameter: Cluster, Instance
    Monitoring Metric Included:
    • Request Rate (by qtype/zone/DO bit)
    • Request Packet (UDP/TCP)
    • Response Rate (by rcode)
    • Response Rate (duration)
    • Response Packet (UDP/TCP)
    • Cache (size)
    • Cache (hitrate)

    View Name: PVC View (CCE Clusters Only)
    Parameter: Cluster, Namespace, PV, PVC
    Monitoring Metric Included:
    • PV/PVC Status
    • Used PVC/PVC Usage
    • Used PVC Inodes/PVC Inodes Usage
    • Hourly/Daily/Weekly PVC Usage
    • Volumes Full in Week Based on Daily Use Rate

    View Name: kubelet
    Parameter: Cluster, Instance
    Monitoring Metric Included:
    • Running Kubelets/Pods/Containers
    • Actual Volumes/Expected Volumes/Configuration Errors
    • Operation Rate/Error Rate/Latency
    • Pod Startup Rate/Latency (P99)
    • Storage Operation Rate/Error Rate/Latency (P99)
    • Cgroup Manager Operation Rate/Latency (P99)
    • PLEG Relist Rate/Interval/Latency (P99)
    • RPC Rate
    • Request Latency (P99)
    • Memory/CPU Usage
    • Goroutines

    View Name: Prometheus
    Parameter: Cluster, Job, Instance
    Monitoring Metric Included:
    • Target Sync Interval
    • Targets
    • Average Pull Interval
    • Pull Failures
    • Appended Samples
    • Series/Chunks in the Head
    • Query Rate/Query Duration

    View Name: Prometheus Remote Write
    Parameter: Cluster, Instance, url
    Monitoring Metric Included:
    • Highest Timestamp In vs. Highest Timestamp Sent
    • Rate5m
    • Rate in vs. succeeded or dropped 5m
    • Current/Maximum/Minimum/Expected Shards
    • Shard Size
    • Pend Samples
    • Current Segment of TSDB/Remote Write
    • Sample Discard/Failure/Retry Rate
    • Retry Rate of Enqueuing

    View Name: Workload
    Parameter: Cluster, Namespace, Type, Workload
    Monitoring Metric Included:
    • CPU/Memory Usage
    • Network Receive/Transmit Rate
    • Average Network Receive/Transmit Rate
    • Rate of Received/Transmitted Packets
    • Packet Loss Rate (Receive/Transmit)

    View Name: xGPU View
    Parameter: Cluster
    Monitoring Metric Included:
    • Cluster - xGPU Device GPU Memory Usage
    • Cluster - xGPU Device GPU Compute Usage
    • Node - xGPU Device GPU Memory Usage
    • Node - xGPU Device Compute Usage
    • Node - Number of xGPU Devices
    • Node - Allocated GPU Memory of xGPU Devices
    • GPU - xGPU Device GPU Memory Usage
    • GPU - Allocated GPU Memory of xGPU Devices
    • GPU - GPU Memory Allocation Rate of xGPU Devices
    • GPU - xGPU Device Compute Usage
    • GPU - Number of xGPU Devices
    • GPU - Scheduling Policy
    • GPU - Number of Unhealthy xGPU Devices
    • Allocated Container GPU Memory
    • Container GPU Compute Usage
    • Used Container GPU Memory
    • Container GPU Memory Usage
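
The Cluster View and k8s-node entries in Table 1 include CPU/Memory Requests Commitment and CPU/Memory Limits Commitment. The following is a minimal sketch of how such a ratio is commonly read, assuming the usual Kubernetes dashboard convention that commitment is the sum of pod requests (or limits) divided by the allocatable capacity. This page does not state the formula the CIA dashboard uses, so treat the definition, the function name commitment, and the sample figures as assumptions for illustration only.

```python
def commitment(values, allocatable):
    """Return sum(values) / allocatable as a ratio.

    Assumed definition: this follows a common Kubernetes dashboard
    convention and is not confirmed by this page.
    """
    if allocatable <= 0:
        raise ValueError("allocatable capacity must be positive")
    return sum(values) / allocatable


# Hypothetical sample data: CPU cores requested and limited by three pods
# on a cluster with 4 allocatable cores.
cpu_requests = [0.5, 1.0, 0.5]
cpu_limits = [1.0, 2.0, 2.0]
allocatable_cores = 4.0

requests_commitment = commitment(cpu_requests, allocatable_cores)
limits_commitment = commitment(cpu_limits, allocatable_cores)

print(f"CPU requests commitment: {requests_commitment:.0%}")
print(f"CPU limits commitment: {limits_commitment:.0%}")
```

Under this assumed definition, a limits commitment above 100% means that the configured limits add up to more than the allocatable capacity.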