Updated on 2024-12-18 GMT+08:00

Dashboard

A dashboard displays different types of graphs, such as line graphs and digit graphs, on the same screen so that you can view comprehensive monitoring data in one place.

Checking and Switching Views

  1. Select a fleet, or a cluster that has not been added to a fleet.

    Figure 1 Selecting a fleet or a cluster that is not in a fleet

  2. Click the Dashboard tab. The default view, Cluster View, is displayed.
  3. Set the view parameters. The available parameters vary by view. For details, see Table 1.
  4. Specify the time range of the view.

    In the upper right corner of the page, select a preset time range or customize one, and click the refresh icon to refresh the data.

  5. Switch to another view as needed. The CIA dashboard provides preset views. Click Switch View next to the view name and select the view whose monitoring data you want to check. Table 1 describes the preset views.

    Table 1 Preset views

    Cluster View (Default View)
    Parameter: Cluster
    Metrics:
    • Nodes/Nodes with Unavailable Disks/Nodes Unavailable
    • CPU/Memory Usage
    • CPU/Memory Requests Commitment
    • CPU/Memory Limits Commitment
    • Pods/Containers
    • Used CPU/Memory
    • Network Receive/Transmit Rate
    • Average Network Receive/Transmit Rate
    • Rate of Received/Transmitted Packets
    • Packet Loss Rate (Receive/Transmit)
    • Disk IOPS (Read+Write)
    • Throughput (Read+Write)

    API Server View
    Parameters: Cluster, Instance
    Metrics:
    • Alived
    • QPS
    • Request Success Rate (Read)
    • Requests Being Processed
    • Request Rate (Read/Write)
    • Request Error Rate (Read/Write)
    • P99 Request Latency (Read/Write)
    • Work Queue Growth Rate/Work Queue Depth
    • Work Queue Latency (P99)
    • Used Memory/CPU
    • Goroutines

    Pod View
    Parameters: Cluster, Namespace, Pod
    Metrics:
    • Total Containers/Running Containers
    • Pod Status
    • Container Restarts
    • Used CPU/Memory
    • CPU Throttling
    • Network Receive/Transmit Rate
    • Rate of Received/Transmitted Packets
    • Packet Loss Rate (Receive/Transmit)
    • Disk IOPS (Read+Write)
    • Throughput (Read+Write)
    • File System Usage/Used

    Host View
    Parameters: Cluster, Node
    Metrics:
    • CPU/Memory Usage
    • Load Average
    • Used Memory
    • Disk Written/Read
    • Disk Space Usage
    • Disk I/O

    Node View
    Parameters: Cluster, Node
    Metrics:
    • CPU/Memory Usage
    • CPU/Memory Requests Commitment
    • CPU/Memory Limits Commitment
    • Memory Usage
    • Network Receive/Transmit Rate
    • Rate of Received/Transmitted Packets (Pod)
    • Rate of Received/Transmitted Packets
    • Packet Loss Rate (Receive/Transmit)
    • Disk IOPS (Read+Write)
    • Throughput (Read+Write)

    CoreDNS View
    Parameters: Cluster, Instance
    Metrics:
    • Request Rate (Type/Zone/DO Bit)
    • Request Packet (UDP/TCP)
    • Response Rate (Status Code)
    • Response Latency
    • Response Packet (UDP/TCP)
    • Cache Records
    • Cache Hit Ratio

    PVC View (CCE Clusters Only)
    Parameters: Cluster, Namespace, PV, PVC
    Metrics:
    • PV/PVC Status
    • Used PVC/PVC Usage
    • Used Inodes in PVC/Inode Usage in PVC
    • Hourly/Daily/Weekly PVC Usage
    • Used PVC in the Next Week

    Kubelet View
    Parameters: Cluster, Instance
    Metrics:
    • Running Kubelets/Pods/Containers
    • Actual Volumes/Expected Volumes/Configuration Errors
    • Operation Rate/Error Rate/Latency
    • Pod Startup Rate/Latency (P99)
    • Storage Operation Rate/Error Rate/Latency (P99)
    • Cgroup Manager Operation Rate/Latency (P99)
    • PLEG Relist Rate/Interval/Latency (P99)
    • RPC Rate
    • Request Latency (P99)
    • Used Memory/CPU
    • Goroutines

    Prometheus
    Parameters: Cluster, Job, Instance
    Metrics:
    • Target Sync Interval
    • Targets
    • Average Pull Interval
    • Pull Failures
    • Appended Samples
    • Series/Chunks in the Head
    • Query Rate/Query Duration

    Prometheus Remote Write
    Parameters: Cluster, Instance, URL
    Metrics:
    • Highest Timestamp In vs. Highest Timestamp Sent
    • Rate (5m)
    • Rate, In vs. Succeeded or Dropped (5m)
    • Current/Maximum/Minimum/Expected Shards
    • Shard Size
    • Pending Samples
    • Current Segment of TSDB/Remote Write
    • Sample Discard/Failure/Retry Rate
    • Enqueue Retry Rate

    GPU View
    Parameter: Cluster
    Metrics:
    • Cluster - GPU Memory Usage
    • Cluster - GPU Compute Usage
    • Node - Used GPU Memory
    • Node - GPU Memory Usage
    • Node - GPU Compute Usage
    • GPU - Used GPU Memory
    • GPU - GPU Compute Usage
    • GPU - Temperature
    • GPU - Memory Clock
    • GPU - PCIe Bandwidth

    xGPU View
    Parameter: Cluster
    Metrics:
    • Cluster - xGPU Device GPU Memory Usage
    • Cluster - xGPU Device GPU Compute Usage
    • Node - xGPU Device GPU Memory Usage
    • Node - xGPU Device Compute Usage
    • Node - Number of xGPU Devices
    • Node - Allocated GPU Memory of xGPU Devices
    • GPU - xGPU Device GPU Memory Usage
    • GPU - Allocated GPU Memory of xGPU Devices
    • GPU - GPU Memory Allocation Rate of xGPU Devices
    • GPU - xGPU Device Compute Usage
    • GPU - Number of xGPU Devices
    • GPU - Scheduling Policy
    • GPU - Number of Unhealthy xGPU Devices
    • Allocated Container GPU Memory
    • Container GPU Compute Usage
    • Used Container GPU Memory
    • Container GPU Memory Usage
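Many metrics in Table 1 are rates, for example Network Receive/Transmit Rate, Request Rate (Read/Write), and Rate of Received/Transmitted Packets. Monitoring data of this kind is usually collected as monotonically increasing counters, and a rate is derived from the counter's increase over the selected time range. The following Python sketch assumes Prometheus-style counter samples (timestamp in seconds, cumulative value) and computes a simplified per-second rate that treats any decrease as a counter reset, for example after a container restart. It is not the query CIA actually runs, and unlike PromQL's rate() it does not extrapolate to the edges of the time range; the function name and sample data are hypothetical.

```python
from typing import List, Tuple

# Hypothetical sample format: (timestamp in seconds, cumulative counter value)
Sample = Tuple[float, float]


def counter_rate(samples: List[Sample]) -> float:
    """Average per-second increase of a counter across the samples' time span.

    Simplified sketch: a drop in value is treated as a counter reset, so the
    value after the reset is counted as new increase. No extrapolation to the
    boundaries of the time range is performed.
    """
    if len(samples) < 2:
        raise ValueError("need at least two samples to compute a rate")
    samples = sorted(samples)  # order by timestamp
    increase = 0.0
    for (_, prev), (_, curr) in zip(samples, samples[1:]):
        # After a reset the counter restarts from 0, so its whole current value is new.
        increase += curr - prev if curr >= prev else curr
    elapsed = samples[-1][0] - samples[0][0]
    if elapsed <= 0:
        raise ValueError("samples must span a positive time interval")
    return increase / elapsed
```

For example, received bytes sampled every 15 seconds as `[(0, 1000), (15, 4000), (30, 7000), (45, 1500), (60, 4500)]` contain one reset between 30 s and 45 s. The total increase is 3000 + 3000 + 1500 + 3000 = 10500 bytes over 60 seconds, so `counter_rate` returns 175.0 bytes/s.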
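Several views report P99 values, such as P99 Request Latency (Read/Write) in the API Server View and Pod Startup Rate/Latency (P99) in the Kubelet View. kube-apiserver and kubelet expose many of their latency metrics as Prometheus histograms, that is, cumulative counts of observations at or below each bucket's upper bound (le). Assuming the P99 values are estimated the same way as with PromQL's histogram_quantile(), the Python sketch below illustrates the idea: find the bucket that contains the 99th-percentile rank and interpolate linearly inside it. The function name and bucket values are hypothetical, and the sketch assumes positive bucket bounds, as latency histograms have.

```python
import math
from typing import List, Tuple

# Hypothetical bucket format: (upper bound "le", cumulative count); the last bound is +Inf
Bucket = Tuple[float, float]


def histogram_quantile(q: float, buckets: List[Bucket]) -> float:
    """Estimate the q-quantile (0 < q <= 1) from cumulative histogram buckets.

    Follows the linear-interpolation approach of PromQL's histogram_quantile():
    the lower bound of the first bucket is taken as 0, and if the rank falls
    into the +Inf bucket, the highest finite upper bound is returned.
    """
    if not 0 < q <= 1:
        raise ValueError("q must be in the interval (0, 1]")
    buckets = sorted(buckets)
    if not buckets or not math.isinf(buckets[-1][0]):
        raise ValueError("buckets must end with a +Inf bucket")
    total = buckets[-1][1]
    if total == 0:
        return math.nan  # no observations in the time range
    rank = q * total
    prev_bound, prev_count = 0.0, 0.0
    for bound, count in buckets:
        if count >= rank:
            if math.isinf(bound):
                # The quantile lies above the highest finite bucket bound.
                return prev_bound
            # Interpolate linearly inside the bucket (prev_bound, bound].
            return prev_bound + (bound - prev_bound) * (rank - prev_count) / (count - prev_count)
        prev_bound, prev_count = bound, count
    return prev_bound  # not reached for well-formed cumulative buckets
```

With request latencies (in seconds) bucketed as `[(0.1, 50), (0.25, 80), (0.5, 95), (1.0, 100), (math.inf, 100)]`, the P99 rank is 99, which falls in the (0.5, 1.0] bucket, so the estimate is 0.5 + 0.5 × (99 − 95) / (100 − 95), or roughly 0.9 s. Because of the interpolation, the result is only as precise as the bucket layout allows.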
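The PVC View also shows Used PVC in the Next Week, which is a forecast rather than a measurement. This page does not state how the forecast is made. One common approach, used by PromQL's predict_linear(), is to fit a straight line to recent samples with simple least-squares linear regression and extrapolate it. The Python sketch below illustrates that technique under this assumption only; the function name and sample values are hypothetical.

```python
from typing import List, Tuple


def predict_linear(samples: List[Tuple[float, float]], horizon_s: float) -> float:
    """Fit a least-squares line to (timestamp in seconds, value) samples and
    return the value it predicts horizon_s seconds after the last sample."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_t = sum(t for t, _ in samples) / n
    mean_v = sum(v for _, v in samples) / n
    # Slope of the regression line = covariance(t, v) / variance(t).
    cov = sum((t - mean_t) * (v - mean_v) for t, v in samples)
    var = sum((t - mean_t) ** 2 for t, _ in samples)
    if var == 0:
        raise ValueError("samples must have distinct timestamps")
    slope = cov / var
    intercept = mean_v - slope * mean_t
    target_t = max(t for t, _ in samples) + horizon_s
    return intercept + slope * target_t


WEEK_S = 7 * 24 * 3600  # one week in seconds
```

For example, if a PVC's used space (in GiB) is sampled daily as `[(0, 10.0), (86400, 12.0), (172800, 14.0), (259200, 16.0)]`, it grows by 2 GiB per day, and `predict_linear(samples, WEEK_S)` returns about 30 GiB. Comparing such a forecast with the PVC capacity gives an early hint that the volume may fill up; a straight-line fit, however, cannot anticipate sudden changes in write patterns.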