Updated on 2025-05-22 GMT+08:00

Common Faults

Excessive CPU, Memory, or Disk Usage of a CCE Cluster

  • Check: Use Application Operations Management (AOM) to check the CPU, memory, and disk usage of a CCE cluster.
  • Recovery:
    1. Change the cluster specifications or add resources based on service requirements.

Excessive CPU, Memory, Disk, GPU, or GPU Cache Usage or Disk IOPS of a CCE Node

  • Check: Use AOM to check the CPU, memory, disk, GPU, or GPU cache usage or disk IOPS of a CCE node.
  • Recovery:
    1. Change the node specifications or add nodes based on service requirements.
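
The check above comes down to comparing each usage figure against a limit you consider excessive. A minimal Python sketch of that comparison follows. It does not call AOM: the usage percentages are assumed to be read from the AOM console by hand, and the threshold values and the function name find_excessive_usage are hypothetical, chosen only for illustration.

```python
# Hypothetical thresholds in percent; these are illustrative values,
# not defaults taken from AOM or CCE.
DEFAULT_THRESHOLDS = {"cpu": 80.0, "memory": 80.0, "disk": 85.0}


def find_excessive_usage(usage, thresholds=DEFAULT_THRESHOLDS):
    """Return the names of metrics whose usage exceeds its threshold.

    usage: dict mapping a metric name to a usage percentage (0-100),
           for example values read manually from the AOM console.
    Metrics without a configured threshold are ignored.
    """
    return sorted(
        name
        for name, value in usage.items()
        if name in thresholds and value > thresholds[name]
    )


# Example: CPU and disk exceed their thresholds, memory does not.
node_usage = {"cpu": 92.5, "memory": 61.0, "disk": 88.0}
print(find_excessive_usage(node_usage))  # ['cpu', 'disk']
```

A non-empty result suggests that the recovery step applies to the listed resources.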

Excessive CPU, Memory, GPU, or GPU Cache Usage of a CCE Workload

  • Check: Use AOM to check the CPU, memory, GPU, or GPU cache usage of a CCE workload.
  • Recovery:
    1. Adjust the resource quotas (CPU and memory requests and limits) of the workload, or add workloads based on service requirements.
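
If adding capacity is done by running more pod replicas of the same workload (an assumption, since the recovery step does not say how to add workloads), a rough replica count can be estimated from current usage. The Python sketch below uses the proportional rule that the Kubernetes Horizontal Pod Autoscaler also applies: desired replicas = ceil(current replicas × current usage / target usage). The function name replicas_needed and the sample numbers are hypothetical.

```python
import math


def replicas_needed(current_replicas, current_usage_pct, target_usage_pct):
    """Estimate the replica count that brings average usage down to the target.

    Applies ceil(current_replicas * current_usage / target_usage) and
    never returns fewer than one replica.
    """
    if current_replicas < 1:
        raise ValueError("current_replicas must be at least 1")
    if target_usage_pct <= 0:
        raise ValueError("target_usage_pct must be positive")
    estimate = math.ceil(current_replicas * current_usage_pct / target_usage_pct)
    return max(1, estimate)


# Example: 4 replicas averaging 90% CPU, with a 60% target.
print(replicas_needed(4, 90, 60))  # 6
```

The estimate assumes load spreads evenly across replicas. Treat it as a starting point and confirm the result in AOM after scaling.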