Updated on 2025-05-22 GMT+08:00

Common Faults

Excessive CPU, Memory, or Disk Usage or Disk IOPS on an ECS

  • Check: Use Cloud Eye to check the CPU, memory, or disk usage or disk IOPS.
  • Recovery:
    1. Scale up resources or add ECSs to balance the load, based on service requirements.
    2. For stateless services, enable Auto Scaling (AS) to automatically scale out resources.
    3. Enable overload protection for the application layer to keep high-priority services running smoothly.
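
One way to picture application-layer overload protection is priority-based load shedding: cap the number of concurrent requests and reserve part of that capacity for high-priority traffic. The following is a minimal sketch only. The class name LoadShedder and its parameters are hypothetical and are not part of any ECS or Cloud Eye API.

```python
import threading


class LoadShedder:
    """Hypothetical admission controller that sheds low-priority work first."""

    def __init__(self, max_in_flight, reserved_for_high):
        if not 0 <= reserved_for_high <= max_in_flight:
            raise ValueError("reserved_for_high must be between 0 and max_in_flight")
        self._max = max_in_flight
        # Low-priority requests may only use the unreserved share of capacity.
        self._low_limit = max_in_flight - reserved_for_high
        self._in_flight = 0
        self._lock = threading.Lock()

    def try_acquire(self, high_priority):
        """Return True if the request is admitted, False if it should be rejected."""
        with self._lock:
            limit = self._max if high_priority else self._low_limit
            if self._in_flight >= limit:
                return False
            self._in_flight += 1
            return True

    def release(self):
        """Call once for every admitted request when it finishes."""
        with self._lock:
            if self._in_flight > 0:
                self._in_flight -= 1
```

A request handler would call try_acquire() before doing any work, return a "busy" response when it gets False, and call release() in a finally block once an admitted request completes.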

Failed to Connect to Backend ECSs

  • Check: Check whether network connections to the backend ECSs fail.
  • Recovery:
    1. Deploy at least two backend ECSs. For stateless services, configure ELB to ensure service reliability. For stateful services, enable multi-instance HA at the application layer.
    2. If the failure is temporary, for example, because backend ECSs are being recovered, retry the connection at the application layer. For details, see RES09 Retries After Failures.
    3. If the connection to an overloaded ECS failed, rectify the fault by referring to "Excessive CPU, Memory, or Disk Usage or Disk IOPS on an ECS."
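
The application-layer retry for temporary failures can be sketched as a bounded retry loop with exponential backoff and random jitter, so that many clients do not reconnect at the same instant. This is an illustrative sketch, not the method prescribed by RES09. The function name retry_with_backoff and its default values are assumptions.

```python
import random
import time


def retry_with_backoff(operation, max_attempts=4, base_delay=0.5,
                       max_delay=8.0, sleep=time.sleep):
    """Call operation() until it succeeds or max_attempts is reached.

    Only connection-style errors are retried; the last error is re-raised.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except (ConnectionError, TimeoutError):
            if attempt == max_attempts:
                raise
            # Exponential backoff capped at max_delay, with full jitter.
            ceiling = min(max_delay, base_delay * 2 ** (attempt - 1))
            sleep(random.uniform(0, ceiling))
```

Keep max_attempts small. If the backend ECS is overloaded rather than recovering, aggressive retries add load and can delay recovery.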

Unavailable or Abnormal ECSs

  • Check: Configure a scheduled health check for backend servers of load balancers to check whether their key functions are normal.
  • Recovery: Deploy multiple ECSs at each application layer and use a load balancer to perform health checks on them. If an ECS becomes unavailable, the load balancer stops sending service requests to it.
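
The load balancer performs these health checks itself. The sketch below only illustrates the general logic of threshold-based health checking, so that the effect of check settings is easier to reason about. It is not the ELB implementation. The names tcp_probe and HealthTracker and the threshold defaults are hypothetical.

```python
import socket


def tcp_probe(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


class HealthTracker:
    """Mark a backend unhealthy or healthy after consecutive probe results."""

    def __init__(self, backends, unhealthy_threshold=3, healthy_threshold=2):
        self._down_after = unhealthy_threshold
        self._up_after = healthy_threshold
        self._failures = {b: 0 for b in backends}
        self._successes = {b: 0 for b in backends}
        self._healthy = {b: True for b in backends}

    def record(self, backend, success):
        """Record one probe result and update the backend's state."""
        if success:
            self._failures[backend] = 0
            self._successes[backend] += 1
            if not self._healthy[backend] and self._successes[backend] >= self._up_after:
                self._healthy[backend] = True
        else:
            self._successes[backend] = 0
            self._failures[backend] += 1
            if self._healthy[backend] and self._failures[backend] >= self._down_after:
                self._healthy[backend] = False

    def healthy_backends(self):
        """Return the backends that should still receive requests."""
        return [b for b, ok in self._healthy.items() if ok]
```

Requiring several consecutive failures before removing a backend avoids taking an ECS out of rotation because of a single dropped probe.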

ECSs, Disks Attached to Them, or Their Data Are Deleted Unexpectedly

  • Check: N/A
  • Recovery: For stateless services, use a template to quickly provision new ECSs. For stateful services, use CBR to periodically back up ECSs, and use the backups to quickly restore their data.

A Local Disk Used by an ECS Is Faulty

  • Check: Check the status of local disks at the application layer.
  • Recovery: At the application layer, use RAID to provide HA for the disks of a single ECS, and enable data replication and HA across ECSs. This allows services to recover quickly when a local disk fails. EVS disks are more reliable than local disks and are recommended instead.