Updated on 2025-05-22 GMT+08:00

Common Faults

Excessive CPU, Memory, or Disk Usage or Disk IOPS on a BMS

  • Check: Use Cloud Eye to monitor the CPU, memory, and disk usage and the disk IOPS of the BMS.
  • Recovery:
    1. Based on service requirements, switch to a BMS with higher specifications or add BMSs to balance the load.
    2. Enable overload protection for the application layer to keep high-priority services running smoothly.
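
As an illustration of application-layer overload protection, the following is a minimal sketch of priority-based load shedding. It is not a Cloud Eye or BMS API. The class name, the priority labels ("high", "normal", "low"), and the utilization thresholds are all hypothetical and would need tuning for a real service.

```python
from dataclasses import dataclass


@dataclass
class LoadShedder:
    """Rejects lower-priority requests as utilization rises (illustrative only)."""

    shed_low_above: float = 0.80  # hypothetical threshold: start rejecting "low"
    shed_all_above: float = 0.95  # hypothetical threshold: admit only "high"

    def admit(self, priority: str, cpu_utilization: float) -> bool:
        # High-priority requests are always admitted.
        if priority == "high":
            return True
        # Under severe load, everything except high priority is rejected.
        if cpu_utilization >= self.shed_all_above:
            return False
        # Under moderate load, only low-priority requests are rejected.
        if cpu_utilization >= self.shed_low_above:
            return priority == "normal"
        return True
```

The utilization value passed to admit() is assumed to come from whatever metric source the application already has, for example a locally sampled CPU figure.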

Failed to Connect to Backend BMSs

  • Check: Check whether network connections to the backend BMSs fail.
  • Recovery:
    1. Deploy at least two backend BMSs. For stateless services, configure ELB to ensure service reliability. For stateful services, enable multi-instance HA at the application layer.
    2. If the failure is temporary (for example, caused by network overload), retry the connection at the application layer. For details, see RES09 Retries After Failures.
    3. If the connection to an overloaded BMS failed, rectify the fault by referring to "Excessive CPU, Memory, or Disk Usage or Disk IOPS on a BMS."
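
The application-layer retry mentioned above could be sketched as follows, using exponential backoff with random jitter. This is one common approach and not necessarily the one described in RES09. The function name, parameters, and default values are assumptions made for illustration.

```python
import random
import time


def call_with_retries(operation, max_attempts=4, base_delay=0.2, max_delay=5.0,
                      retry_on=(ConnectionError, TimeoutError), sleep=time.sleep):
    """Call operation(), retrying on transient errors (hypothetical helper)."""
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except retry_on:
            # Give up and re-raise once the attempt budget is used.
            if attempt == max_attempts:
                raise
            # The backoff cap doubles per attempt, bounded by max_delay.
            delay = min(max_delay, base_delay * 2 ** (attempt - 1))
            # Sleep a random time up to the cap so clients do not retry in lockstep.
            sleep(random.uniform(0, delay))
```

Retries only suit idempotent operations and temporary faults. If the backend BMS is overloaded, repeated retries can add to the load, which is why the attempt count and delay are capped.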

Unavailable or Abnormal BMSs

  • Check: Configure scheduled health checks on the load balancer's backend servers to verify that their key functions work properly.
  • Recovery: Deploy multiple BMSs at each application layer and use a load balancer to perform health checks on them. If a BMS becomes unavailable, the load balancer stops sending service requests to it.
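
To show the idea behind health-check-based routing, here is a minimal sketch in which a backend is taken out of rotation after several consecutive failed checks and returns once a check passes. It does not use the ELB API. The class name, the backend labels, and the failure threshold of 3 are hypothetical.

```python
class BackendPool:
    """Tracks consecutive health check failures per backend (illustrative only)."""

    def __init__(self, backends, unhealthy_threshold=3):
        self.unhealthy_threshold = unhealthy_threshold  # assumed value
        self.failures = {backend: 0 for backend in backends}

    def record_check(self, backend, passed):
        # A passing check resets the counter; a failing check increments it.
        self.failures[backend] = 0 if passed else self.failures[backend] + 1

    def healthy_backends(self):
        # Only backends below the failure threshold receive requests.
        return [b for b, n in self.failures.items()
                if n < self.unhealthy_threshold]
```

A managed load balancer applies this kind of logic on its own once health checks are configured. The sketch is only meant to make the behavior concrete.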

BMSs, Disks Attached to Them, or Their Data Are Deleted Unexpectedly

  • Check: N/A
  • Recovery: For stateless services, use a template to quickly provision new BMSs. For stateful services, use CBR to periodically back up EVS disks attached to BMSs, and use the backups to quickly restore their data.

A Physical Server or Local Disk Used by a BMS Is Faulty

  • Check: Check the status of physical servers and local disks at the application layer.
  • Recovery: At the application layer, use RAID to provide HA for the disks of a single BMS, and enable data replication and HA across BMSs. This allows services to recover quickly if a physical server or local disk becomes faulty. EVS disks are recommended over local disks because they are more reliable.