Updated on 2025-07-30 GMT+08:00

What Types of GPU-accelerated ECS Faults Are There?

Table 1 lists the types of GPU-accelerated ECS faults.

Table 1 Types of GPU-accelerated ECS faults

| Fault Rectifiable | Fault Type | Reference |
| --- | --- | --- |
| Yes. Faults can be rectified by referring to related guidance documents. | Incorrect image configuration | What Do I Do If the Nouveau Driver Is Not Disabled? |
| | ECC errors | What Do I Do If There Are Retired Pages? |
| | Kernel upgrade issues | What Do I Do If the Driver Is Unavailable After the Kernel Is Upgraded? |
| | Disconnected GPUs | Why Is the Number of Queried Graphics Cards Different from the Actual One? |
| | Graphics card ERR! | What Do I Do If ERR! Is Displayed? |
| | Software installation problems | What Do I Do If an Error Occurs During the Installation of the NVIDIA Driver and CUDA Software? |
| | Driver compatibility issues | How Do I Handle Driver Compatibility Issues? |
| | Xid errors | How Do I Handle Recoverable Xid Errors? |
| | Graphics card being disabled | What Do I Do If the Error Message "Windows has stopped this device because it has reported problems" Is Displayed? |
| | Image errors | What Do I Do If the Driver and Image I Have Selected Do Not Match My Service Scenario? |
| | License issues | What Do I Do If I Have Installed the GRID Driver but Have Not Purchased or Configured the License? |
| No. If the fault cannot be rectified, contact technical support. | InfoROM errors | How Do I Handle the infoROM Error? |
| | ECC errors | What Do I Do If ECC Error "double bit ecc error" Occurs and There Are No Retired Pages Shown in the nvidia-smi -q Command Output? |
| | | What Do I Do If nvidia-smi Command Output Shows the SRAM ECC Error (V100 GPUs)? |
| | Disconnected GPUs | What Do I Do If the GPU Is Disconnected or the Graphics Card Can't Be found, or rev ff Is Displayed After lspci \| grep -i nvidia Is Executed? |
| | High temperature | What Do I Do If the nvidia-smi Command Output Shows Overheated GPUs? |
| | Driver installation errors | What Do I Do If "Unable to load the kernel module 'nvidia.ko'" Is Displayed During Driver Installation? |
| | Xid errors | What Can I Do If an Xid Error Is Displayed in the Message Log When a GPU-accelerated ECS Is Faulty? |
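Table 1 does not prescribe any commands. As a rough aid for deciding which row applies, the following Python sketch maps a few symptoms named in the reference titles to fault types: a loaded Nouveau module, "rev ff" in `lspci | grep -i nvidia` output, a GPU count that differs between `lspci` and `nvidia-smi -L`, "ERR!" in `nvidia-smi` output, and Xid messages in the kernel log. It is a sketch under assumptions, not an official diagnostic tool: all function names are hypothetical, it assumes the usual text output of `lsmod`, `lspci`, `nvidia-smi -L`, `nvidia-smi`, and `dmesg`, and it assumes Xid messages take the form `NVRM: Xid (PCI:...): <code>, ...`.

```python
import re
import shutil
import subprocess

# Assumed kernel-log format for Xid reports, for example:
# "NVRM: Xid (PCI:0000:00:0d): 79, pid=..., GPU has fallen off the bus."
XID_PATTERN = re.compile(r"NVRM: Xid \(([^)]*)\): (\d+)")

# lspci class names that normally identify a GPU (assumption).
GPU_CLASSES = ("3d controller", "vga compatible controller")


def nouveau_loaded(lsmod_output: str) -> bool:
    """True if lsmod lists the nouveau module (Nouveau driver not disabled)."""
    for line in lsmod_output.splitlines():
        fields = line.split()
        if fields and fields[0] == "nouveau":
            return True
    return False


def nvidia_lines(lspci_output: str) -> list[str]:
    """Lines that `lspci | grep -i nvidia` would print."""
    return [line for line in lspci_output.splitlines() if "nvidia" in line.lower()]


def nvidia_gpus(lspci_output: str) -> list[str]:
    """NVIDIA lines whose device class looks like a GPU (skips audio, bridges, etc.)."""
    return [line for line in nvidia_lines(lspci_output)
            if any(cls in line.lower() for cls in GPU_CLASSES)]


def disconnected_devices(lspci_output: str) -> list[str]:
    """NVIDIA lines containing "(rev ff)", a typical sign of a disconnected GPU."""
    return [line for line in nvidia_lines(lspci_output) if "(rev ff)" in line.lower()]


def smi_gpu_count(smi_list_output: str) -> int:
    """Number of GPUs reported by `nvidia-smi -L` (one "GPU n: ..." line each)."""
    return sum(1 for line in smi_list_output.splitlines() if line.startswith("GPU "))


def has_err_marker(smi_output: str) -> bool:
    """True if the nvidia-smi output contains the ERR! marker."""
    return "ERR!" in smi_output


def xid_codes(kernel_log: str) -> list[tuple[str, int]]:
    """(PCI address, Xid code) pairs found in kernel log text."""
    return [(m.group(1), int(m.group(2))) for m in XID_PATTERN.finditer(kernel_log)]


def triage(lsmod_out: str, lspci_out: str, smi_list_out: str,
           smi_out: str, kernel_log: str) -> list[str]:
    """Map observed symptoms to Fault Type rows of Table 1."""
    findings = []
    if nouveau_loaded(lsmod_out):
        findings.append("Incorrect image configuration")
    if disconnected_devices(lspci_out):
        findings.append("Disconnected GPUs (rev ff)")
    # Only compare counts when nvidia-smi -L actually printed something.
    if smi_list_out.strip() and smi_gpu_count(smi_list_out) != len(nvidia_gpus(lspci_out)):
        findings.append("Disconnected GPUs (GPU count mismatch)")
    if has_err_marker(smi_out):
        findings.append("Graphics card ERR!")
    if xid_codes(kernel_log):
        findings.append("Xid errors")
    return findings


def run(cmd: list[str]) -> str:
    """Return a command's stdout, or "" if the command is missing or cannot run."""
    if shutil.which(cmd[0]) is None:
        return ""
    try:
        return subprocess.run(cmd, capture_output=True, text=True, check=False).stdout
    except OSError:
        return ""


if __name__ == "__main__":
    # dmesg may need root privileges; an empty result simply yields no Xid finding.
    report = triage(run(["lsmod"]), run(["lspci"]), run(["nvidia-smi", "-L"]),
                    run(["nvidia-smi"]), run(["dmesg"]))
    for item in report or ["No symptom from this checklist was detected."]:
        print(item)
```

Each printed line names a Fault Type row whose Reference document should then be consulted. A clean result does not rule out other faults in the table, such as InfoROM errors, SRAM ECC errors, or overheating, which this sketch does not check.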