Updated on 2024-09-29 GMT+08:00

Why Does the NVIDIA Kernel Crash on a GPU-accelerated ECS?

Symptom

A GPU-accelerated ECS crashed while it was running. After the ECS was restarted, no NVIDIA driver stack logs were recorded.

Figure 1 Stack log information
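
To confirm the symptom on your own ECS, you can search a saved kernel log for NVIDIA driver messages. The following is a minimal sketch, not an official tool. It assumes that NVIDIA kernel-module messages contain the tag NVRM or the word nvidia, and the function name find_nvidia_lines is hypothetical.

```python
# Assumption: NVIDIA kernel-module messages contain "NVRM" or "nvidia".
NVIDIA_MARKERS = ("NVRM", "nvidia")


def find_nvidia_lines(log_path, markers=NVIDIA_MARKERS):
    """Return (line_number, text) pairs for log lines that mention any marker."""
    matches = []
    lowered_markers = [marker.lower() for marker in markers]
    # errors="replace" keeps the scan going if the log contains invalid bytes.
    with open(log_path, "r", errors="replace") as handle:
        for number, line in enumerate(handle, start=1):
            lowered = line.lower()
            if any(marker in lowered for marker in lowered_markers):
                matches.append((number, line.rstrip("\n")))
    return matches
```

For example, find_nvidia_lines("/var/log/messages") returns the matching lines. The path is an assumption, because the kernel log location depends on the OS distribution. An empty result is consistent with the symptom described above.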

Possible Causes

The ECS kernel crashed because of a bug in the official NVIDIA driver.

Solutions

  • Method 1: Restart the ECS.

    After the restart, the ECS should run properly.

  • Method 2: Update the driver version.
    If the problem persists after the ECS is restarted, download the latest CUDA driver from the official NVIDIA website.
    1. Log in to the official NVIDIA driver download page at https://www.nvidia.cn/Download/index.aspx?lang=en.
      Figure 2 Driver download page
    2. Enter the product information and click Search.
      Figure 3 Latest driver version download page

      On the Release Highlights tab, review the changes and resolved issues in this version, then decide whether to upgrade.
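
Before you upgrade, it can help to compare the driver version installed on the ECS with the version shown on the download page. The following is a minimal sketch under stated assumptions. It assumes that nvidia-smi is installed with the driver and that version strings are dot-separated numbers. The helper names parse_version, installed_driver_version, and is_newer are hypothetical.

```python
import shutil
import subprocess


def parse_version(text):
    """Turn a dot-separated numeric version string into a tuple of ints."""
    return tuple(int(part) for part in text.strip().split("."))


def installed_driver_version():
    """Return the driver version reported by nvidia-smi, or None if unavailable."""
    if shutil.which("nvidia-smi") is None:
        return None  # nvidia-smi is not on PATH, so the driver may not be installed
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=driver_version", "--format=csv,noheader"],
        capture_output=True,
        text=True,
        check=False,
    )
    if result.returncode != 0 or not result.stdout.strip():
        return None
    # nvidia-smi prints one line per GPU; the first line is enough here.
    return result.stdout.strip().splitlines()[0].strip()


def is_newer(candidate, installed):
    """Return True if the candidate version is strictly newer than the installed one."""
    return parse_version(candidate) > parse_version(installed)
```

Pass the version string from the download page as candidate and the value returned by installed_driver_version() as installed. If installed_driver_version() returns None, the driver did not respond, which is itself worth investigating before you compare versions.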