Notice of NVIDIA GPU Driver Vulnerability (CVE-2021-1056)
Description
NVIDIA has identified a vulnerability (CVE-2021-1056) in its GPU drivers that is related to device isolation. When a container runs in non-privileged mode, an attacker can exploit this vulnerability by creating a special character device file inside the container, which gives the attacker access to all GPU devices on the host.
For more information about this vulnerability, see CVE-2021-1056.
According to the official NVIDIA announcement, if your CCE cluster has a GPU-enabled node (ECS) that uses the recommended NVIDIA GPU driver (Tesla 396.37), that driver is not affected by this vulnerability. If you have installed or updated the NVIDIA GPU driver on the node yourself, the node may be affected.
Type | CVE-ID | Severity | Discovered
---|---|---|---
Privilege escalation | CVE-2021-1056 | Medium | 2021-01-07
Impact
According to the vulnerability notice provided by NVIDIA, the affected NVIDIA GPU driver versions are listed in the NVIDIA security bulletin. For details, see the official NVIDIA website.
Note:
- The NVIDIA GPU driver version recommended for CCE clusters and the gpu-beta add-on is not among the affected versions disclosed on the official NVIDIA website. If NVIDIA updates this information, you will be notified and provided with possible fixes for this vulnerability.
- If you have selected a custom NVIDIA GPU driver version or updated the GPU driver on the node, check whether your driver version is affected by comparing it against the affected versions published by NVIDIA.
Querying the NVIDIA Driver Version of a GPU Node
Log in to your GPU node and run the following command to view the driver version.
```
[root@XXX36 bin]# ./nvidia-smi
Fri Apr 16 10:28:28 2021
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 460.32.03    Driver Version: 460.32.03    CUDA Version: 11.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Tesla T4            Off  | 00000000:21:01.0 Off |                    0 |
| N/A   68C    P0    31W /  70W |      0MiB / 15109MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
```
The preceding command output indicates that the GPU driver version of the node is 460.32.03.
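When checking many nodes, it can help to extract the driver version from the captured nvidia-smi output instead of reading it by eye. The following Python snippet is a minimal sketch that assumes you have already collected the plain-text output shown above (for example, over SSH); the function name parse_driver_version is hypothetical and is not part of any NVIDIA or CCE tool.

```python
import re
from typing import Optional

# Matches the "Driver Version: <x.y.z>" field printed in the nvidia-smi banner.
_DRIVER_RE = re.compile(r"Driver Version:\s*([0-9]+(?:\.[0-9]+)+)")


def parse_driver_version(nvidia_smi_output: str) -> Optional[str]:
    """Return the driver version found in nvidia-smi text output, or None."""
    match = _DRIVER_RE.search(nvidia_smi_output)
    return match.group(1) if match else None


if __name__ == "__main__":
    # Banner line copied from the example output above.
    sample = (
        "| NVIDIA-SMI 460.32.03    Driver Version: 460.32.03    "
        "CUDA Version: 11.2     |"
    )
    print(parse_driver_version(sample))
```

Alternatively, running `nvidia-smi --query-gpu=driver_version --format=csv,noheader` on the node prints only the driver version, which avoids parsing the table altogether.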
Solution
Based on the affected versions described in Impact, upgrade the node's driver to the fixed version for its branch.
Upgrading the NVIDIA GPU driver requires restarting the GPU node, which temporarily interrupts the services running on it.
- If the node's driver is in the 418 series, upgrade it to 418.181.07.
- If the node's driver is in the 450 series, upgrade it to 450.102.04.
- If the node's driver is in the 460 series, upgrade it to 460.32.03.
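The branch-to-version mapping above can be applied mechanically. The Python sketch below encodes only the three branches listed in this notice and assumes that any version in one of those branches older than the listed fixed version needs an upgrade; that assumption, and the hypothetical function name check_driver, are not taken from NVIDIA. For branches it does not cover, consult the NVIDIA security bulletin.

```python
from typing import Dict, Tuple

# Fixed driver versions per branch, taken from the list in this notice.
# Other branches are deliberately left out of this sketch.
FIXED_VERSIONS: Dict[int, str] = {
    418: "418.181.07",
    450: "450.102.04",
    460: "460.32.03",
}


def _as_tuple(version: str) -> Tuple[int, ...]:
    """Convert "450.80.02" to (450, 80, 2) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def check_driver(installed: str) -> str:
    """Return "ok", "upgrade to <version>", or a not-covered message."""
    branch = _as_tuple(installed)[0]
    fixed = FIXED_VERSIONS.get(branch)
    if fixed is None:
        return "not covered: check the NVIDIA security bulletin"
    if _as_tuple(installed) < _as_tuple(fixed):
        return f"upgrade to {fixed}"
    return "ok"


if __name__ == "__main__":
    for version in ("450.80.02", "460.32.03", "396.37"):
        print(version, "->", check_driver(version))
```

Versions are compared as integer tuples rather than strings, because string comparison would wrongly rank "450.80.02" above "450.102.04".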
If you upgrade the GPU driver on a CCE cluster node, also upgrade or reinstall the gpu-beta add-on, and specify the download address of the fixed NVIDIA GPU driver when installing the add-on.
Helpful Links
- NVIDIA security bulletin: https://nvidia.custhelp.com/app/answers/detail/a_id/5142
- Ubuntu security notice: https://ubuntu.com/security/CVE-2021-1056
- CVE: https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2021-1056
- NVD: https://nvd.nist.gov/vuln/detail/CVE-2021-1056
- CVE PoC: https://github.com/pokerfaceSad/CVE-2021-1056
- GPUMounter: https://github.com/pokerfaceSad/GPUMounter