Installing the NVIDIA GPU Driver and CUDA Toolkit on a P3 BMS

Scenarios

After you create a GPU-accelerated P3 BMS (physical.p3.large flavor), install the NVIDIA GPU driver and CUDA Toolkit on it to enable computing acceleration.

Prerequisites

An EIP has been bound to the BMS.

You have obtained the required driver installation packages.

**Table 1** Download paths for the NVIDIA GPU driver and CUDA Toolkit

| OS | Installation Package | How to Obtain |
| --- | --- | --- |
| Ubuntu 16.04 and CentOS 7.4 | NVIDIA GPU driver: NVIDIA-Linux-x86_64-384.81.run | http://www.nvidia.com/download/driverResults.aspx/124722/en-us |
| Ubuntu 16.04 and CentOS 7.4 | CUDA Toolkit: cuda_9.0.176_384.81_linux.run | https://developer.nvidia.com/cuda-90-download-archive?target_os=Linux&target_arch=x86_64&target_distro=CentOS&target_version=7&target_type=runfilelocal |
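
The CUDA Toolkit package name in Table 1 appears to encode two versions: the toolkit version (9.0.176) and the driver version it was built against (384.81). The following is a minimal sketch that splits such a name, assuming it follows the cuda_<toolkit version>_<driver version>_linux.run pattern. The function name parse_cuda_runfile is hypothetical and is not part of any NVIDIA tool.

```shell
#!/bin/sh
# Split a CUDA runfile name of the assumed form
# cuda_<toolkit version>_<driver version>_linux.run into its two versions.
# parse_cuda_runfile is a hypothetical helper name used for illustration.
parse_cuda_runfile() {
    name=${1##*/}              # drop any leading directory
    name=${name#cuda_}         # drop the "cuda_" prefix
    name=${name%_linux.run}    # drop the "_linux.run" suffix
    toolkit=${name%%_*}        # text before the first underscore
    driver=${name#*_}          # text after the first underscore
    printf 'toolkit=%s driver=%s\n' "$toolkit" "$driver"
}

parse_cuda_runfile cuda_9.0.176_384.81_linux.run
```

Comparing the driver field with the version of a separately downloaded driver package is one way to confirm that the two packages in Table 1 match.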

The procedure for installing the NVIDIA GPU driver and CUDA Toolkit depends on the OS.

CentOS 7.4

Log in to the target BMS and run the following command to switch to user root:

su root
(Optional) If the gcc, gcc-c++, make, and kernel-devel packages are not installed, run the following commands to install them:

yum install gcc

yum install gcc-c++

yum install make

yum install kernel-devel-`uname -r`
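
Whether this optional step is needed can be checked before running yum. The following is a minimal sketch, not part of the original procedure: the helper name missing_tools is hypothetical, and it only tests whether each command is on the PATH.

```shell
#!/bin/sh
# Print the name of every tool in the argument list that is not on the PATH.
# missing_tools is a hypothetical helper name used for illustration.
missing_tools() {
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            printf '%s\n' "$tool"
        fi
    done
}

# g++ is the command provided by the gcc-c++ package on CentOS.
missing=$(missing_tools gcc g++ make)
if [ -n "$missing" ]; then
    echo "Missing build tools:" $missing
else
    echo "gcc, g++, and make are already installed."
fi
# kernel-devel provides no command of its own;
# "rpm -q kernel-devel-$(uname -r)" reports whether it is installed.
```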
(Optional) Add the Nouveau driver to the blacklist.

If the Nouveau driver is installed and loaded, blacklist it to avoid conflicts with the NVIDIA GPU driver:
1. Add the line blacklist nouveau to the end of the /etc/modprobe.d/blacklist.conf file.
2. Run the following commands to back up and rebuild the initramfs:
  mv /boot/initramfs-$(uname -r).img /boot/initramfs-$(uname -r).img.bak
  
  dracut -v /boot/initramfs-$(uname -r).img $(uname -r)
3. Run the reboot command to restart the BMS.
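
Step 1 above can be scripted so that running it twice does not add a duplicate line. The following is a minimal sketch under stated assumptions: the helper name ensure_line is hypothetical, and the demonstration writes to a scratch file rather than to /etc/modprobe.d/blacklist.conf.

```shell
#!/bin/sh
# Append a line to a file only if that exact line is not already present.
# ensure_line is a hypothetical helper name used for illustration.
ensure_line() {
    file=$1
    line=$2
    # -x matches whole lines, -F treats the pattern as a fixed string.
    if ! grep -qxF -- "$line" "$file" 2>/dev/null; then
        printf '%s\n' "$line" >> "$file"
    fi
}

# Demonstrate on a scratch file; on the BMS the target would be
# /etc/modprobe.d/blacklist.conf and the command would run as root.
conf=$(mktemp)
ensure_line "$conf" "blacklist nouveau"
ensure_line "$conf" "blacklist nouveau"   # second call adds nothing
cat "$conf"
```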
(Optional) If the X service is running, run the systemctl set-default multi-user.target command and restart the BMS to enter multi-user mode.
(Optional) Install the NVIDIA GPU driver.

Perform this step only if you want a specific version of the NVIDIA GPU driver rather than the version contained in the CUDA Toolkit.
1. Download the NVIDIA GPU driver installation package NVIDIA-Linux-x86_64-xxx.yy.run from https://www.nvidia.com/Download/index.aspx?lang=en, and upload it to the /tmp directory on the BMS.
  Figure 1 Searching for the NVIDIA GPU driver package (CentOS 7.4)
2. In the /tmp directory, run the following command to install the NVIDIA GPU driver:
  sh ./NVIDIA-Linux-x86_64-xxx.yy.run
3. Run the following command to delete the installation package:
  rm -f NVIDIA-Linux-x86_64-xxx.yy.run
Install the CUDA Toolkit.
1. Download the CUDA Toolkit installation package cuda_a.b.cc_xxx.yy_linux.run from https://developer.nvidia.com/cuda-downloads, and upload it to the /tmp directory on the BMS.
2. In the /tmp directory, run the following command to make the installation package executable:
  chmod +x cuda_a.b.cc_xxx.yy_linux.run
3. Run the following command to install the CUDA Toolkit:
  ./cuda_a.b.cc_xxx.yy_linux.run --toolkit --samples --silent --override --tmpdir=/tmp/
4. Run the following command to delete the installation package:
  rm -f cuda_a.b.cc_xxx.yy_linux.run
5. Run the following commands to check whether the installation is successful:
  cd /usr/local/cuda/samples/1_Utilities/deviceQueryDrv/
  
  make
  
  ./deviceQueryDrv
  
  If the command output contains "Result = PASS", the CUDA Toolkit and the NVIDIA GPU driver have been installed successfully.
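
The check in step 5 can be automated by searching the program output for the pass marker. The following is a minimal sketch: the helper name has_pass is hypothetical, and a stand-in string replaces the real deviceQueryDrv output so that the example runs on a machine without a GPU.

```shell
#!/bin/sh
# Read command output on stdin and succeed only if it contains "Result = PASS".
# has_pass is a hypothetical helper name used for illustration.
has_pass() {
    grep -q "Result = PASS"
}

# On the BMS the input would come from the sample program:
#   ./deviceQueryDrv | has_pass
# Here a stand-in string replaces the real program output.
if printf 'stand-in output\nResult = PASS\n' | has_pass; then
    echo "CUDA Toolkit and driver check passed."
else
    echo "Check failed." >&2
fi
```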

Ubuntu 16.04

Log in to the target BMS and run the following command to switch to user root:

sudo su root
(Optional) If the gcc, g++, and make packages are not installed, run the following commands to install them:

apt-get install gcc

apt-get install g++

apt-get install make
(Optional) Add the Nouveau driver to the blacklist.

If the Nouveau driver is installed and loaded, blacklist it to avoid conflicts with the NVIDIA GPU driver:
1. Add the following lines to the end of the /etc/modprobe.d/blacklist.conf file:
```
blacklist nouveau
options nouveau modeset=0
```
2. Run the following commands to back up and rebuild the initramfs (on Ubuntu, the image file is named initrd.img-<kernel version>):
  mv /boot/initrd.img-$(uname -r) /boot/initrd.img-$(uname -r).bak
  
  sudo update-initramfs -u
3. Run the sudo reboot command to restart the BMS.
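
After the restart, it can be useful to confirm that the Nouveau module is no longer loaded. The following is a minimal sketch that inspects lsmod-style output. The helper name nouveau_listed is hypothetical, and the module sizes in the sample text are made up for illustration.

```shell
#!/bin/sh
# Read lsmod-style output on stdin; succeed if a module named nouveau is listed.
# Only the first column is compared, so modules that merely depend on nouveau
# do not count. nouveau_listed is a hypothetical helper name.
nouveau_listed() {
    awk '$1 == "nouveau" { found = 1 } END { exit !found }'
}

# On the BMS after the restart: lsmod | nouveau_listed
# Here a made-up sample replaces the real lsmod output.
sample='Module                  Size  Used by
nouveau              1527808  0
ttm                    98304  1 nouveau'

if printf '%s\n' "$sample" | nouveau_listed; then
    echo "Nouveau is still loaded; check the blacklist and the initramfs."
else
    echo "Nouveau is not loaded."
fi
```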
(Optional) If the X service is running, run the systemctl set-default multi-user.target command and restart the BMS to enter multi-user mode.
(Optional) Install the NVIDIA GPU driver.

Perform this step only if you want a specific version of the NVIDIA GPU driver rather than the version contained in the CUDA Toolkit.
1. Download the NVIDIA GPU driver installation package NVIDIA-Linux-x86_64-xxx.yy.run from https://www.nvidia.com/Download/index.aspx?lang=en, and upload it to the /tmp directory on the BMS.
  Figure 2 Searching for the NVIDIA GPU driver package
2. In the /tmp directory, run the following command to install the NVIDIA GPU driver:
  sh ./NVIDIA-Linux-x86_64-xxx.yy.run
3. Run the following command to delete the installation package:
  rm -f NVIDIA-Linux-x86_64-xxx.yy.run
Install the CUDA Toolkit.
1. Download the CUDA Toolkit installation package cuda_a.b.cc_xxx.yy_linux.run from https://developer.nvidia.com/cuda-downloads, and upload it to the /tmp directory on the BMS.
2. In the /tmp directory, run the following command to make the installation package executable:
  chmod +x cuda_a.b.cc_xxx.yy_linux.run
3. Run the following command to install the CUDA Toolkit:
  ./cuda_a.b.cc_xxx.yy_linux.run --toolkit --samples --silent --override --tmpdir=/tmp/
4. Run the following command to delete the installation package:
  rm -f cuda_a.b.cc_xxx.yy_linux.run
5. Run the following commands to check whether the installation is successful:
  cd /usr/local/cuda/samples/1_Utilities/deviceQueryDrv/
  
  make
  
  ./deviceQueryDrv
  
  If the command output contains "Result = PASS", the CUDA Toolkit and the NVIDIA GPU driver have been installed successfully.
6. Run the following command to check whether the driver is running properly:
  nvidia-smi topo -m
  
  If GPU information is displayed in the command output, the driver is running properly.
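
A scripted variant of the check in step 6 could count the GPUs that the driver reports. The following is a minimal sketch that assumes the nvidia-smi -L option, which prints one line per GPU beginning with "GPU <index>:". The helper name count_gpus is hypothetical, and the sample lines (names and UUIDs) are invented so that the example runs without a GPU.

```shell
#!/bin/sh
# Count lines of the form "GPU <index>: ..." read from stdin.
# count_gpus is a hypothetical helper name used for illustration.
count_gpus() {
    # grep -c prints 0 but exits non-zero when nothing matches; ignore that status.
    grep -c '^GPU [0-9][0-9]*:' || true
}

# On the BMS: nvidia-smi -L | count_gpus
# Here invented sample lines replace the real output.
sample='GPU 0: Example GPU A (UUID: GPU-00000000-0000-0000-0000-000000000000)
GPU 1: Example GPU B (UUID: GPU-11111111-1111-1111-1111-111111111111)'

echo "GPUs detected: $(printf '%s\n' "$sample" | count_gpus)"
```

A count of 0 would suggest that the driver is not loaded or that no GPU is visible to it.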