Updated on 2022-08-11 GMT+08:00

Installing and Configuring a GPU Driver

Context

For an edge node that uses GPUs, you need to install and configure the GPU driver before managing the edge node on IEF.

Currently, IEF supports NVIDIA Tesla GPUs such as the P4, P40, and T4, together with GPU drivers that match CUDA Toolkit 8.0 to 10.0.

Procedure

  1. Install the GPU driver.

    1. Download the GPU driver. The recommended driver link is as follows:

      https://www.nvidia.com/content/DriverDownload-March2009/confirmation.php?url=/tesla/440.33.01/NVIDIA-Linux-x86_64-440.33.01.run&lang=us&type=Tesla

    2. Run the following command to install the GPU driver:

      bash NVIDIA-Linux-x86_64-440.33.01.run

    3. Run the following command to check the GPU driver installation status:

      nvidia-smi
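
The status check can also be scripted. The following is a minimal sketch, not part of IEF or the NVIDIA tooling: check_gpu_driver is a hypothetical helper name, and the sketch assumes that nvidia-smi returns a non-zero exit status when it cannot communicate with the driver.

```shell
#!/usr/bin/env bash
# check_gpu_driver is a hypothetical helper, not an IEF or NVIDIA command.
# It prints one of: ok, not-installed, driver-error.
check_gpu_driver() {
    local tool="${1:-nvidia-smi}"   # command to probe; defaults to nvidia-smi

    # The tool is missing if the driver package was never installed.
    if ! command -v "$tool" >/dev/null 2>&1; then
        echo "not-installed"
        return 1
    fi

    # Assumption: a non-zero exit status means the driver is not usable.
    if "$tool" >/dev/null 2>&1; then
        echo "ok"
    else
        echo "driver-error"
        return 2
    fi
}
```

Calling check_gpu_driver without arguments after the installation should print ok on a node where the driver is working.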

  2. Log in to the edge node as user root.
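  3. Run the following command to load the NVIDIA kernel modules (including the Unified Memory module) and create the GPU device file: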

    nvidia-modprobe -c0 -u

  4. Create the directories that will store the GPU driver files.

    mkdir -p /var/IEF/nvidia/drivers /var/IEF/nvidia/bin /var/IEF/nvidia/lib64

  5. Copy GPU driver files to the directories.

    • For CentOS, run the following commands in sequence to copy the driver files:

      cp /lib/modules/{Kernel version of the current environment}/kernel/drivers/video/nvi* /var/IEF/nvidia/drivers/

      cp /usr/bin/nvidia-* /var/IEF/nvidia/bin/

      cp -rd /usr/lib64/libcuda* /var/IEF/nvidia/lib64/

      cp -rd /usr/lib64/libEG* /var/IEF/nvidia/lib64/

      cp -rd /usr/lib64/libGL* /var/IEF/nvidia/lib64/

      cp -rd /usr/lib64/libnv* /var/IEF/nvidia/lib64/

      cp -rd /usr/lib64/libOpen* /var/IEF/nvidia/lib64/

      cp -rd /usr/lib64/libvdpau_nvidia* /var/IEF/nvidia/lib64/

      cp -rd /usr/lib64/vdpau /var/IEF/nvidia/lib64/

    • For Ubuntu, run the following commands in sequence to copy the driver files:

      cp /lib/modules/{Kernel version of the current environment}/kernel/drivers/video/nvi* /var/IEF/nvidia/drivers/

      cp /usr/bin/nvidia-* /var/IEF/nvidia/bin/

      cp -rd /usr/lib/x86_64-linux-gnu/libcuda* /var/IEF/nvidia/lib64/

      cp -rd /usr/lib/x86_64-linux-gnu/libEG* /var/IEF/nvidia/lib64/

      cp -rd /usr/lib/x86_64-linux-gnu/libGL* /var/IEF/nvidia/lib64/

      cp -rd /usr/lib/x86_64-linux-gnu/libnv* /var/IEF/nvidia/lib64/

      cp -rd /usr/lib/x86_64-linux-gnu/libOpen* /var/IEF/nvidia/lib64/

      cp -rd /usr/lib/x86_64-linux-gnu/libvdpau_nvidia* /var/IEF/nvidia/lib64/

      cp -rd /usr/lib/x86_64-linux-gnu/vdpau /var/IEF/nvidia/lib64/

    Run the uname -r command to view the kernel version of the current environment, and replace {Kernel version of the current environment} in the preceding commands with the value it returns. The following is an example:

    # uname -r
    3.10.0-514.el7.x86_64
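
Instead of typing the kernel version by hand, the module path can be built from the output of uname -r. The following is a minimal sketch; nvidia_module_dir is a hypothetical helper name introduced only for illustration, and the path layout is the one used in the copy commands above.

```shell
#!/usr/bin/env bash
# nvidia_module_dir is a hypothetical helper, not an IEF command.
# It prints the directory that the procedure copies the nvi* modules from.
nvidia_module_dir() {
    local kver="${1:-$(uname -r)}"   # kernel version; defaults to the running kernel
    printf '/lib/modules/%s/kernel/drivers/video\n' "$kver"
}

src="$(nvidia_module_dir)"
echo "Driver modules are expected under: ${src}"

# The first copy command of step 5 would then read (left commented out here):
# cp "${src}"/nvi* /var/IEF/nvidia/drivers/
```

Passing a version explicitly, for example nvidia_module_dir 3.10.0-514.el7.x86_64, is useful when preparing the path for a kernel other than the one currently running.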

  6. Run the following command to change the directory permissions:

    chmod -R 755 /var/IEF
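
After the permissions are changed, the result can be checked before the node is managed on IEF. The following is a minimal sketch under stated assumptions: verify_ief_dirs is a hypothetical helper name, it checks only the three directories created in step 4, and it relies on the GNU coreutils form of stat (stat -c '%a').

```shell
#!/usr/bin/env bash
# verify_ief_dirs is a hypothetical helper, not an IEF command.
# It reports any of the three driver directories that is missing or not mode 755.
verify_ief_dirs() {
    local base="${1:-/var/IEF}"   # base directory; defaults to the path used in the procedure
    local rc=0 dir mode

    for dir in nvidia/drivers nvidia/bin nvidia/lib64; do
        if [ ! -d "${base}/${dir}" ]; then
            echo "missing: ${base}/${dir}"
            rc=1
            continue
        fi
        # Assumption: GNU stat, which prints the octal mode with -c '%a'.
        mode="$(stat -c '%a' "${base}/${dir}")"
        if [ "$mode" != "755" ]; then
            echo "mode ${mode}: ${base}/${dir}"
            rc=1
        fi
    done
    return "$rc"
}
```

Calling verify_ief_dirs without arguments prints nothing and returns 0 when all three directories exist with mode 755.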