Help Center/ Huawei Cloud EulerOS/ User Guide/ xGPU/ Installing and Using xGPU
Updated on 2024-08-28 GMT+08:00

Installing and Using xGPU

This section describes how to install and use xGPU.

Constraints

  • xGPU is supported only on NVIDIA Tesla T4 and V100.
  • The HCE OS kernel version is 5.10 or later.
  • NVIDIA driver 535.54.03 has been installed on GPU-accelerated ECSs.
  • Docker 18.09.0-300 or later has been installed on GPU-accelerated ECSs.
  • Limited by GPU virtualization, when applications in a container are initialized, the GPU compute monitored by NVIDIA System Management Interface (nvidia-smi) may exceed the upper limit of the GPU compute available for the container.
  • When the first CUDA application is created, a percentage of the GPU memory (about 3 MiB on NVIDIA Tesla T4) is requested from each GPU. This GPU memory is considered as the management overhead and is not controlled by the xGPU service.
  • xGPU cannot be used on bare metal servers and VMs configured with a passthrough NIC at the same time.
  • GPU memory isolation provided by xGPU cannot be used for GPU memory applications through unified virtual memory (UVM) by calling CUDA API cudaMallocManaged(). For more information, see NVIDIA official document. Use other methods to request GPU memory. For example, call cudaMalloc().

    xGPU allows users to apply for GPU memory by dynamically disabling the UVM. For details about how to disable the UVM, see the description of the uvm_disable interface.

Installing xGPU

To install xGPU, contact customer service.

You are advised to use xGPU through CCE. For details, see GPU Virtualization.

Using xGPU

The following table describes the environment variables of xGPU. When creating a container, you can specify the environment variables to configure the GPU compute and memory that a container engine can obtain using xGPU.

Table 1 Environment variables that affect xGPU

Environment Variable

Value Type

Description

Example

GPU_IDX

Integer

Specifies the GPU that can be used by a container.

Assigning the first GPU to a container:

GPU_IDX=0

GPU_CONTAINER_MEM

Integer

Specifies the size of the GPU memory that will be used by a container, in MiB.

Allocating 5,120 MiB of the GPU memory to a container:

GPU_CONTAINER_MEM=5120

GPU_CONTAINER_QUOTA_PERCENT

Integer

Specifies the percentage of the GPU compute that will be allocated to a container from a GPU.

The GPU compute is allocated at a granularity of 1%. It is recommended that the minimum GPU compute be greater than or equal to 4%.

Allocating 50% of the GPU compute to a container:

GPU_CONTAINER_QUOTA_PERCEN=50

GPU_POLICY

Integer

Specifies the GPU compute isolation policy used by the GPU.

  • 0: native scheduling. The GPU compute is not isolated.
  • 1: fixed scheduling
  • 2: average scheduling
  • 3: preemptive scheduling
  • 4: weight-based preemptive scheduling
  • 5: hybrid scheduling
  • 6: weighted scheduling

For details, see GPU Compute Scheduling Examples.

Setting the GPU compute isolation policy to fixed scheduling:

GPU_POLICY=1

GPU_CONTAINER_PRIORITY

Integer

Specifies the priority of a container.

  • 0: low priority
  • 1: high priority

Creating a high-priority container:

GPU_CONTAINER_PRIORITY=1

The following example describes how you can use xGPU to allow two NVIDIA-Docker containers to share one GPU.

Table 2 Data planning

Parameter

Container 1

Container 2

Description

GPU_IDX

0

0

Two containers use the first GPU.

GPU_CONTAINER_QUOTA_PERCENT

50

30

Allocate 50% of GPU compute to container 1 and 30% of GPU compute to container 2.

GPU_CONTAINER_MEM

5120

1024

Allocate 5,120 MiB of the GPU memory to container 1 and 1,024 MiB of the GPU memory to container 2.

GPU_POLICY

1

1

Set the first GPU to use fixed scheduling.

GPU_CONTAINER_PRIORITY

1

0

Give container 1 a high priority and container 2 a low priority.

Example:

docker run --rm -it --runtime=nvidia -e GPU_CONTAINER_QUOTA_PERCENT=50 -e GPU_CONTAINER_MEM=5120 -e GPU_IDX=0 -e GPU_POLICY=1 -e GPU_CONTAINER_PRIORITY=1 --shm-size 16g -v /mnt/:/mnt nvcr.io/nvidia/tensorrt:19.07-py3 bash 
docker run --rm -it --runtime=nvidia -e GPU_CONTAINER_QUOTA_PERCENT=30 -e GPU_CONTAINER_MEM=1024 -e GPU_IDX=0 -e GPU_POLICY=1 -e GPU_CONTAINER_PRIORITY=0 --shm-size 16g -v /mnt/:/mnt nvcr.io/nvidia/tensorrt:19.07-py3 bash

Viewing procfs Directory

At runtime, xGPU generates and manages multiple proc file systems (procfs) in the /proc/xgpu directory. The following are operations for you to view xGPU information and configure xGPU settings.

  1. Run the following command to view the node information:
    ls /proc/xgpu/
    0 container version uvm_disable

    The following table describes the information about the procfs nodes.

    Directory

    Read/Write Type

    Description

    0

    Read and write

    xGPU generates a directory for each GPU on a GPU-accelerated ECS. Each directory uses a number as its name, for example, 0, 1, and 2. In this example, there is only one GPU, and the directory ID is 0.

    container

    Read and write

    xGPU generates a directory for each container running in the GPU-accelerated ECSs.

    version

    Read-only

    xGPU version.

    uvm_disable

    Read and write

    Controls whether to disable the option that allows all containers to request GPU memory through UVM. The default value is 0.

    • 0: This option is enabled.
    • 1: This option is disabled.
  2. Run the following command to view the items in the directory of the first GPU:
    ls /proc/xgpu/0/
    max_inst meminfo policy quota utilization_line utilization_rate xgpu1 xgpu2

    The following table describes the information about the GPU.

    Directory

    Read/Write Type

    Description

    max_inst

    Read and write

    Specifies the maximum number of containers. The value ranges from 1 to 25. This parameter can be modified only when no container is running.

    meminfo

    Read-only

    Specifies the total available GPU memory.

    policy

    Read and write

    Specifies the GPU compute isolation policy used by the GPU. The default value is 1.

    • 0: native scheduling. The GPU compute is not isolated.
    • 1: fixed scheduling
    • 2: average scheduling
    • 3: preemptive scheduling
    • 4: weight-based preemptive scheduling
    • 5: hybrid scheduling
    • 6: weighted scheduling

    For details, see GPU Compute Scheduling Examples.

    quota

    Read-only

    Specifies the total weight of the GPU compute.

    utilization_line

    Read and write

    Specifies the GPU usage threshold for online containers to suppress offline containers.

    If the GPU usage exceeds the value of this parameter, online containers completely suppress offline containers. If the GPU usage does not exceed the value of this parameter, online containers partially suppress offline containers.

    utilization_rate

    Read only

    Specifies the GPU usage.

    xgpuIndex

    Read and write

    Specifies the xgpu subdirectory of the GPU.

    In this example, xgpu1 and xgpu2 belong to the first GPU.

  3. Run the following command to view the directory of a container:
    ls /proc/xgpu/container/
    9418 9582

    The following table describes the content in the directory.

    Directory

    Read/Write Type

    Description

    containerID

    Read and write

    Specifies the container ID.

    xGPU generates an ID for each container during container creation.

  4. Run the following commands to view the directory of each container:
    ls /proc/xgpu/container/9418/
    xgpu1 uvm_disable
    ls /proc/xgpu/container/9582/
    xgpu2 uvm_disable

    The following table describes the content in the directory.

    Directory

    Read/Write Type

    Description

    xgpuIndex

    Read and write

    Specifies the xgpu subdirectory of the container.

    In this example, xgpu1 is the subdirectory of container 9418, and xgpu2 is the subdirectory of container 9582.

    uvm_disable

    Read and write

    Controls whether to disable the option that only allows a container to request GPU memory through UVM. The default value is 0.

    • 0: This option is enabled.
    • 1: This option is disabled.
  5. Run the following command to view the xgpuIndex directory:
    ls /proc/xgpu/container/9418/xgpu1/
    meminfo priority quota

    The following table describes the content in the directory.

    Directory

    Read/Write Type

    Description

    meminfo

    Read-only

    Specifies the GPU memory allocated using xGPU and the remaining GPU memory.

    For example, 3308MiB/5120MiB, 64% free indicates that 5,120 MiB of memory is allocated and 64% is available.

    priority

    Read and write

    Specifies the priority of a container. The default value is 0.

    • 0: low priority
    • 1: high priority

    This parameter is used in hybrid scenarios where there are both online and offline containers. High-priority containers can preempt the GPU compute of low-priority containers.

    quota

    Read-only

    Specifies the percentage of the GPU compute allocated using xGPU.

    For example, 50 indicates that 50% of the GPU compute is allocated using xGPU.

You can run the following commands to perform operations on GPU-accelerated ECSs. For example, you can change the scheduling policy and modify the weight.

Command

Description

echo 1 > /proc/xgpu/0/policy

Changes the scheduling policy to weight-based preemptive scheduling for the first GPU.

cat /proc/xgpu/container/$containerID/$xgpuIndex/meminfo

Queries the memory allocated to a container.

cat /proc/xgpu/container/$containerID/$xgpuIndex/quota

Queries the GPU compute weight specified for a container.

cat /proc/xgpu/0/quota

Queries the weight of the remaining GPU compute on the first GPU.

echo 20 > /proc/xgpu/0/max_inst

Sets the maximum number of containers allowed on the first GPU to 20.

echo 1 > /proc/xgpu/container/$containerID/$xgpuIndex/priority

Sets the xGPU in a container to high priority.

echo 40 > /proc/xgpu/0/utilization_line

Sets the suppression threshold of the first GPU to 40%.

echo 1 > /proc/xgpu/container/$containerID/uvm_disable

Controls whether to disable the option that allows containers to use xGPU to request GPU memory through UVM.

Upgrading xGPU

You can perform cold upgrades on xGPU.

  1. Run the following command to stop all running containers:

    docker ps -q | xargs -I {} docker stop {}

  2. Run the following command to upgrade the RPM package of xGPU:

    rpm -U hce_xgpu

Uninstalling xGPU

  1. Run the following command to stop all running containers:

    docker ps -q | xargs -I {} docker stop {}

  2. Run the following command to uninstall the RPM package:

    rpm -e hce_xgpu

Monitoring Containers Using xgpu-smi

You can use xgpu-smi to view xGPU information, including the container ID, GPU compute usage and allocation, and GPU memory usage and allocation.

The following shows the xgpu-smi monitoring information.

You can run the xgpu-smi -h command to view the help information about the xgpu-smi tool.