Installing and Using xGPU

This section describes how to install and use xGPU.

Constraints

xGPU is supported only on NVIDIA Tesla T4 and V100.
The HCE OS kernel version is 5.10 or later.
NVIDIA driver 535.54.03 has been installed on GPU-accelerated ECSs.
Docker 18.09.0-300 or later has been installed on GPU-accelerated ECSs.
Because of GPU virtualization limitations, when applications in a container are initialized, the GPU compute reported by NVIDIA System Management Interface (nvidia-smi) may temporarily exceed the upper limit of the GPU compute available to the container.
When the first CUDA application is created, a small amount of GPU memory (about 3 MiB on NVIDIA Tesla T4) is requested from each GPU. This GPU memory is treated as management overhead and is not controlled by the xGPU service.
xGPU cannot be used on bare metal servers and VMs configured with a passthrough NIC at the same time.
GPU memory isolation provided by xGPU does not apply to GPU memory requested through unified virtual memory (UVM), that is, by calling the CUDA API cudaMallocManaged(). For more information, see the official NVIDIA documentation. Request GPU memory by another method, for example, by calling cudaMalloc().

xGPU lets you dynamically disable GPU memory requests through UVM. For details, see the description of the uvm_disable interface.
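
The version constraints above can be checked with a short script before xGPU is installed. The following is a minimal sketch, not an official tool: the helper names version_ge and check_min_version are made up for this example, and the comparison relies on the version sort (sort -V) provided by GNU coreutils.

```shell
# Return success if version $1 is greater than or equal to version $2.
# Relies on "sort -V" (GNU coreutils version sort).
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Print one OK/FAIL line for a named component.
# Arguments: NAME ACTUAL_VERSION REQUIRED_VERSION
check_min_version() {
    if version_ge "$2" "$3"; then
        echo "OK   $1 $2 (>= $3)"
    else
        echo "FAIL $1 $2 (< $3)"
        return 1
    fi
}

# Example calls on a GPU-accelerated ECS (not executed here):
#   check_min_version kernel "$(uname -r | cut -d- -f1)" 5.10
#   check_min_version docker "$(docker version --format '{{.Server.Version}}')" 18.09.0
#   nvidia-smi --query-gpu=driver_version --format=csv,noheader
```

The last command only prints the installed driver version so that it can be compared by eye with 535.54.03.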

Installing xGPU

To install xGPU, contact customer service.

You are advised to use xGPU through CCE. For details, see GPU Virtualization.

Using xGPU

The following table describes the environment variables of xGPU. When creating a container, you can specify the environment variables to configure the GPU compute and memory that a container engine can obtain using xGPU.

**Table 1** Environment variables that affect xGPU
Environment Variable	Value Type	Description	Example
GPU_IDX	Integer	Specifies the GPU that can be used by a container.	Assigning the first GPU to a container: GPU_IDX=0
GPU_CONTAINER_MEM	Integer	Specifies the size of the GPU memory that will be used by a container, in MiB.	Allocating 5,120 MiB of the GPU memory to a container: GPU_CONTAINER_MEM=5120
GPU_CONTAINER_QUOTA_PERCENT	Integer	Specifies the percentage of the GPU compute that will be allocated to a container from a GPU. The GPU compute is allocated at a granularity of 1%. It is recommended that the minimum GPU compute be greater than or equal to 4%.	Allocating 50% of the GPU compute to a container: GPU_CONTAINER_QUOTA_PERCENT=50
GPU_POLICY	Integer	Specifies the GPU compute isolation policy used by the GPU. 0: native scheduling. The GPU compute is not isolated. 1: fixed scheduling 2: average scheduling 3: preemptive scheduling 4: weight-based preemptive scheduling 5: hybrid scheduling 6: weighted scheduling For details, see GPU Compute Scheduling Examples.	Setting the GPU compute isolation policy to fixed scheduling: GPU_POLICY=1
GPU_CONTAINER_PRIORITY	Integer	Specifies the priority of a container. 0: low priority 1: high priority	Creating a high-priority container: GPU_CONTAINER_PRIORITY=1
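
Because these values are passed as plain environment variables, a mistyped value is easy to miss. The sketch below validates a planned configuration against the value lists in Table 1. The function names are hypothetical, and the upper bound of 100 for the quota is an assumption derived from the value being a percentage; it is not stated in the table.

```shell
# Return success if $1 is a non-negative integer within [$2, $3].
in_range() {
    case $1 in
        ''|*[!0-9]*) return 1 ;;
    esac
    [ "$1" -ge "$2" ] && [ "$1" -le "$3" ]
}

# Validate a planned xGPU configuration.
# Arguments: GPU_CONTAINER_QUOTA_PERCENT GPU_POLICY GPU_CONTAINER_PRIORITY
validate_xgpu_env() {
    # Assumption: a percentage quota lies between 1 and 100.
    in_range "$1" 1 100 || { echo "quota percent must be 1-100: $1" >&2; return 1; }
    # Policies 0 to 6 are the ones listed in Table 1.
    in_range "$2" 0 6 || { echo "policy must be 0-6: $2" >&2; return 1; }
    # Priority is 0 (low) or 1 (high).
    in_range "$3" 0 1 || { echo "priority must be 0 or 1: $3" >&2; return 1; }
    # The table recommends at least 4%; warn rather than fail.
    [ "$1" -ge 4 ] || echo "warning: quota below the recommended 4%" >&2
    return 0
}
```

For example, validate_xgpu_env 50 1 1 succeeds, whereas validate_xgpu_env 50 7 0 fails because 7 is not a listed policy.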

The following example describes how you can use xGPU to allow two NVIDIA-Docker containers to share one GPU.

**Table 2** Data planning
Parameter	Container 1	Container 2	Description
GPU_IDX	0	0	Both containers use the first GPU.
GPU_CONTAINER_QUOTA_PERCENT	50	30	Allocate 50% of GPU compute to container 1 and 30% of GPU compute to container 2.
GPU_CONTAINER_MEM	5120	1024	Allocate 5,120 MiB of the GPU memory to container 1 and 1,024 MiB of the GPU memory to container 2.
GPU_POLICY	1	1	Set the first GPU to use fixed scheduling.
GPU_CONTAINER_PRIORITY	1	0	Give container 1 a high priority and container 2 a low priority.

Example:

docker run --rm -it --runtime=nvidia -e GPU_CONTAINER_QUOTA_PERCENT=50 -e GPU_CONTAINER_MEM=5120 -e GPU_IDX=0 -e GPU_POLICY=1 -e GPU_CONTAINER_PRIORITY=1 --shm-size 16g -v /mnt/:/mnt nvcr.io/nvidia/tensorrt:19.07-py3 bash 
docker run --rm -it --runtime=nvidia -e GPU_CONTAINER_QUOTA_PERCENT=30 -e GPU_CONTAINER_MEM=1024 -e GPU_IDX=0 -e GPU_POLICY=1 -e GPU_CONTAINER_PRIORITY=0 --shm-size 16g -v /mnt/:/mnt nvcr.io/nvidia/tensorrt:19.07-py3 bash
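
The two commands differ only in their xGPU values, so the -e flags can be generated from the rows of Table 2. This is an illustrative sketch; xgpu_env_flags is a hypothetical helper, not part of xGPU or Docker.

```shell
# Print the "-e" flags for one container.
# Arguments: GPU_IDX QUOTA_PERCENT MEM_MIB POLICY PRIORITY
xgpu_env_flags() {
    printf '%s\n' "-e GPU_IDX=$1 -e GPU_CONTAINER_QUOTA_PERCENT=$2 -e GPU_CONTAINER_MEM=$3 -e GPU_POLICY=$4 -e GPU_CONTAINER_PRIORITY=$5"
}

# Example for container 1 in Table 2 (not executed here). The command
# substitution is left unquoted on purpose so that each flag is passed
# to docker as a separate argument:
#   docker run --rm -it --runtime=nvidia $(xgpu_env_flags 0 50 5120 1 1) \
#       --shm-size 16g -v /mnt/:/mnt nvcr.io/nvidia/tensorrt:19.07-py3 bash
```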

Viewing procfs Directory

At runtime, xGPU generates and manages multiple procfs nodes in the /proc/xgpu directory. You can use these nodes to view xGPU information and configure xGPU settings, as described below.

Run the following command to view the node information:

ls /proc/xgpu/
0 container version uvm_disable

The following table describes the information about the procfs nodes.

Directory	Read/Write Type	Description
0	Read and write	xGPU generates a directory for each GPU on a GPU-accelerated ECS. Each directory uses a number as its name, for example, 0, 1, and 2. In this example, there is only one GPU, and the directory ID is 0.
container	Read and write	xGPU generates a directory for each container running in the GPU-accelerated ECSs.
version	Read-only	xGPU version.
uvm_disable	Read and write	Controls whether all containers can request GPU memory through UVM. The default value is 0. 0: Containers can request GPU memory through UVM. 1: UVM is disabled for all containers.
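
Since each GPU directory is named with a number, the GPUs managed by xGPU can be listed by filtering the entries of /proc/xgpu. The following sketch assumes the layout shown above. XGPU_ROOT is a variable name invented for this example so that the function can also be tried against a scratch directory.

```shell
# Root of the xGPU procfs tree (XGPU_ROOT is a made-up override for testing).
XGPU_ROOT=${XGPU_ROOT:-/proc/xgpu}

# Print the index of every GPU directory, one per line.
xgpu_list_gpus() {
    for entry in "$XGPU_ROOT"/*; do
        [ -d "$entry" ] || continue
        name=${entry##*/}
        case $name in
            ''|*[!0-9]*) ;;          # skip container, version, uvm_disable
            *) echo "$name" ;;
        esac
    done
}
```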

Run the following command to view the items in the directory of the first GPU:

ls /proc/xgpu/0/
max_inst meminfo policy quota utilization_line utilization_rate xgpu1 xgpu2

The following table describes the information about the GPU.

Directory	Read/Write Type	Description
max_inst	Read and write	Specifies the maximum number of containers. The value ranges from 1 to 25. This parameter can be modified only when no container is running.
meminfo	Read-only	Specifies the total available GPU memory.
policy	Read and write	Specifies the GPU compute isolation policy used by the GPU. The default value is 1. 0: native scheduling. The GPU compute is not isolated. 1: fixed scheduling 2: average scheduling 3: preemptive scheduling 4: weight-based preemptive scheduling 5: hybrid scheduling 6: weighted scheduling For details, see GPU Compute Scheduling Examples.
quota	Read-only	Specifies the total weight of the GPU compute.
utilization_line	Read and write	Specifies the GPU usage threshold for online containers to suppress offline containers. If the GPU usage exceeds the value of this parameter, online containers completely suppress offline containers. If the GPU usage does not exceed the value of this parameter, online containers partially suppress offline containers.
utilization_rate	Read-only	Specifies the GPU usage.
xgpuIndex	Read and write	Specifies the xgpu subdirectory of the GPU. In this example, xgpu1 and xgpu2 belong to the first GPU.
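
The policy node holds a number, which is hard to read at a glance. The sketch below maps the number to the policy names listed in the table. It assumes that the file contains a bare integer, and xgpu_policy_name is a hypothetical helper.

```shell
# Translate a policy number into the name used in this section.
xgpu_policy_name() {
    case $1 in
        0) echo "native scheduling" ;;
        1) echo "fixed scheduling" ;;
        2) echo "average scheduling" ;;
        3) echo "preemptive scheduling" ;;
        4) echo "weight-based preemptive scheduling" ;;
        5) echo "hybrid scheduling" ;;
        6) echo "weighted scheduling" ;;
        *) echo "unknown policy: $1" >&2; return 1 ;;
    esac
}

# Example on a host with xGPU installed (not executed here):
#   xgpu_policy_name "$(cat /proc/xgpu/0/policy)"
```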

Run the following command to view the directory of a container:

ls /proc/xgpu/container/
9418 9582

The following table describes the content in the directory.

Directory	Read/Write Type	Description
containerID	Read and write	Specifies the container ID. xGPU generates an ID for each container during container creation.

Directory	Read/Write Type	Description
xgpuIndex	Read and write	Specifies the xgpu subdirectory of the container. In this example, xgpu1 is the subdirectory of container 9418, and xgpu2 is the subdirectory of container 9582.
uvm_disable	Read and write	Controls whether this container can request GPU memory through UVM. The default value is 0. 0: The container can request GPU memory through UVM. 1: UVM is disabled for the container.

Directory	Read/Write Type	Description
meminfo	Read-only	Specifies the GPU memory allocated using xGPU and the remaining GPU memory. For example, 3308MiB/5120MiB, 64% free indicates that 5,120 MiB of memory is allocated and 64% is available.
priority	Read and write	Specifies the priority of a container. The default value is 0. 0: low priority 1: high priority This parameter is used in hybrid scenarios where there are both online and offline containers. High-priority containers can preempt the GPU compute of low-priority containers.
quota	Read-only	Specifies the percentage of the GPU compute allocated using xGPU. For example, 50 indicates that 50% of the GPU compute is allocated using xGPU.
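
For scripting, the meminfo value can be split into its parts. The following sketch assumes that the file content has exactly the form of the example above (3308MiB/5120MiB, 64% free); the real format may differ, and both function names are hypothetical.

```shell
# Extract the allocated GPU memory in MiB (the value after the slash).
meminfo_total_mib() {
    printf '%s\n' "$1" | sed -n 's|^[0-9][0-9]*MiB/\([0-9][0-9]*\)MiB.*$|\1|p'
}

# Extract the percentage of the allocated memory that is still free.
meminfo_free_percent() {
    printf '%s\n' "$1" | sed -n 's|^.*, *\([0-9][0-9]*\)% free.*$|\1|p'
}

# Example on a host with xGPU installed (not executed here):
#   meminfo_free_percent "$(cat /proc/xgpu/container/9418/xgpu1/meminfo)"
```

Both functions print nothing if the input does not match the assumed format, which makes an unexpected format easy to detect.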

The following table lists commands for viewing or changing xGPU settings.

Command	Description
echo 1 > /proc/xgpu/0/policy	Changes the scheduling policy of the first GPU to fixed scheduling (policy 1).
cat /proc/xgpu/container/$containerID/$xgpuIndex/meminfo	Queries the memory allocated to a container.
cat /proc/xgpu/container/$containerID/$xgpuIndex/quota	Queries the GPU compute weight specified for a container.
cat /proc/xgpu/0/quota	Queries the weight of the remaining GPU compute on the first GPU.
echo 20 > /proc/xgpu/0/max_inst	Sets the maximum number of containers allowed on the first GPU to 20.
echo 1 > /proc/xgpu/container/$containerID/$xgpuIndex/priority	Sets the xGPU in a container to high priority.
echo 40 > /proc/xgpu/0/utilization_line	Sets the suppression threshold of the first GPU to 40%.
echo 1 > /proc/xgpu/container/$containerID/uvm_disable	Disables GPU memory requests through UVM for the specified container.
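
The query commands above take a container ID and an xgpu index, so checking every container by hand is tedious. The sketch below walks the container directory and prints the quota and meminfo of each xgpu subdirectory. It assumes the directory layout described in this section. XGPU_ROOT and xgpu_container_report are names invented for this example.

```shell
# Root of the xGPU procfs tree (XGPU_ROOT is a made-up override for testing).
XGPU_ROOT=${XGPU_ROOT:-/proc/xgpu}

# Print one line per xgpu subdirectory of every container.
xgpu_container_report() {
    for dir in "$XGPU_ROOT"/container/*/xgpu*; do
        [ -d "$dir" ] || continue
        xgpu=${dir##*/}              # for example, xgpu1
        parent=${dir%/*}
        container=${parent##*/}      # for example, 9418
        echo "container=$container xgpu=$xgpu quota=$(cat "$dir/quota") meminfo=$(cat "$dir/meminfo")"
    done
}
```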

Upgrading xGPU

You can perform cold upgrades on xGPU.

Run the following command to stop all running containers:
docker ps -q | xargs -I {} docker stop {}
Run the following command to upgrade the RPM package of xGPU:
rpm -U hce_xgpu
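
Because the upgrade is a cold upgrade, it can be useful to confirm that no container is still running before the RPM package is touched. The following sketch wraps the stop step and adds that check. The function names are hypothetical, and only the docker ps -q and docker stop commands already used above are called.

```shell
# Return success if no container is currently running.
no_containers_running() {
    [ -z "$(docker ps -q)" ]
}

# Stop every running container; do nothing if none is running.
stop_all_containers() {
    ids=$(docker ps -q)
    if [ -z "$ids" ]; then
        echo "no running containers"
        return 0
    fi
    # $ids is intentionally unquoted so each ID becomes a separate argument.
    docker stop $ids
}

# Example sequence (not executed here):
#   stop_all_containers && no_containers_running && rpm -U hce_xgpu
```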

Uninstalling xGPU

Run the following command to stop all running containers:
docker ps -q | xargs -I {} docker stop {}
Run the following command to uninstall the RPM package:
rpm -e hce_xgpu