Overview
CCE uses xGPU virtualization to dynamically divide GPU memory and computing power. A single GPU can be virtualized into up to 20 virtual GPU devices. Virtualization is more flexible than static allocation: you can specify how much GPU a service uses, as long as the service still runs stably, to improve GPU utilization.
Advantages
The GPU virtualization function of CCE has the following advantages:
- Flexible: The GPU computing power ratio and GPU memory size can be configured at a fine granularity. Computing power is allocated in increments of 5% of a GPU, and GPU memory is allocated in MB.
- Isolated: GPU memory can be isolated on its own, or GPU memory and computing power can be isolated together.
- Compatible: Services do not need to be recompiled, and the CUDA library does not need to be replaced.
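The granularity rules above can be illustrated with a small check. This is only a sketch: the function name `check_xgpu_request`, its parameters, and the mode labels are hypothetical and are not part of any CCE API. It assumes that a request without a compute share means memory-only isolation.

```python
from typing import Optional

# Granularity stated in this document: compute is allocated in 5% steps.
COMPUTE_STEP_PERCENT = 5


def check_xgpu_request(memory_mb: int, compute_percent: Optional[int] = None) -> str:
    """Return the isolation mode implied by a request, or raise ValueError.

    Hypothetical helper for illustration only, not a CCE API.
    """
    # GPU memory is allocated in whole MB.
    if not isinstance(memory_mb, int) or memory_mb <= 0:
        raise ValueError("GPU memory must be a positive whole number of MB")

    # Assumption: omitting the compute share means memory-only isolation.
    if compute_percent is None:
        return "memory"

    if not 0 < compute_percent <= 100:
        raise ValueError("compute share must be in the range 5 to 100 percent")
    if compute_percent % COMPUTE_STEP_PERCENT != 0:
        raise ValueError("compute share must be a multiple of 5 percent")
    return "memory+compute"
```

For example, `check_xgpu_request(4096, 25)` is accepted as a request for 4096 MB of GPU memory and 25% of the GPU's computing power, while a compute share of 12 is rejected because it is not a multiple of 5.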
Prerequisites
Item | Supported Version |
---|---|
Cluster version | v1.23.8-r0, v1.25.3-r0, or later |
OS | Huawei Cloud EulerOS 2.0 |
GPU type | T4 and V100 |
Driver version | 470.57.02, 510.47.03, and 535.54.03 |
Runtime | containerd |
Add-on | The following add-ons must be installed in the cluster: |
Constraints
- A single GPU can be virtualized into a maximum of 20 xGPU devices.
- Init containers are not supported when GPU virtualization is used.
- GPU virtualization supports two isolation modes: GPU memory isolation, and isolation of both GPU memory and computing power. All workloads scheduled to a single GPU must use the same isolation mode.
- Autoscaler cannot be used to automatically scale GPU nodes in or out.
- xGPU isolation does not support requesting GPU memory by calling the CUDA API cudaMallocManaged(), which is also known as using UVM. For more information, see the official NVIDIA documentation. Request GPU memory in another way, for example, by calling cudaMalloc().
- While a containerized application is initializing, the real-time compute usage reported by nvidia-smi may exceed the upper limit of the compute available to the container.
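Two of the constraints above (the limit of 20 xGPU devices per GPU and the single isolation mode per GPU) can be expressed as a simple placement check. This is a minimal sketch, not how the CCE scheduler is implemented: `can_share_gpu` and the mode labels are hypothetical names, and the sketch assumes each workload occupies exactly one xGPU device.

```python
from typing import List

# Limit stated in this document: one GPU yields at most 20 xGPU devices.
MAX_XGPU_PER_GPU = 20


def can_share_gpu(modes: List[str]) -> bool:
    """Check whether workloads with the given isolation modes fit on one GPU.

    `modes` holds one hypothetical label per workload, for example
    "memory" or "memory+compute". Assumes one xGPU device per workload.
    """
    # Constraint 1: no more than 20 xGPU devices on a single GPU.
    if len(modes) > MAX_XGPU_PER_GPU:
        return False
    # Constraint 2: every workload on the GPU uses the same isolation mode.
    return len(set(modes)) <= 1
```

With these assumptions, 20 memory-isolated workloads fit on one GPU, but 21 do not, and a mix of `"memory"` and `"memory+compute"` workloads is rejected regardless of count.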