Heterogeneous Resources
GPU Settings
- GPU Virtualization: CCE uses its proprietary xGPU virtualization technology to dynamically partition GPU memory and compute. A single GPU can be virtualized into a maximum of 20 virtual GPUs. Virtualization is more flexible than static allocation: you can request only the GPU resources your services actually need, which improves GPU utilization. For details, see Overview. For a sample workload that requests virtualized GPU resources, see the sketch after this list.
- Default Cluster Driver: specifies the default GPU driver version used by the GPU nodes in a cluster. To use a custom driver, enter the download link of the NVIDIA driver. For details, see Internet.
- Node Pool Configurations: If you do not want all GPU nodes in a cluster to use the same driver, CCE allows you to install a different GPU driver for each node pool. After you customize a GPU driver for a node pool, nodes in that node pool preferentially use the custom driver. Node pools without a specified driver use the cluster's default driver.
- The system installs the driver version specified for the node pool. The driver applies only to nodes newly added to the node pool.
- After the driver version is updated, it takes effect only on nodes newly added to the node pool. Existing nodes must be restarted for the change to apply.
- When installing the CCE AI Suite (NVIDIA GPU) add-on of v2.7.2 or later, you can configure GPU virtualization for node pools.
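To illustrate how virtualized GPU resources are consumed, here is a minimal workload sketch. The resource names volcano.sh/gpu-mem.128Mi and volcano.sh/gpu-core.percentage are assumptions based on common xGPU setups and may differ across add-on versions; the pod name and image are placeholders. Verify the exact names against the CCE AI Suite (NVIDIA GPU) documentation.

```yaml
# Minimal sketch of a pod that requests a slice of a virtualized GPU.
# The extended resource names below are assumptions; check your add-on
# version's documentation for the names it actually registers.
apiVersion: v1
kind: Pod
metadata:
  name: xgpu-demo                                # placeholder name
spec:
  containers:
  - name: cuda-app
    image: nvidia/cuda:12.2.0-base-ubuntu22.04   # any CUDA-capable image
    command: ["sleep", "infinity"]
    resources:
      limits:
        volcano.sh/gpu-mem.128Mi: 32        # 32 x 128 MiB = 4 GiB of GPU memory
        volcano.sh/gpu-core.percentage: 25  # about a quarter of one GPU's compute
```

Because memory and compute are requested independently of whole devices, several such pods can share one physical GPU, which is the utilization gain described above.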
NPU Settings
- Automatic Driver Installation
- After this function is enabled, CCE automatically installs the appropriate driver based on the configured NPU type when the CCE AI Suite (Ascend NPU) add-on starts.
The supported NPU types and required OS specifications are listed in the table below.
Table 1 Specification adaptation

NPU Type: Snt3 (ascend-snt3)
Supported OS: EulerOS 2.5 x86, CentOS 7.6 x86, EulerOS 2.9 x86, and EulerOS 2.8 Arm
NOTE: The Snt3 Arm model supports EulerOS 2.8 Arm at most, and this OS has reached end of service (EOS). For details, see EOS Plan.
CCE standard and CCE Turbo clusters of v1.28 or later do not support EulerOS 2.8 Arm. To use NPUs in such clusters, select compatible NPUs by referring to Mappings Between Cluster Versions and OS Versions and Software Versions Required by Different Models. For details about the purchase process, see Lite Cluster Usage Process.
- If this function is disabled, creating an NPU node through the console triggers CCE to automatically insert the default NPU driver installation command, but you cannot specify the driver version or type. After the installation, the node restarts automatically. If you instead create the node using APIs or other methods, you must manually add the NPU driver installation command to the post-installation script.
- NPU Virtualization: NPU virtualization abstracts a physical NPU (an Ascend AI processor) into multiple virtual NPUs (vNPUs) so that one NPU can be shared among containers. It enables flexible partitioning and dynamic management of hardware resources. For details, see Automatic NPU Virtualization (Computing Segmentation). A sample workload that requests NPU resources is sketched below.
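As a rough illustration of how NPU resources are requested once the driver and add-on are in place, here is a minimal sketch. The resource name huawei.com/ascend-310 is an assumption for a Snt3 (Ascend 310-based) node, and the pod name and image are placeholders; vNPUs created by NPU virtualization are exposed under template-specific resource names, so check the add-on documentation for the exact names in your cluster.

```yaml
# Minimal sketch of a pod that requests one whole Snt3 NPU.
# huawei.com/ascend-310 is an assumed resource name; a vNPU produced by
# NPU virtualization is requested under a template-specific name instead.
apiVersion: v1
kind: Pod
metadata:
  name: npu-demo                      # placeholder name
spec:
  containers:
  - name: ascend-app
    image: your-ascend-image:latest   # placeholder image with the CANN runtime
    command: ["sleep", "infinity"]
    resources:
      limits:
        huawei.com/ascend-310: 1      # one physical NPU; use a vNPU name for a slice
```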