Overview

NPU virtualization abstracts a physical NPU (Ascend AI product) into multiple virtual NPUs (vNPUs) so that it can be shared among containers. It enables flexible partitioning and dynamic management of hardware resources. NPU virtualization offers the following benefits:

Efficient resource utilization: A single server's NPUs can be divided into multiple vNPUs for different users, reducing computing costs.
Strong resource isolation: Containerization isolates the vNPUs of different users from one another, preventing computing interference and data leaks.
Unified management: The allocation and reclamation of resources of different specifications are streamlined, facilitating multi-tenant management.

Basic Rules of NPU Virtualization

The Ascend NPU hardware provides core computing resources, including AI cores, AI CPUs, and memory. Using virtualization, CCE can divide a physical NPU into multiple vNPUs based on your needs. Each vNPU contains a specific number of AI cores and AI CPUs and a specific amount of memory. For example, if one container requests four AI cores and another requests two AI cores, CCE allocates two vNPUs, one with four AI cores and one with two. For details, see Figure 1.

Figure 1 vNPU use case

Supported NPU Chip Types

CCE has verified NPU virtualization only on Ascend Snt3P3.

To check the hardware data of a chip, log in to the target node and run the npu-smi info -t info-vnpu -i <id> -c <chip_id> command. Obtain the id and chip_id as follows:

id: The device ID or NPU ID, which can be obtained by running the npu-smi info -l command.
chip_id: The chip ID, which can be obtained by running the npu-smi info -m command.
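The query above can also be assembled programmatically, for example in a node inspection script. The following is a minimal sketch, not an official tool: the function name vnpu_info_command is hypothetical, the values 0 and 0 are placeholders for the IDs reported by npu-smi info -l and npu-smi info -m, and treating both IDs as non-negative integers is an assumption. Only the command-line options shown above are used.

```python
import shlex
from typing import List


def vnpu_info_command(npu_id: int, chip_id: int) -> List[str]:
    """Build the argument list for the vNPU hardware query shown above.

    npu_id comes from `npu-smi info -l`, chip_id from `npu-smi info -m`.
    The non-negative integer check is an assumption for this sketch.
    """
    if npu_id < 0 or chip_id < 0:
        raise ValueError("npu_id and chip_id must be non-negative")
    return ["npu-smi", "info", "-t", "info-vnpu",
            "-i", str(npu_id), "-c", str(chip_id)]


# Placeholder IDs; replace them with the values reported on your node.
cmd = vnpu_info_command(0, 0)
print(shlex.join(cmd))
```

On a node where npu-smi is installed, the returned list can be passed to subprocess.run to execute the query.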

NPU Virtualization Templates

During NPU virtualization, only official virtualization templates can be used to specify vNPU specifications. For details about the templates applicable to Ascend Snt3P3, see Table 1.

**Table 1** NPU virtualization templates
Chip Type	Template	AI Cores	Memory	AI CPU	VPC	VDEC	JPEGD	VENC	JPEGE
Ascend Snt3P3	vir04	4	12 GiB	4	6	6	8	2	4
	vir04_3c	4	12 GiB	3	6	6	8	1	4
	vir02	2	6 GiB	2	3	3	4	1	2
	vir02_1c	2	6 GiB	1	3	3	4	0	2
	vir01	1	3 GiB	1	1	1	2	0	1
	vir04_3c_ndvpp	4	12 GiB	3	0	0	0	0	0
	vir04_4c_dvpp	4	12 GiB	4	12	12	16	3	8

NPU virtualization supports flexible combinations of virtual instances. An NPU chip can be virtualized into multiple vNPUs using different virtualization templates, provided that the total resources used by all vNPUs on the chip do not exceed the physical limits of the NPU. Recommended specifications are provided so that you can combine templates as needed. For details, see Virtual Instance Specifications.
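The resource constraint on template combinations can be illustrated with a short sketch. The per-template values below are copied from Table 1. Everything else is an assumption made for illustration: the helper names total_resources and fits are hypothetical, the limits passed to fits are placeholder numbers rather than published Ascend Snt3P3 specifications, and summing each resource is a simplification. The combinations that are actually supported are those listed in Virtual Instance Specifications.

```python
from typing import Dict, Iterable

# Column order follows Table 1; memory is in GiB.
FIELDS = ("ai_cores", "memory_gib", "ai_cpu", "vpc",
          "vdec", "jpegd", "venc", "jpege")

# Per-template resources copied from Table 1 (Ascend Snt3P3).
TEMPLATES = {
    "vir04":          (4, 12, 4, 6, 6, 8, 2, 4),
    "vir04_3c":       (4, 12, 3, 6, 6, 8, 1, 4),
    "vir02":          (2, 6, 2, 3, 3, 4, 1, 2),
    "vir02_1c":       (2, 6, 1, 3, 3, 4, 0, 2),
    "vir01":          (1, 3, 1, 1, 1, 2, 0, 1),
    "vir04_3c_ndvpp": (4, 12, 3, 0, 0, 0, 0, 0),
    "vir04_4c_dvpp":  (4, 12, 4, 12, 12, 16, 3, 8),
}


def total_resources(names: Iterable[str]) -> Dict[str, int]:
    """Sum each resource over the given template names."""
    totals = dict.fromkeys(FIELDS, 0)
    for name in names:
        for field, amount in zip(FIELDS, TEMPLATES[name]):
            totals[field] += amount
    return totals


def fits(names: Iterable[str], limits: Dict[str, int]) -> bool:
    """Return True if the summed resources stay within every given limit."""
    totals = total_resources(names)
    return all(totals[field] <= limit for field, limit in limits.items())


# Placeholder limits for illustration only, NOT real chip specifications.
example_limits = {"ai_cores": 8, "memory_gib": 24}

print(fits(["vir04", "vir02", "vir02"], example_limits))
print(fits(["vir04", "vir04", "vir01"], example_limits))
```

With these placeholder limits, the first combination uses 8 AI cores and 24 GiB and is accepted, while the second needs 9 AI cores and is rejected.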

NPU Virtualization Modes

CCE provides two NPU virtualization modes to choose from based on service needs.

Automatic NPU Virtualization (Computing Segmentation): NPUs are virtualized by node pool. You can split NPUs into vNPUs in batches through the UI. This mode supports only fixed combinations of virtualization templates and is suitable for large-scale unified resource management.
Manual NPU Virtualization: NPUs are virtualized by node. You can manually control the resource allocation of each NPU. This mode is more flexible but also more complex, and is best suited to scenarios that require fine-grained resource management.
