Updated on 2025-09-04 GMT+08:00

Manual NPU Virtualization

In CCE, manual NPU virtualization enables node-level segmentation of NPUs, allowing manual control over resource allocation per NPU. This approach offers greater flexibility but requires more complex configuration, and it is best suited for scenarios that demand precise control over NPU resources, such as services requiring dedicated compute or strict isolation guarantees.

Prerequisites

  • The cluster contains NPU chips that support virtualization. For details about the supported product types, see Supported NPU Chip Types.
  • The CCE AI Suite (Ascend NPU) add-on of v2.1.15 or later has been installed in the cluster. For details, see CCE AI Suite (Ascend NPU).
  • An NPU driver has been installed on the NPU nodes, and the driver version is 23.0.1 or later. To upgrade a driver, perform the following operations:
    • Before upgrading a driver, ensure that the NPU firmware is available on the node. Installing the driver restarts the node, so you are advised to drain the node first. For details, see Draining a Node. VMs do not support firmware upgrades.
    • To install the driver for all OS users during the upgrade, add the --install-for-all parameter, for example, ./Ascend-hdk-310p-npu-driver_x.x.x_linux-{arch}.run --full --install-for-all. A minimal workflow sketch is provided after this list.
    • If a driver upgrade fails, see "What Can I Do If an NPU Driver Fails to Be Upgraded?" in FAQs > "Chart and Add-on".
    1. Uninstall the original NPU driver. For details, see Uninstalling the NPU Driver.
    2. Go to Firmware and Drivers, select the corresponding product model, and download the driver installation package (in .run format) of 23.0.1 or later.
    3. Read Before You Start to learn about the restrictions and requirements for NPU installation, and install the NPU by referring to Installing the Driver (.run).
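
For reference, a minimal driver upgrade workflow might look like the following sketch. It assumes the .run installation package has already been downloaded to the NPU node; the node name, package name, and architecture are placeholders.

    # Drain the node so that workloads are evicted before the installation, which restarts the node.
    kubectl drain <node-name> --ignore-daemonsets --delete-emptydir-data

    # On the NPU node, make the installer executable and install the driver for all OS users.
    chmod +x Ascend-hdk-310p-npu-driver_x.x.x_linux-{arch}.run
    ./Ascend-hdk-310p-npu-driver_x.x.x_linux-{arch}.run --full --install-for-all

    # After the node restarts, allow workloads to be scheduled to it again.
    kubectl uncordon <node-name>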

Notes and Constraints

  • CCE has verified NPU virtualization only for Ascend Snt3P3.
  • Ascend AI products are delivered with predefined virtualization templates, and NPUs can only be virtualized based on these template specifications. For details, see Virtualization Templates. An NPU chip can be virtualized into multiple vNPUs using different templates, and these virtual instances can be combined flexibly as long as the total resources used by all vNPUs do not exceed the physical resources of the chip. For example, on a chip with eight AI Cores, one vir04 vNPU (four AI Cores) and one vir02 vNPU (two AI Cores) can coexist because together they use six of the eight AI Cores. For the recommended specifications and combinations, see Virtual Instance Specifications.

Step 1: Create a vNPU

CCE standard and Turbo clusters allow you to create vNPUs as needed.

  1. Log in to the target node and access the cluster using kubectl. For details, see Accessing a Cluster Using kubectl.
  2. Obtain the basic information of the node. The information obtained by running the following command includes the NPU driver version, chip model, and resource usage, which serve as a basis for vNPU specification planning:

    npu-smi info

    The command output shows that the driver version is 24.1.rc2.3 and that there are two NPUs, each with one chip. The NPU IDs are 104 and 112.

    +--------------------------------------------------------------------------------------------------------+ 
    | npu-smi 24.1.rc2.3                               Version: 24.1.rc2.3                                   |
    +-------------------------------+-----------------+------------------------------------------------------+ 
    | NPU     Name                  | Health          | Power(W)     Temp(C)           Hugepages-Usage(page) |
    | Chip    Device                | Bus-Id          | AICore(%)    Memory-Usage(MB)                        |
    +===============================+=================+======================================================+ 
    | 104     xxx                 | OK              | NA           58                0     / 0             |
    | 0       0                     | 0000:00:0D.0    | 0            1782 / 21527                            |
    +===============================+=================+======================================================+ 
    | 112     xxx                 | OK              | NA           53                0     / 0             |
    | 0       1                     | 0000:00:0E.0    | 0            1786 / 21527                            |
    +===============================+=================+======================================================+
    +-------------------------------+-----------------+------------------------------------------------------+ 
    | NPU     Chip                  | Process id      | Process name             | Process memory(MB)        |
    +===============================+=================+======================================================+
    | No running processes found in NPU 104                                                                  |
    +===============================+=================+======================================================+ 
    | No running processes found in NPU 112                                                                  |
    +===============================+=================+======================================================+

  3. Check the virtualization templates supported by the current node and their respective resource specifications. You can create vNPUs using one or more of these templates.

    npu-smi info -t template-info

    The following information is displayed. vir01, vir02, vir02_1c, and other similar names are virtualization templates. The available templates vary by product. The following example is for reference only.

    +------------------------------------------------------------------------------------------+ 
    |NPU instance template info is:                                                            |
    |Name                AICORE    Memory    AICPU     VPC            VENC           JPEGD     |
    |                               GB                 PNGD           VDEC           JPEGE     |
    |==========================================================================================|
    |vir01               1         3         1         1              0              2         |
    |                                                  0              1              1         |
    +------------------------------------------------------------------------------------------+ 
    |vir02               2         6         2         3              1              4         |
    |                                                  0              3              2         |
    +------------------------------------------------------------------------------------------+ 
    |vir02_1c            2         6         1         3              0              4         |
    |                                                  0              3              2         |
    +------------------------------------------------------------------------------------------+ 
    |vir04               4         12        4         6              2              8         |
    |                                                  0              6              4         |
    +------------------------------------------------------------------------------------------+ 
    |vir04_3c            4         12        3         6              1              8         |
    |                                                  0              6              4         |
    +------------------------------------------------------------------------------------------+ 
    |vir04_3c_ndvpp      4         12        3         0              0              0         |
    |                                                  0              0              0         |
    +------------------------------------------------------------------------------------------+ 
    |vir04_4c_dvpp       4         12        4         12             3              16        |
    |                                                  0              12             8         |
    +------------------------------------------------------------------------------------------+

  4. Run the npu-smi set -t create-vnpu -i <id> -c <chip_id> -f <vnpu_config> [-v <vnpu_id>] [-g <vgroup_id>] command to create a vNPU. For example:

    npu-smi set -t create-vnpu -i 104 -c 0 -f vir02

    Table 1 Parameters in this command

    • id (example: 104): Device ID, which is the NPU ID. To obtain it, run the npu-smi info -l command.
    • chip_id (example: 0): ID of the NPU chip. To obtain it, run the npu-smi info -m command.
    • vnpu_config (example: vir02): Name of the virtualization template. For details, see 3.
    • vnpu_id (optional): ID of the vNPU to be created.
    • vgroup_id (optional): ID of the virtual resource group (vGroup). The value ranges from 0 to 3. This parameter is available only for Atlas inference products. For details about vGroup, see Virtualization Modes.

    If information similar to the following is displayed, the vNPU has been created:

    Status                         : OK        
    Message                        : Create vnpu success

  5. Enable vNPU configuration recovery. After this function is enabled, the system saves the vNPU configuration when the node restarts, so the vNPUs remain valid after a restart.

    1. Run the following command to enable vNPU configuration recovery (setting the value to 0 instead of 1 disables it):
      npu-smi set -t vnpu-cfg-recover -d 1

      Information similar to the following is displayed:

      Status : OK
      Message : The VNPU config recover mode Enable is set successfully.
    2. Run the following command to check whether the vNPU configuration recovery is enabled:
      npu-smi info -t vnpu-cfg-recover

      If information similar to the following is displayed, the function has been enabled:

      VNPU config recover mode : Enable

  6. Run the following command to view the created vNPU and the remaining resources on the NPU chip. In this command, 104 indicates the NPU ID and 0 indicates the chip ID; replace them as required.

    npu-smi info -t info-vnpu -i 104 -c 0
    The command output shows that one vNPU has been created from the vir02 template and that its ID is 100. The remaining resources (such as AI Cores and memory) on the NPU chip plus the resources allocated to the vir02 vNPU add up to the total physical resources of the chip. During NPU virtualization, the total amount of any resource type used by all vNPUs on an NPU chip cannot exceed the physical resources of that chip.
    +-------------------------------------------------------------------------------+ 
    | NPU resource static info as follow:                                           |
    | Format:Free/Total                   NA: Currently, query is not supported.    |
    | AICORE    Memory    AICPU    VPC    VENC    VDEC    JPEGD    JPEGE    PNGD    |
    |            GB                                                                 |
    |===============================================================================|
    | 6/8       15/21     5/7      9/12   2/3     9/12    12/16    6/8      NA/NA   |
    +-------------------------------------------------------------------------------+ 
    | Total number of vnpu: 1                                                       |
    +-------------------------------------------------------------------------------+
    |  Vnpu ID  |  Vgroup ID     |  Container ID  |  Status  |  Template Name       |
    +-------------------------------------------------------------------------------+ 
    |  100      |  0             |  000000000000  |  0       |  vir02               |
    +-------------------------------------------------------------------------------+
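
For reference, the creation procedure above can be condensed into the following command sequence run on the NPU node. The NPU ID (104), chip ID (0), and template name (vir02) are the example values used in this section; the commented-out line only illustrates the optional -v (vNPU ID) and -g (vGroup ID) parameters with hypothetical values.

    # List the NPU IDs and the chip mapping to plan the vNPU specifications.
    npu-smi info -l
    npu-smi info -m

    # Create a vNPU from the vir02 template.
    npu-smi set -t create-vnpu -i 104 -c 0 -f vir02
    # npu-smi set -t create-vnpu -i 104 -c 0 -f vir02 -v 101 -g 0

    # Keep the vNPU configuration across node restarts.
    npu-smi set -t vnpu-cfg-recover -d 1

    # Verify the created vNPU and the remaining resources on the chip.
    npu-smi info -t info-vnpu -i 104 -c 0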

Step 2: Restart the Component and Check Resource Reporting

After a vNPU is created, restart the huawei-npu-device-plugin component on the node to report NPU resources to Kubernetes.
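
Optionally, before restarting, you can confirm the DaemonSet that manages these pods and the image it runs. The DaemonSet name below is only assumed to match the pod names shown in the next step; verify it in your cluster first.

    # Locate the device plugin DaemonSet (name assumed; verify it in your cluster).
    kubectl -n kube-system get daemonset | grep huawei-npu-device-plugin

    # Show the image (and therefore the add-on version) used by the DaemonSet.
    kubectl -n kube-system get daemonset huawei-npu-device-plugin -o jsonpath='{.spec.template.spec.containers[0].image}'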

  1. Run the following command to query all the pods for running the huawei-npu-device-plugin component:

    kubectl get pods -A -o wide | grep huawei-npu-device-plugin

    Below is the command output. Each line shows the node where a device-plugin pod runs. To restart the component on a node, delete the pod on that node based on the node IP address. In this example, the pod on the node whose IP address is 192.168.2.27 is deleted.

    kube-system   huawei-npu-device-plugin-8lq64            1/1     Running   2 (4d7h ago)   4d8h   192.168.0.9     192.168.0.9     <none>           <none>
    kube-system   huawei-npu-device-plugin-khkvr            1/1     Running   0              4d8h   192.168.0.131   192.168.0.131   <none>           <none>
    kube-system   huawei-npu-device-plugin-rltx4            1/1     Running   0              4d8h   192.168.7.56    192.168.7.56    <none>           <none>
    kube-system   huawei-npu-device-plugin-t9vxx            1/1     Running   1 (4d8h ago)   4d8h   192.168.0.72    192.168.0.72    <none>           <none>
    kube-system   huawei-npu-device-plugin-c6x7            1/1     Running   0              3d2h   192.168.2.27    192.168.2.27    <none>           <none>

  2. Run the following command to delete the pod:

    kubectl delete pod -n kube-system huawei-npu-device-plugin-c6x7

    If information similar to the following is displayed, the pod has been deleted:

    pod "huawei-npu-device-plugin-c6x7" deleted

  3. Run the following command to query the reported vNPU resources. After NPU virtualization, only the created vNPUs on a virtualized chip are reported as available; the remaining resources of that chip are not reported to Kubernetes for use.

    kubectl describe node 192.168.2.27

    The command output shows that the number of NPUs and the number of vNPUs are both 1, indicating that one NPU chip has been virtualized and the other has not.

    ... ... 
    Capacity: 
      cpu:                       32
      ephemeral-storage:         102683576Ki
      huawei.com/ascend-310:     1   # The number of NPUs
      huawei.com/ascend-310-2c:  1   # The number of vNPUs
      hugepages-1Gi:             0
      hugepages-2Mi:             0
      localssd:                  0
      localvolume:               0
      memory:                    131480656Ki
      pods:                      110
    Allocatable: 
      cpu:                       31850m 
      ephemeral-storage:         94633183485 
      huawei.com/ascend-310:     1   # The number of NPUs
      huawei.com/ascend-310-2c:  1   # The number of vNPUs
      hugepages-1Gi:             0
      hugepages-2Mi:             0
      localssd:                  0
      localvolume:               0
      memory:                    126616656Ki 
      pods:                      110
    ... ...
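
The restart and the resource check above can also be performed with one-liners. The following sketch uses the example node IP and resource names from this section; adjust them to your environment.

    # Restart the device plugin on a specific node by deleting its pod there.
    kubectl -n kube-system get pods -o name --field-selector spec.nodeName=192.168.2.27 | grep huawei-npu-device-plugin | xargs kubectl -n kube-system delete

    # Read only the allocatable NPU and vNPU counters from the node object.
    kubectl get node 192.168.2.27 -o jsonpath="{.status.allocatable['huawei\.com/ascend-310']}{'\n'}{.status.allocatable['huawei\.com/ascend-310-2c']}{'\n'}"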

Step 3: Use the Created vNPU

After vNPUs are created, you can specify vNPU resources for workloads in YAML to manage and allocate resources flexibly. If the workload needs to be scheduled by the Volcano Scheduler, the add-on version must be 1.12.1 or later.

  1. Create a workload and request vNPU resources using the vir02 template.

    1. Create a YAML file named vnpu-worker.yaml.
      vim vnpu-worker.yaml

      Containers can request either NPU resources or vNPU resources, but the two types cannot be used at the same time.

      Before using a vNPU, ensure that it has been created. If a vNPU is not created, an error is reported, for example, "0/2 nodes are available: 2 Insufficient huawei.com/ascend-310-2c".

      kind: Deployment
      apiVersion: apps/v1
      metadata:
        name: vnpu-test
        namespace: default
      spec:
        replicas: 1
        selector:
          matchLabels:
            app: vnpu-test
        template:
          metadata:
            labels:
              app: vnpu-test
          spec:
            schedulerName: kube-scheduler    # If the workload requires Volcano Scheduler, install the add-on and ensure that the add-on version is v1.12.1 or later.
            containers:
              - name: container-0
                image: nginx:latest
                resources:
                  limits:
                    cpu: 250m
                    huawei.com/ascend-310-2c: '1'   # The number of vNPUs to be requested. The value is fixed at 1.
                    memory: 512Mi
                  requests:
                    cpu: 250m
                    huawei.com/ascend-310-2c: '1'   # The value is fixed at 1.
                    memory: 512Mi
      • A container can request only one vNPU, so the number of vNPUs in both requests and limits is fixed at 1.
      • The vNPU must be created on the node in advance, and there must be sufficient resources. If the vNPU resources are insufficient, an error message similar to "0/2 nodes are available: 2 Insufficient huawei.com/ascend-310-2c." is displayed.
      • huawei.com/ascend-310-2c indicates the name of the requested vNPU. The vNPU name varies depending on the product and template. You can refer to the following table to obtain the vNPU name.
        Table 2 vNPU names in different products

        Product Type                                  Virtualization Template   vNPU Name
        Atlas inference series (eight AI Cores)       vir01                     huawei.com/ascend-310-1c
                                                      vir02                     huawei.com/ascend-310-2c
                                                      vir02_1c                  huawei.com/ascend-310-2c.1cpu
                                                      vir04                     huawei.com/ascend-310-4c
                                                      vir04_3c                  huawei.com/ascend-310-4c.3cpu
                                                      vir04_3c_ndvpp            huawei.com/ascend-310-4c.3cpu.ndvpp
                                                      vir04_4c_dvpp             huawei.com/ascend-310-4c.4cpu.dvpp
        Ascend training series (30 or 32 AI Cores)    vir16                     huawei.com/ascend-1980-16c
                                                      vir08                     huawei.com/ascend-1980-8c
                                                      vir04                     huawei.com/ascend-1980-4c
                                                      vir02                     huawei.com/ascend-1980-2c

    2. Run the following command to create the workload:
      kubectl apply -f vnpu-worker.yaml

      Information similar to the following is displayed:

      deployment/vnpu-test created
    3. Run the following command to check whether the pod is running:
      kubectl get pod | grep vnpu-test

      If the following information is displayed, the pod for the workload is running normally:

      vnpu-test-6658cd795b-rx76t      1/1     Running     0       59m

  2. Run the following command to enter the container:

    kubectl -n default exec -it vnpu-test-6658cd795b-rx76t -c container-0 -- /bin/bash

  3. Check whether the vNPU is mounted to the container.

    1. Run the following command to set an environment variable that specifies the search path of the NPU driver's dynamic libraries, which ensures that the system can correctly load the required library files when running NPU-related applications:
      export LD_LIBRARY_PATH=/usr/local/HiAI/driver/lib64:/usr/local/Ascend/driver/lib64/common:/usr/local/Ascend/driver/lib64/driver:/usr/local/Ascend/driver/lib64 
    2. Run the following command to view the vNPU mounted to the container:
      npu-smi info

      The command output indicates that the vNPU created on NPU 104 has been mounted to the container. The virtualization template is vir02.

      +--------------------------------------------------------------------------------------------------------+
      | npu-smi 24.1.rc2.3                               Version: 24.1.rc2.3                                   |
      +-------------------------------+-----------------+------------------------------------------------------+
      | NPU     Name                  | Health          | Power(W)     Temp(C)           Hugepages-Usage(page) |
      | Chip    Device                | Bus-Id          | AICore(%)    Memory-Usage(MB)                        |
      +===============================+=================+======================================================+
      | 104     xxx             | OK              | NA           54                0     / 0             |
      | 0       0                     | 0000:00:0D.0    | 0            445  / 5381                             |
      +===============================+=================+======================================================+
      +-------------------------------+-----------------+------------------------------------------------------+
      | NPU     Chip                  | Process id      | Process name             | Process memory(MB)        |
      +===============================+=================+======================================================+
      | No running processes found in NPU 104                                                                  |
      +===============================+=================+======================================================+
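
To confirm from outside the container which vNPU resource the example pod requested and where it was scheduled, you can also query the pod object directly (the pod name is the example used above):

    # Show the resource limits declared by the example container, including the vNPU resource.
    kubectl -n default get pod vnpu-test-6658cd795b-rx76t -o jsonpath='{.spec.containers[0].resources.limits}'

    # Check which node the pod was scheduled to.
    kubectl -n default get pod vnpu-test-6658cd795b-rx76t -o wide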

Step 4: Destroy the Created vNPU

Destroy a vNPU when it is no longer needed to release the related resources. Before destroying a vNPU, ensure that no job is using it; otherwise, the destroy operation fails.
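
One way to check from the Kubernetes side whether any pod on the node still holds the vNPU resource is to inspect the node's allocated resources (the node IP is the example used in this section):

    # If huawei.com/ascend-310-2c shows a non-zero request, a pod is still using the vNPU.
    kubectl describe node 192.168.2.27 | grep -A 15 "Allocated resources"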

  1. Run the npu-smi set -t destroy-vnpu -i <id> -c <chip_id> -v <vnpu_id> command to destroy the vNPU. For example:

    npu-smi set -t destroy-vnpu -i 104 -c 0 -v 100
    • If information similar to the following is displayed, the command is executed successfully:
      Status                         : OK 
      Message                        : Destroy vnpu 100 success
    • If the following information is displayed, jobs are still using the vNPU. Ensure that no job is using the vNPU and run the command again.
      destroy vnpu 100 failed.
      Usage: npu-smi set -t destroy-vnpu [Options...] 
      Options: 
             -i %d              Card ID     
             -c %d              Chip ID  
             -v %d              Vnpu ID

  2. Restart the huawei-npu-device-plugin component on the corresponding node so that the updated NPU resources are reported to Kubernetes. To do so, take the following steps:

    1. Run the following command to query all the pods for running the huawei-npu-device-plugin component:
      kubectl get pods -A -o wide | grep huawei-npu-device-plugin

      Below is the command output. Each line shows the node where a device-plugin pod runs. To restart the component on a node, delete the pod on that node based on the node IP address. In this example, the pod on the node whose IP address is 192.168.2.27 is deleted.

      kube-system   huawei-npu-device-plugin-8lq64            1/1     Running   2 (4d7h ago)   4d8h   192.168.0.9     192.168.0.9     <none>           <none>
      kube-system   huawei-npu-device-plugin-khkvr            1/1     Running   0              4d8h   192.168.0.131   192.168.0.131   <none>           <none>
      kube-system   huawei-npu-device-plugin-rltx4            1/1     Running   0              4d8h   192.168.7.56    192.168.7.56    <none>           <none>
      kube-system   huawei-npu-device-plugin-t9vxx            1/1     Running   1 (4d8h ago)   4d8h   192.168.0.72    192.168.0.72    <none>           <none>
      kube-system   huawei-npu-device-plugin-tcmck            1/1     Running   0              3d2h   192.168.2.27    192.168.2.27    <none>           <none>
    2. Run the following command to delete the pod:
      kubectl delete pod -n kube-system huawei-npu-device-plugin-tcmck

      If information similar to the following is displayed, the pod has been deleted.

      pod "huawei-npu-device-plugin-tcmck" deleted

  3. Run the following command to check whether the vNPU has been destroyed:

    kubectl describe node 192.168.2.27

    The command output shows that the number of NPUs is restored to 2 and the number of vNPUs is 0, indicating that the vNPU has been destroyed.

    ... ... 
    Capacity: 
      cpu:                       32
      ephemeral-storage:         102683576Ki
      huawei.com/ascend-310:     2
      huawei.com/ascend-310-2c:  0
      hugepages-1Gi:             0
      hugepages-2Mi:             0
      localssd:                  0
      localvolume:               0
      memory:                    131480656Ki
      pods:                      110
    Allocatable: 
      cpu:                       31850m 
      ephemeral-storage:         94633183485 
      huawei.com/ascend-310:     2
      huawei.com/ascend-310-2c:  0
      hugepages-1Gi:             0
      hugepages-2Mi:             0
      localssd:                  0
      localvolume:               0
      memory:                    126616656Ki 
      pods:                      110
    ... ...
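
If you no longer plan to use vNPUs on the node, you can also disable the vNPU configuration recovery that was enabled in Step 1 (as noted there, the value 0 disables the function):

    # Disable vNPU configuration recovery once all vNPUs on the node have been destroyed.
    npu-smi set -t vnpu-cfg-recover -d 0

    # Confirm that the function is disabled.
    npu-smi info -t vnpu-cfg-recover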