Updated on 2024-12-18 GMT+08:00

huawei-npu

Introduction

huawei-npu supports and manages Huawei NPUs in containers.

After this add-on is installed, you can create NPU nodes to enable quick, efficient inference and image recognition.

Prerequisites

  • You have added the accelerator/huawei-npu label to the node where huawei-npu is to be installed. The label value can be empty.
  • To make this add-on run on an Ascend Snt9 device, you need to install Volcano first.
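
The label in the first prerequisite can also be added from the command line. The sketch below assumes that kubectl is already configured for the cluster. The node name my-npu-node is a placeholder, and the helper npu_label_cmd is hypothetical (not part of the add-on). It only prints the command so that it can be reviewed before it is run.

```shell
#!/bin/sh
# Hypothetical helper: build the kubectl command that adds the
# accelerator/huawei-npu label with an empty value to one node.
npu_label_cmd() {
    node="$1"
    # A trailing "=" with nothing after it sets an empty label value.
    printf 'kubectl label node %s accelerator/huawei-npu=\n' "$node"
}

# Print the command for a placeholder node name.
npu_label_cmd "my-npu-node"
```

To apply the label, run the printed command. Running kubectl get nodes -l accelerator/huawei-npu afterwards lists the nodes that carry the label.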

Constraints

  • This add-on can only be installed in on-premises clusters v1.28 or later.
  • Only the Arm architecture and Huawei Cloud EulerOS 2.0 are supported.
  • Only Ascend Snt9 NPUs are supported.
  • Ascend Snt9 devices require Volcano, and each container can be scheduled with only 1, 2, 4, or 8 NPUs.
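
The NPU count rule in the last constraint can be checked before a workload is submitted. This is a minimal sketch; the function name valid_npu_count is hypothetical and is not provided by the add-on.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if the requested NPU count
# is one of the values allowed per container (1, 2, 4, or 8).
valid_npu_count() {
    case "$1" in
        1|2|4|8) return 0 ;;
        *)       return 1 ;;
    esac
}

# Try a few sample values.
for count in 1 3 8; do
    if valid_npu_count "$count"; then
        echo "$count: allowed"
    else
        echo "$count: not allowed"
    fi
done
```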

Installing the Add-on

  1. Log in to the UCS console and choose Fleets. Then, click the cluster name to access the cluster console. In the navigation pane, choose Add-ons. On the displayed page, locate huawei-npu and click Install.
  2. Configure the NPU parameters. The default values suit most scenarios, so you are advised to retain them.
  3. Click Install.

    Figure 1 Installing huawei-npu

  • Before installing huawei-npu, ensure that Volcano has been installed.
  • After the NPU driver is installed on a node, restart that node for the driver to take effect. For details about how to check whether the driver is installed, see How to Check Whether the NPU Driver Has Been Installed on a Node.
  • Uninstalling this add-on does not automatically delete the installed NPU driver. You need to manually uninstall the NPU driver to delete related resources.

Upgrading the Add-on

  1. Log in to the UCS console and choose Fleets. Click the cluster name to access the cluster console. In the navigation pane, choose Add-ons.
  2. Locate huawei-npu in Add-ons Installed. If "New version available" is displayed next to the version label, click Upgrade.
  3. Configure basic information and select the version.
  4. Click Upgrade.

Uninstalling the Add-on

  1. Log in to the UCS console and choose Fleets. Click the cluster name to access the cluster console. In the navigation pane, choose Add-ons.
  2. Locate huawei-npu in Add-ons Installed and click Uninstall.
  3. In the displayed dialog box, click Yes.

Installing an Ascend NPU Driver

Ensure that an Ascend NPU has been allocated to the node and confirm the device model. Then download the driver for that model from the official Ascend community and install it by following the installation guide.

After the installation is complete, run the following command to check all chips in the /dev directory of the node:

ls -l /dev/davinci*

Run the following command to check whether the driver is loaded:

npu-smi info

If the command displays information about the NPU devices, the driver has been loaded successfully. Otherwise, the driver failed to load. In that case, contact Huawei technical support.
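
The two checks above can be combined into one script. This is a sketch under stated assumptions: the function names count_davinci and check_npu_driver are hypothetical, and the script assumes that each chip appears as a numbered device node such as /dev/davinci0. It reports only whether npu-smi info exits successfully and does not parse its output.

```shell
#!/bin/sh
# Hypothetical helper: count numbered davinci device nodes in a
# directory (default /dev). Assumes chips appear as davinci0, davinci1, ...
count_davinci() {
    dir="${1:-/dev}"
    total=0
    for f in "$dir"/davinci[0-9]*; do
        # An unmatched glob stays literal, so confirm the path exists.
        if [ -e "$f" ]; then
            total=$((total + 1))
        fi
    done
    echo "$total"
}

# Hypothetical helper: run both checks and print a short verdict.
check_npu_driver() {
    if [ "$(count_davinci /dev)" -eq 0 ]; then
        echo "no /dev/davinci* device nodes found"
        return 1
    fi
    if ! command -v npu-smi >/dev/null 2>&1; then
        echo "npu-smi not found in PATH"
        return 1
    fi
    if ! npu-smi info >/dev/null 2>&1; then
        echo "npu-smi info failed"
        return 1
    fi
    echo "driver appears to be loaded"
}
```

On the NPU node, source the script and run check_npu_driver. A non-zero exit status means one of the checks failed.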

How to Check Whether the NPU Driver Has Been Installed on a Node

After confirming that the driver has been installed on a node, restart the node. The driver takes effect only after the restart; until then, NPU resources are unavailable. To check whether the driver is installed, perform the following operations:

Log in to the UCS console and choose Fleets. Then, click the cluster name to access the cluster console. In the navigation pane, choose Add-ons. On the displayed page, click the add-on name to view the add-on instance list. If every instance is in the Running state, the driver has been installed.

If the node is restarted before the NPU driver installation is complete, the installation may fail, and the Nodes page displays a message indicating that the Ascend driver is not ready. In this case, uninstall the NPU driver from the node and restart the node to reinstall the NPU driver. After confirming that the driver is installed, restart the node again.