Updated on 2024-11-06 GMT+08:00

CCE AI Suite (Ascend NPU)

Add-on Overview

CCE AI Suite (Ascend NPU) is a device management add-on that supports NPUs in containers.

After this add-on is installed, you can create AI-accelerated nodes to run inference and image recognition workloads quickly and efficiently.

Add-on Parameters

Table 1 Parameters

| Parameter | Mandatory | Type | Description |
| --- | --- | --- | --- |
| basic | No | object | Basic configuration parameters, which do not need to be specified |
| flavor | Yes | Table 3 object | Flavor parameters |
| custom | Yes | Table 4 object | Custom parameters |

Table 2 Configuration of basic

| Parameter | Mandatory | Type | Description |
| --- | --- | --- | --- |
| cluster_version | Yes | String | CCE cluster version |
| device_version | Yes | String | Add-on version |
| driver_version | Yes | String | Image tag of an add-on pod where the driver is installed when automatic driver installation is enabled for the add-on. Generally, the value is the same as that of device_version. |
| swr_addr | Yes | String | Image repository address |
| swr_user | Yes | String | Tenant path of an image repository |
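If you do supply the basic object, a quick client-side check can catch a missing key before the request is sent. The following is a minimal sketch, assuming the five keys in Table 2 are the ones to verify; missing_basic_keys is a hypothetical helper and not part of any CCE SDK.

```python
# Keys of values.basic that Table 2 marks as mandatory.
REQUIRED_BASIC_KEYS = (
    "cluster_version",
    "device_version",
    "driver_version",
    "swr_addr",
    "swr_user",
)


def missing_basic_keys(basic):
    """Return the Table 2 keys that are absent or empty in `basic`.

    Hypothetical helper for illustration only.
    """
    return [key for key in REQUIRED_BASIC_KEYS if not basic.get(key)]
```

An empty result means every key listed in Table 2 is present and non-empty.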

Table 3 Configuration of flavor

| Parameter | Mandatory | Type | Description |
| --- | --- | --- | --- |
| description | No | String | Add-on description |
| name | Yes | String | Add-on specification name. The value is fixed at Single-instance. |
| replicas | Yes | String | Number of pods. The default value is 1. |
| resources | Yes | Array of resources objects (see Table 5) | Container resource (CPU and memory) quotas |

Table 4 Configuration of custom

| Parameter | Mandatory | Type | Description |
| --- | --- | --- | --- |
| auto_install_npu_driver | No | Bool | true: The NPU driver is automatically installed on a node. Only some specifications of 310 and 310P cards are supported. Default value: false |
| check_frequency_failed_threshold | No | Int | Threshold for the add-on to check how many times an NPU device is considered unhealthy. Default value: 100 |
| check_frequency_fall_times | No | Int | Threshold for the add-on to check whether to isolate a chip when the dominant frequency of the chip is reduced. Default value: 3 |
| check_frequency_gate | No | Bool | true: Checks on the chip dominant frequency are enabled. Default value: false |
| check_frequency_recover_threshold | No | Int | Threshold for the add-on to check how many times an NPU device is considered healthy. Default value: 100 |
| check_frequency_rise_times | No | Int | Threshold for the add-on to check whether the chip dominant frequency is restored. Default value: 2 |
| container_path | No | String | Path for mounting the HiAI library in a container. Default value: "/usr/local/HiAI_unused" |
| host_path | No | String | Path containing the HiAI library on a host. Default value: "/usr/local/HiAI_unused" |
| npu_driver_config | No | Map | If an NPU driver is automatically installed on a node, the key of this parameter specifies the driver model, and the value specifies the address for downloading the NPU driver of that model. Default value: {} |
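Because every parameter in Table 4 is optional and has a documented default, one way to assemble the custom object is to start from those defaults and apply only the values you want to change. The sketch below assumes that approach; build_custom and CUSTOM_DEFAULTS are hypothetical names, and the type check simply compares each override against the type of the documented default. Keys that Table 4 does not list (the example request on this page carries annotations, for instance) are passed through unchanged.

```python
import copy

# Defaults as documented in Table 4.
CUSTOM_DEFAULTS = {
    "auto_install_npu_driver": False,
    "check_frequency_failed_threshold": 100,
    "check_frequency_fall_times": 3,
    "check_frequency_gate": False,
    "check_frequency_recover_threshold": 100,
    "check_frequency_rise_times": 2,
    "container_path": "/usr/local/HiAI_unused",
    "host_path": "/usr/local/HiAI_unused",
    "npu_driver_config": {},
}


def build_custom(overrides=None):
    """Merge `overrides` onto the Table 4 defaults (hypothetical helper).

    Raises TypeError when a documented key is given a value whose type
    differs from its default (bool and int are treated as distinct).
    """
    custom = copy.deepcopy(CUSTOM_DEFAULTS)
    for key, value in (overrides or {}).items():
        if key in CUSTOM_DEFAULTS:
            expected = type(CUSTOM_DEFAULTS[key])
            if type(value) is not expected:
                raise TypeError(
                    f"{key} expects {expected.__name__}, got {type(value).__name__}"
                )
        # Undocumented keys are kept as supplied.
        custom[key] = value
    return custom
```

For example, build_custom({"auto_install_npu_driver": True}) enables automatic driver installation and leaves every other parameter at its documented default.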

Table 5 Data structure of the resources field

| Parameter | Mandatory | Type | Description |
| --- | --- | --- | --- |
| limitsCpu | Yes | String | CPU size limit (unit: m). Default value: 1000m |
| limitsMem | Yes | String | Memory size limit (unit: Mi). Default value: 4096Mi |
| name | Yes | String | Add-on name. The value is fixed at npu-driver-installer. |
| requestsCpu | Yes | String | Requested CPU size (unit: m). Default value: 50m |
| requestsMem | Yes | String | Requested memory size (unit: Mi). Default value: 100Mi |
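The four quantities in Table 5 are strings with a fixed unit suffix, so a typo such as 1000 instead of 1000m is easy to miss. The following minimal sketch parses one resources entry and checks that each request does not exceed its limit, which is the usual Kubernetes constraint. It assumes values always use the m and Mi suffixes shown in the table; check_resources is a hypothetical helper.

```python
import re

# Table 5 states CPU values in millicores ("m") and memory in mebibytes ("Mi").
_CPU_PATTERN = re.compile(r"(\d+)m")
_MEM_PATTERN = re.compile(r"(\d+)Mi")


def _parse(pattern, text):
    """Return the integer part of a quantity such as '50m' or '100Mi'."""
    match = pattern.fullmatch(text)
    if match is None:
        raise ValueError(f"unexpected quantity format: {text!r}")
    return int(match.group(1))


def check_resources(entry):
    """Validate one resources entry and return the parsed numbers.

    Hypothetical helper; raises ValueError on a malformed quantity or
    when a request is larger than the matching limit.
    """
    cpu_request = _parse(_CPU_PATTERN, entry["requestsCpu"])
    cpu_limit = _parse(_CPU_PATTERN, entry["limitsCpu"])
    mem_request = _parse(_MEM_PATTERN, entry["requestsMem"])
    mem_limit = _parse(_MEM_PATTERN, entry["limitsMem"])
    if cpu_request > cpu_limit:
        raise ValueError("requestsCpu exceeds limitsCpu")
    if mem_request > mem_limit:
        raise ValueError("requestsMem exceeds limitsMem")
    return {
        "cpu_m": (cpu_request, cpu_limit),
        "mem_mi": (mem_request, mem_limit),
    }
```

With the default values from Table 5, the function returns the request and limit pairs (50, 1000) for CPU and (100, 4096) for memory.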

Example Request

{
  "kind": "Addon",
  "apiVersion": "v3",
  "metadata": {
    "name": "huawei-npu"
  },
  "spec": {
    "clusterID": "e93c2716-****-****-****-0255ac10004e",
    "version": "2.0.26",
    "addonTemplateName": "huawei-npu",
    "values": {
      "basic": {
        "cluster_version": "v1.23",
        "device_version": "2.0.26",
        "driver_version": "2.0.26",
        "platform": "linux-amd64",
        "rbac_enabled": true,
        "swr_addr": "***",
        "swr_user": "***"
      },
      "custom": {
        "annotations": {},
        "auto_install_npu_driver": true,
        "check_frequency_failed_threshold": 100,
        "check_frequency_fall_times": 3,
        "check_frequency_gate": false,
        "check_frequency_recover_threshold": 100,
        "check_frequency_rise_times": 2,
        "container_path": "/usr/local/HiAI_unused",
        "host_path": "/usr/local/HiAI_unused",
        "npu_driver_config": {}
      },
      "flavor": {
        "category": [
          "CCE",
          "Turbo"
        ],
        "name": "default",
        "resources": [
          {
            "limitsCpu": "1000m",
            "limitsMem": "4096Mi",
            "name": "npu-driver-installer",
            "requestsCpu": "50m",
            "requestsMem": "100Mi"
          }
        ]
      }
    }
  }
}
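A request body like the one above can also be generated programmatically, which guarantees syntactically valid JSON. The sketch below only assembles and serializes the body. It does not send it, because the endpoint and authentication are outside the scope of this page. build_addon_request is a hypothetical helper, and the field names and fixed values are copied from the example request.

```python
import json


def build_addon_request(cluster_id, version, values):
    """Assemble the add-on request body shown in the example (hypothetical helper)."""
    return {
        "kind": "Addon",
        "apiVersion": "v3",
        "metadata": {"name": "huawei-npu"},
        "spec": {
            "clusterID": cluster_id,
            "version": version,
            "addonTemplateName": "huawei-npu",
            "values": values,
        },
    }


# Placeholder inputs; replace them with the values for your own cluster.
body = build_addon_request(
    cluster_id="<cluster-id>",
    version="2.0.26",
    values={
        "basic": {},
        "custom": {"auto_install_npu_driver": True},
        "flavor": {
            "name": "default",
            "resources": [
                {
                    "limitsCpu": "1000m",
                    "limitsMem": "4096Mi",
                    "name": "npu-driver-installer",
                    "requestsCpu": "50m",
                    "requestsMem": "100Mi",
                }
            ],
        },
    },
)

# json.dumps always emits well-formed JSON for plain dicts, lists, and scalars.
payload = json.dumps(body, indent=2)
```

The resulting payload string can be passed to whichever HTTP client you use to call the CCE API.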