Updated on 2026-06-01 GMT+08:00

Configuring the NPU Driver Firmware Consistency and UDP Port Hash

Description

The public OS images for Lite Server Snt9b and Snt9b23 supernodes come with driver-firmware consistency and UDP port hash already configured. The driver-firmware consistency configuration automatically refreshes the firmware after the server is shut down and restarted, keeping the firmware consistent with the HDK driver installed in the OS. The UDP port hash configuration automatically refreshes the uplink port configuration of the parameter plane network after the server is shut down and restarted, preventing congestion on the parameter plane network.

If you use a privately built OS and do not configure driver-firmware consistency and UDP port hash, the firmware and the uplink port configuration are not refreshed automatically after the server is powered off and restarted. Therefore, configure both on your private OS.

Constraints

Driver-firmware consistency and UDP port hash can be configured only on Lite Server Snt9b and Snt9b23 supernodes.

Both configurations depend on bms-network-config: config-hash.service starts after bms-network-config.service, and firmware_check.service starts after config-hash.service. Before you start, complete the cloud adaptation of the OS and install the bms-network-config software package. For details, see the BMS image creation documentation.
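The resulting start order can be derived from the After= lines of the two unit files created later in this section. The following is a minimal sketch, assuming Python 3.9 or later (for the standard-library graphlib module); the unit texts are copied from this procedure as strings, and the helper names are illustrative only.

```python
from graphlib import TopologicalSorter

# [Unit] sections copied from config-hash.service and firmware_check.service in this procedure.
UNITS = {
    "config-hash.service": """[Unit]
Description=Run uplink hash config
After=bms-network-config.service
Requires=bms-network-config.service
""",
    "firmware_check.service": """[Unit]
Description=check and upgrade firmware
After=config-hash.service
Requires=config-hash.service
""",
}


def after_dependencies(unit_text):
    """Return the units named in After= lines (a line may list several, space-separated)."""
    deps = []
    for line in unit_text.splitlines():
        key, sep, value = line.partition("=")
        if sep and key.strip() == "After":
            deps.extend(value.split())
    return deps


def start_order(units):
    """Order units so that each one comes after every unit listed in its After= lines."""
    graph = {name: after_dependencies(text) for name, text in units.items()}
    return list(TopologicalSorter(graph).static_order())


print(start_order(UNITS))
# ['bms-network-config.service', 'config-hash.service', 'firmware_check.service']
```

Because each unit also uses Requires=, a failure of an earlier unit in this chain prevents the later units from starting.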

Configuring Driver Firmware Consistency

Obtain the NPU driver and firmware packages that match your server model and chip architecture from Huawei Support. The following uses HDK 25.2.1 on an Arm64 (aarch64) Snt9b23 supernode as an example to describe how to configure driver firmware consistency.

Table 1 Driver and firmware version mapping requirements

Type              Package                                             Version
Driver package    Atlas-A3-hdk-npu-driver_25.2.1_linux-aarch64.run    25.2.1
Firmware package  Atlas-A3-hdk-npu-firmware_7.7.0.9.220.run           7.7.0.9.220
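The versions in Table 1 are needed in two places in the procedure: the PRESET_SOFTWARE_VERSION and PRESET_FIRMWARE_VERSION variables in firmware_check.sh, and the Ascend-hdk-npu-firmware_<version>.run file name that the script looks for. A minimal sketch that derives them from the package names, assuming the naming pattern shown in Table 1 (the regular expressions and helper names are illustrative and may not match other HDK releases):

```python
import re

# Naming patterns taken from Table 1; other HDK releases may name their packages differently.
DRIVER_RE = re.compile(r"^Atlas-A3-hdk-npu-driver_(?P<version>[\d.]+)_linux-(?P<arch>\w+)\.run$")
FIRMWARE_RE = re.compile(r"^Atlas-A3-hdk-npu-firmware_(?P<version>[\d.]+)\.run$")


def parse_driver(name):
    """Return (version, arch) from a driver package name."""
    m = DRIVER_RE.match(name)
    if m is None:
        raise ValueError("unexpected driver package name: " + name)
    return m.group("version"), m.group("arch")


def parse_firmware(name):
    """Return the version from a firmware package name."""
    m = FIRMWARE_RE.match(name)
    if m is None:
        raise ValueError("unexpected firmware package name: " + name)
    return m.group("version")


def renamed_firmware(version):
    """File name that firmware_check.sh expects in /opt/huawei/firmware_check."""
    return "Ascend-hdk-npu-firmware_{}.run".format(version)


driver_version, arch = parse_driver("Atlas-A3-hdk-npu-driver_25.2.1_linux-aarch64.run")
firmware_version = parse_firmware("Atlas-A3-hdk-npu-firmware_7.7.0.9.220.run")
print('PRESET_SOFTWARE_VERSION="{}"'.format(driver_version))   # PRESET_SOFTWARE_VERSION="25.2.1"
print('PRESET_FIRMWARE_VERSION="{}"'.format(firmware_version))  # PRESET_FIRMWARE_VERSION="7.7.0.9.220"
print(renamed_firmware(firmware_version))  # Ascend-hdk-npu-firmware_7.7.0.9.220.run
```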

  1. Upload the downloaded NPU driver package and firmware package to a directory on the Lite Server, for example, using Xftp or an OBS bucket.
  2. Log in to the Lite Server and go to the directory where the packages were uploaded.
  3. Install the NPU driver.

    For details, see "Installing the Driver and Firmware" in Configuring the Resource Software Environment for NPU-based Lite Servers.

  4. Create a driver firmware consistency configuration directory:
    mkdir -p /opt/huawei/firmware_check
  5. Create a driver firmware consistency configuration script:
    cd /opt/huawei/firmware_check
    touch firmware_check.sh
  6. Return to the directory where the packages were uploaded, then move the firmware package to the driver firmware consistency directory and rename it to the Ascend-hdk-npu-firmware_<version>.run format that the script expects:
    mv Atlas-A3-hdk-npu-firmware_7.7.0.9.220.run /opt/huawei/firmware_check/Ascend-hdk-npu-firmware_7.7.0.9.220.run

    Note: If the firmware version is different from the example or is changed later, change the version number in the preceding command accordingly.

  7. Open the firmware_check.sh script and add the following content.
    Open the script file.
    vim /opt/huawei/firmware_check/firmware_check.sh
    Add the following content:
    #!/bin/bash
    # Boot-time check: if the installed driver is the preset version but the NPU firmware
    # is not, install the firmware package stored in HOME_DIR and reboot the server.
    HOME_DIR="/opt/huawei/firmware_check"
    MAX_LOG_LINES=500
    LOG_FILE="${HOME_DIR}/firmware_check.log"
    ASCEND_INSTALL_INFO="/etc/ascend_install.info"
    ASCEND_PATH="/usr/local/Ascend"
    # Minimum interval in seconds between two upgrade reboots (prevents a reboot loop).
    MIN_DURATION=3600
    # ***If the driver or firmware version is changed during image creation, update the content below.***
    PRESET_SOFTWARE_VERSION="25.2.1"
    PRESET_FIRMWARE_VERSION="7.7.0.9.220"
    # Create the log file, or trim it to its last MAX_LOG_LINES lines.
    function init_log() {
        if [ ! -f "${LOG_FILE}" ]; then
            cat /dev/null > "${LOG_FILE}"
            return
        fi
        line_count=$(wc -l < "${LOG_FILE}")
        if [ "${line_count}" -gt "${MAX_LOG_LINES}" ]; then
            tail -n "${MAX_LOG_LINES}" "${LOG_FILE}" > "${LOG_FILE}.tmp"
            mv "${LOG_FILE}.tmp" "${LOG_FILE}"
        fi
    }
    # The log functions append to LOG_FILE and echo to stderr. Unlike /dev/tty, stderr also
    # works under systemd (no terminal), and it keeps the stdout of functions called via
    # $(...) limited to their return values.
    function log_info() {
        echo -e "$(date '+%Y-%m-%d %H:%M:%S'):[Info]$*" >> "${LOG_FILE}"
        echo -e "\033[32m[INFO]\033[0m: $*" >&2
    }
    function log_warning() {
        echo -e "$(date '+%Y-%m-%d %H:%M:%S'):[Warning]$*" >> "${LOG_FILE}"
        echo -e "\033[33m[WARN]\033[0m: $*" >&2
    }
    function log_error() {
        echo -e "$(date '+%Y-%m-%d %H:%M:%S'):[Error]$*" >> "${LOG_FILE}"
        echo -e "\033[31m[ERROR]\033[0m: $*" >&2
    }
    # Print the value of key $2 from the key=value file $1.
    function get_param_from_config() {
        file=$1
        wanted=$2
        if [ ! -e "${file}" ]; then
            log_error "File ${file} does not exist."
            return 1
        fi
        while IFS="=" read -r key val; do
            key=$(echo "$key" | tr -d '[:space:]')
            if [[ "$key" == "$wanted" ]]; then
                echo $val
                return 0
            fi
        done < "$file"
        return 1
    }
    # Print the Version value from <install path>/<object type>/version.info.
    function get_object_version() {
        install_info=$1
        install_path_key=$2
        default_install_path=$3
        object_type=$4
        install_path=$(get_param_from_config "${install_info}" "${install_path_key}")
        if [ $? -ne 0 ]; then
            log_warning "Failed to get value of ${install_path_key} from ${install_info}, use ${default_install_path}."
            install_path=${default_install_path}
        fi
        version_info_file="${install_path}/${object_type}/version.info"
        version=$(get_param_from_config "${version_info_file}" "Version")
        if [ $? -ne 0 ]; then
            log_error "Failed to get Version value from ${version_info_file}."
            return 1
        fi
        echo $version
        return 0
    }
    # Print "<software version>|<firmware version>" of NPU $1 as reported by npu-smi.
    function get_versions_with_tool() {
        output=$(npu-smi info -t board -i "${1}")
        if [ $? -ne 0 ]; then
            log_error "Run command 'npu-smi info -t board -i ${1}' error:\n${output}"
            return 1
        fi
        software_version=$(echo "$output" | awk -F':' '/Software Version/ {print $2}' | tr -d '[:space:]')
        firmware_version=$(echo "$output" | awk -F':' '/Firmware Version/ {print $2}' | tr -d '[:space:]')
        if [ -n "${software_version}" ] && [ -n "${firmware_version}" ]; then
            echo "${software_version}|${firmware_version}"
            return 0
        fi
        return 1
    }
    # Print "true" if the firmware must be upgraded, otherwise "false".
    function check_versions() {
        software_version=$1
        firmware_version=$2
        if [ "${software_version}" != "${PRESET_SOFTWARE_VERSION}" ]; then
            log_warning "The current software version is not the preset version '${PRESET_SOFTWARE_VERSION}', so the firmware does not need to be upgraded."
            echo "false"
            return
        fi
        if [ -n "${firmware_version}" ] && [ "${firmware_version}" == "${PRESET_FIRMWARE_VERSION}" ]; then
            log_info "The current firmware version matches the preset version '${PRESET_FIRMWARE_VERSION}' and does not need to be upgraded."
            echo "false"
            return
        fi
        log_info "The current firmware '${firmware_version}' does not match the preset version '${PRESET_FIRMWARE_VERSION}' and needs to be upgraded."
        echo "true"
    }
    # Install HOME_DIR/Ascend-hdk-npu-firmware_<version $1>.run.
    function upgrade_firmware() {
        firmware_version=$1
        firmware_package="${HOME_DIR}/Ascend-hdk-npu-firmware_${firmware_version}.run"
        if [ ! -e "${firmware_package}" ]; then
            log_error "Firmware package ${firmware_package} does not exist."
            return 1
        fi
        bash "${firmware_package}" --full --quiet >> "${LOG_FILE}" 2>&1
        return $?
    }
    # Permit the check only if no upgrade reboot happened in the last MIN_DURATION seconds.
    # TIME_TAG is touched just before each upgrade reboot.
    function permit_check() {
        if [ ! -e "${TIME_TAG}" ]; then
            log_info "First check of this machine, permit."
            return 0
        fi
        this_time=$(date +%s)
        rc=$?
        if [ ${rc} -ne 0 ]; then
            log_error "Failed to get current time, error code ${rc}."
            return 1
        fi
        last_time=$(stat -c %Y "${TIME_TAG}")
        rc=$?
        if [ ${rc} -ne 0 ]; then
            log_error "Failed to get the timestamp of the last upgrade reboot, error code ${rc}."
            return 1
        fi
        duration=$((this_time - last_time))
        if [ "${duration}" -lt "${MIN_DURATION}" ]; then
            log_info "Duration less than ${MIN_DURATION} sec, forbidden."
            return 1
        fi
        log_info "Duration since the last upgrade reboot is ${duration} sec."
        return 0
    }
    function main() {
        init_log
        log_info "Start to check the firmware."
        permit_check
        if [ $? -ne 0 ]; then
            log_error "This check is too frequent, try later."
            return
        fi
        firmware_reboot="false"
        need_upgrade="false"
        for i in ${npu_ids}; do
            result=$(get_versions_with_tool "${i}")
            if [ $? -eq 0 ]; then
                software_version=$(echo "${result}" | awk -F'|' '{print $1}')
                firmware_version=$(echo "${result}" | awk -F'|' '{print $2}')
                log_info "NPU ${i} software version and firmware version from npu-smi are ${software_version} and ${firmware_version}."
            else
                log_warning "NPU ${i} failed to get versions using npu-smi, try to read the software version from the config file."
                if [ -z "${config_software_version}" ]; then
                    log_error "Failed to get the software version from the config file."
                    return
                fi
                software_version=${config_software_version}
                # The firmware version of this NPU is unknown; do not reuse the previous NPU's value.
                firmware_version=""
                log_info "NPU ${i} software version from the config file is ${software_version}."
            fi
            need_upgrade=$(check_versions "${software_version}" "${firmware_version}")
            if [ "${need_upgrade}" == "true" ]; then
                log_info "NPU ${i} firmware needs to be upgraded."
                break
            fi
        done
        if [ "${need_upgrade}" == "true" ]; then
            log_info "Start upgrading the firmware."
            upgrade_firmware "${PRESET_FIRMWARE_VERSION}"
            if [ $? -ne 0 ]; then
                log_error "Failed to upgrade the firmware."
            else
                log_info "The firmware was upgraded to '${PRESET_FIRMWARE_VERSION}', reboot now."
                firmware_reboot="true"
            fi
        fi
        if [ "${firmware_reboot}" == "true" ]; then
            log_info "Start to reboot."
            # Record the reboot time so that permit_check can rate-limit upgrade reboots.
            touch "${TIME_TAG}"
            reboot
        fi
        log_info "The firmware check completed."
    }
    # The time tag file name contains the instance UUID from the metadata service, so a tag
    # file copied from another server (for example, through an image) does not affect this one.
    metadata=$(/usr/bin/curl -s http://169.254.169.254/openstack/latest/meta_data.json)
    rc=$?
    if [ ${rc} -ne 0 ]; then
        log_error "Failed to get metadata, error code ${rc}."
        exit 1
    fi
    if [ -z "${metadata}" ]; then
        log_error "Metadata is empty, abort."
        exit 1
    fi
    uuid=$(echo "${metadata}" | grep -o '"uuid": "[^"]*' | sed 's/"uuid": "//')
    TIME_TAG="${HOME_DIR}/timetag${uuid}"
    # Driver version from the installation info, used when npu-smi cannot report it.
    config_software_version=$(get_object_version "${ASCEND_INSTALL_INFO}" "Driver_Install_Path_Param" "${ASCEND_PATH}" "driver")
    if [ $? -ne 0 ]; then
        log_error "Failed to get the software version from the config file."
    fi
    npu_ids=$(npu-smi info -l | awk '/NPU ID/{print $NF}')
    main
    echo "Check ${LOG_FILE} for details."

    Note: If the driver and firmware versions are different from the example or are changed later, update PRESET_SOFTWARE_VERSION and PRESET_FIRMWARE_VERSION at the beginning of the script.

  8. Add the execute permission for the firmware package and configuration script:
    cd /opt/huawei/firmware_check
    chmod 700 firmware_check.sh
    chmod 700 Ascend-hdk-npu-firmware_7.7.0.9.220.run
  9. Add the automatic startup service for driver firmware consistency:
    cd /etc/systemd/system
    vim firmware_check.service

    Add the following content:

    [Unit]
    Description=check and upgrade firmware
    After=config-hash.service
    Requires=config-hash.service
    [Service]
    Type=oneshot
    ExecStart=/opt/huawei/firmware_check/firmware_check.sh
    RemainAfterExit=yes
    User=root
    [Install]
    WantedBy=cloud-init.target

    Configure the service to automatically start upon system startup.

    systemctl daemon-reload
    systemctl enable firmware_check.service
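The decision that firmware_check.sh makes at each startup can be summarized in a few lines. The following is a simplified Python sketch of the script's check_versions and permit_check logic, not part of the procedure; the function names and the (software_version, firmware_version) input format are illustrative assumptions.

```python
import time
from typing import Optional, Sequence, Tuple

# Values copied from firmware_check.sh; keep them in sync with the script.
PRESET_SOFTWARE_VERSION = "25.2.1"
PRESET_FIRMWARE_VERSION = "7.7.0.9.220"
MIN_DURATION = 3600  # minimum seconds between two upgrade reboots


def needs_upgrade(software_version: str, firmware_version: Optional[str]) -> bool:
    """Like check_versions: act only on the preset driver whose firmware does not match."""
    if software_version != PRESET_SOFTWARE_VERSION:
        return False  # another driver is installed, so the firmware is left alone
    # An unknown (None or empty) firmware version also counts as a mismatch.
    return firmware_version != PRESET_FIRMWARE_VERSION


def check_permitted(last_reboot: Optional[float], now: float) -> bool:
    """Like permit_check: last_reboot is the mtime of the time tag file, None if absent."""
    if last_reboot is None:
        return True
    return now - last_reboot >= MIN_DURATION


def should_upgrade(npus: Sequence[Tuple[str, Optional[str]]],
                   last_reboot: Optional[float], now: float) -> bool:
    """npus holds one (software_version, firmware_version) pair per NPU."""
    if not check_permitted(last_reboot, now):
        return False
    return any(needs_upgrade(sw, fw) for sw, fw in npus)


now = time.time()
print(should_upgrade([("25.2.1", "7.7.0.9.220")] * 16, None, now))  # False: already consistent
print(should_upgrade([("25.2.1", None)], None, now))                # True: firmware unknown
print(should_upgrade([("25.2.1", None)], now - 600, now))           # False: rebooted 600 s ago
```

Because the time tag is checked before anything else, firmware that fails to take effect causes at most one upgrade reboot per MIN_DURATION seconds instead of a reboot loop.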

Configuring UDP Port Hash

  1. Create a UDP port hash configuration directory:
    mkdir -p /opt/huawei/port_config
  2. Create the hash configuration script and the local port configuration file:
    cd /opt/huawei/port_config
    touch port_config.json
    touch uplink_hash_config.py
  3. Write the hash configuration script (it runs with Python 3 and requires the requests package):
    vim uplink_hash_config.py

    Add the following content:

    # -*- coding: UTF-8 -*-
    import base64
    import logging
    import time
    import json
    import shutil
    import os
    import requests
    try:
        import commands
    except ImportError:
        import subprocess as commands
    A2_NPU_COUNT = 8
    A3_NPU_COUNT = 16
    METADATA_URL = 'http://169.254.169.254/openstack/latest/meta_data.json'
    A3_32K_CLUSTER = "cluster4"
    A3_64K_CLUSTER = "cluster5"
    A3_128K_CLUSTER = "cluster6"
    A3_9866_CLUSTER = "cluster8"
    A3_9866_REGION_ID = "cn-guian02"
    log = logging.getLogger(__name__)
    class UplinkHashConfig:
        def __init__(self, config_dir="/opt/huawei/port_config/",
                     log_file="/opt/huawei/port_config/uplink_hash_config.log"):
            self.DIR = config_dir
            self.setup_log(log_file)
        def setup_log(self, log_file):
            handler = logging.FileHandler(log_file)
            formatter = logging.Formatter('%(asctime)s - %(filename)s[%(levelname)s]: %(message)s')
            handler.setFormatter(formatter)
            log.addHandler(handler)
            log.setLevel(logging.INFO)
        def read_url(self, url, timeout=None, retries=0, sec_between=1, check_status=True):
            manual_tries = 1
            if retries:
                manual_tries = max(int(retries) + 1, 1)
            if sec_between is None:
                sec_between = -1
            for i in range(0, manual_tries):
                try:
                    r = requests.get(url, timeout=timeout)
                    if check_status:
                        r.raise_for_status()
                    return r
                except requests.exceptions.RequestException as e:
                    if i + 1 < manual_tries and sec_between > 0:
                        log.warning("Get metadata failed, wait %s seconds to try again", sec_between)
                        time.sleep(sec_between)
            log.error("Get metadata failed, please check network and security group.")
            return None
        def get_meta_json(self, timeout=5, retries=5):
            try:
                resp = self.read_url(url=METADATA_URL,
                    timeout=timeout,
                    retries=retries)
                metadata = resp.json()
            except Exception as e:
                log.error(
                    "Get metadata failed. Error: %s", e)
                return False, None
            return True, metadata
        def get_npu_count(self, meta_json):
            hyperinstance_type = meta_json.get("meta", {}).get("_sys_hyperinstance_type")
            if hyperinstance_type:
                log.info("Node type is hyperinstance.")
                return A3_NPU_COUNT
            return A2_NPU_COUNT
        def get_config_url(self, region):
            if region == "cn-north-7":
                url = "https://cnnorth7-modelarts-sdk.obs.cn-north-7.ulanqab.huawei.com"
            elif region == "cn-north-9":
                url = "https://cnnorth9-modelarts-sdk.obs.cn-north-9.myhuaweicloud.com"
            elif region == "cn-south-1":
                url = "https://cnsouth1-modelarts-sdk.obs.cn-south-1.myhuaweicloud.com"
            elif region == "cn-east-3":
                url = "https://cneast3-modelarts-sdk.obs.cn-east-3.myhuaweicloud.com"
            elif region == "ap-southeast-1":
                url = "https://ap-southeast1-modelarts-sdk.obs.ap-southeast-1.myhuaweicloud.com"
            elif region == "cn-north-11":
                url = "https://cnnorth11-modelarts-sdk.obs.cn-north-11.myhuaweicloud.com"
            elif region == "cn-southwest-2":
                url = "https://cn-southwest-2-modelarts-sdk.obs.cn-southwest-2.myhuaweicloud.com"
            elif region == "cn-east-4":
                url = "https://cneast4-modelarts-sdk.obs.dualstack.cn-east-4.myhuaweicloud.com"
            elif region == "la-south-2":
                url = "https://la-south2-modelarts-sdk.obs.la-south-2.myhuaweicloud.com"
            elif region == "ap-southeast-3":
                url = "https://ap-southeast3-modelarts-sdk.obs.ap-southeast-3.myhuaweicloud.com"
            elif region == "me-east-1":
                url = "https://me-east-1-modelarts-sdk.obs.me-east-1.myhuaweicloud.com"
            else:
                url = "https://{0}-modelarts-sdk.obs.{1}.myhuaweicloud.com".format(region.replace("-", ""), region)
            return url + "/devserver/port_config.json"
        def download_file(self, url, destination):
            log.info("Downloading file from %s to %s", url, destination)
            try:
                response = requests.get(url, timeout=10)
                response.raise_for_status() 
                with open(destination, "wb") as f:
                    f.write(response.content)
                return True
            except requests.exceptions.RequestException as e:
                log.error("Failed to download file from %s: %s", url, str(e))
                return False
        def get_port_config(self, url):
            config_file = os.path.join(self.DIR, "port_config.json")
            backup_file = os.path.join(self.DIR, "port_config.json.bak")
            if url is None:
                log.warning("Url is none, use local config file.")
                try:
                    with open(config_file, "r") as f:
                        hash_config = json.load(f)
                    log.info("Loaded config from %s", config_file)
                    return hash_config
                except FileNotFoundError:
                    log.error("Config file %s not found.", config_file)
                    return None
                except json.JSONDecodeError:
                    log.error("Failed to decode JSON from %s", config_file)
                    return None
            try:
                shutil.copy2(config_file, backup_file)
                log.info("Backed up %s to %s", config_file, backup_file)
            except Exception as e:
                log.error("Failed to back up %s: %s", config_file, str(e))
                return None
            if self.download_file(url, config_file):
                try:
                    with open(config_file, "r") as f:
                        hash_config = json.load(f)
                    log.info("Loaded new config from %s", config_file)
                    os.remove(backup_file)
                    return hash_config
                except json.JSONDecodeError:
                    log.error("Failed to decode JSON from downloaded file %s, restoring the backup file.", config_file)
            else:
                log.warning("Download failed, restoring the local backup config file.")
            # Fall back to the previous local copy. If it is empty (for example, the file
            # created with touch) or invalid, return None so that the ports are configured
            # in auto mode instead of the script crashing.
            shutil.copy2(backup_file, config_file)
            os.remove(backup_file)
            try:
                with open(config_file, "r") as f:
                    return json.load(f)
            except json.JSONDecodeError:
                log.error("Local config file %s is empty or invalid, use auto mode.", config_file)
                return None
        def wait_until_npu_ready(self, npu_count):
            for _ in range(0, 10):
                count = 0
                for i in range(0, npu_count):
                    cmd = "hccn_tool -i {} -lldp -g | grep Ifname | awk -F ': ' '{{print $2}}'".format(i)
                    (status, output) = commands.getstatusoutput(cmd)
                    if not output:
                        log.error("result: get ifname failed, try again after 30s, id:%s", i)
                        time.sleep(30)
                        break
                    count += 1
                if count == npu_count:
                    log.info("result: get all ifname success.")
                    return
            log.warning("Failed to get ifname after 10 attempts.")
        def print_current_port(self, npu_count):
            for i in range(0, npu_count):
                cmd = "hccn_tool -i {} -udp -g".format(i)
                (status, output) = commands.getstatusoutput(cmd)
                log.info("port %s: %s", i, output)
        def config_udp_port_auto(self, npu_count):
            log.info("Configuring ports in auto mode.")
            for i in range(0, npu_count):
                cmd = "hccn_tool -i {} -udp -s auto".format(i)
                (status, _) = commands.getstatusoutput(cmd)
        def config_udp_port(self, port_config, flavor, npu_count, region_id):
            if port_config is None:
                self.config_udp_port_auto(npu_count)
                return
            cmd = "echo -n {} | sha256sum | awk '{{print $1}}'".format(flavor)
            (status, sha256_flavor) = commands.getstatusoutput(cmd)
            log.info("Sha256 flavor is %s", sha256_flavor)
            cur_cluster = self.get_cluster(port_config, sha256_flavor, region_id)
            if not cur_cluster:
                if A3_NPU_COUNT == npu_count:
                    log.warning("Get cluster from port_config failed, flavor is %s, set default value %s", flavor, A3_32K_CLUSTER)
                    cur_cluster = A3_32K_CLUSTER
                else:
                    log.error("Get cluster from port_config failed, flavor is %s", flavor)
                    self.config_udp_port_auto(npu_count)
                    return
            log.info("Cluster is %s", cur_cluster)
            desire_port_map = self.get_desire_port_map(cur_cluster, port_config)
            for i in range(0, npu_count):
                cmd = "hccn_tool -i {} -lldp -g | grep Ifname | awk -F ': ' '{{print $2}}'".format(i)
                (status, ifname_tmp) = commands.getstatusoutput(cmd)
                key = "{}|{}".format(i, ifname_tmp)
                desire_port = desire_port_map.get(key, None)
                if not desire_port:
                    log.error("Get desire port from port_config failed, id %s, ifname %s", i, ifname_tmp)
                    self.config_udp_port_auto(npu_count)
                    return
                cmd = "hccn_tool -i {} -udp -g | awk -F ':' '{{print $2}}' | grep -v auto".format(i)
                (status, current_port) = commands.getstatusoutput(cmd)
                # desire_port is an int for the built-in A3 clusters, so compare as strings.
                if str(desire_port) != current_port.strip():
                    cmd = "hccn_tool -i {} -udp -s port {}".format(i, desire_port)
                    (status, _) = commands.getstatusoutput(cmd)
        def get_cluster(self, port_config, sha256_flavor, region_id):
            if region_id == A3_9866_REGION_ID:
                return A3_9866_CLUSTER
            return port_config.get("flavors", {}).get(sha256_flavor, None)
        def get_desire_port_map(self, cur_cluster, port_config):
            if cur_cluster == A3_32K_CLUSTER:
                return self.get_a3_desire_port_map(5120, 5183, 32)
            elif cur_cluster == A3_64K_CLUSTER:
                return self.get_a3_desire_port_map(5184, 5327, 36)
            elif cur_cluster == A3_128K_CLUSTER:
                return self.get_a3_desire_port_map(5184, 5471, 36)
            else:
                return port_config.get("clusters", {}).get(cur_cluster, {})
        def get_a3_desire_port_map(self, start, end, max_tor_port):
            udp_port_map = {}
            tor_ge_id = 1
            tor_port = 0
            udp_port = start
            while udp_port <= end:
                udp_port_map["6|400GE{0}/0/{1}:1".format(tor_ge_id, tor_port)] = udp_port
                udp_port_map["14|400GE{0}/0/{1}:2".format(tor_ge_id, tor_port)] = udp_port
                udp_port_map["4|400GE{0}/0/{1}:1".format(tor_ge_id, tor_port + 1)] = udp_port + 1
                udp_port_map["12|400GE{0}/0/{1}:2".format(tor_ge_id, tor_port + 1)] = udp_port + 1
                udp_port_map["2|400GE{0}/0/{1}:1".format(tor_ge_id, tor_port + 2)] = udp_port + 2
                udp_port_map["10|400GE{0}/0/{1}:2".format(tor_ge_id, tor_port + 2)] = udp_port + 2
                udp_port_map["0|400GE{0}/0/{1}:1".format(tor_ge_id, tor_port + 3)] = udp_port + 3
                udp_port_map["8|400GE{0}/0/{1}:2".format(tor_ge_id, tor_port + 3)] = udp_port + 3
                udp_port_map["7|400GE{0}/0/{1}:1".format(tor_ge_id, tor_port)] = udp_port
                udp_port_map["15|400GE{0}/0/{1}:2".format(tor_ge_id, tor_port)] = udp_port
                udp_port_map["5|400GE{0}/0/{1}:1".format(tor_ge_id, tor_port + 1)] = udp_port + 1
                udp_port_map["13|400GE{0}/0/{1}:2".format(tor_ge_id, tor_port + 1)] = udp_port + 1
                udp_port_map["3|400GE{0}/0/{1}:1".format(tor_ge_id, tor_port + 2)] = udp_port + 2
                udp_port_map["11|400GE{0}/0/{1}:2".format(tor_ge_id, tor_port + 2)] = udp_port + 2
                udp_port_map["1|400GE{0}/0/{1}:1".format(tor_ge_id, tor_port + 3)] = udp_port + 3
                udp_port_map["9|400GE{0}/0/{1}:2".format(tor_ge_id, tor_port + 3)] = udp_port + 3
                udp_port += 4
                tor_port += 4
                if tor_port >= max_tor_port:
                    tor_port = 0
                    tor_ge_id += 1
            return udp_port_map
        def run(self):
            ret, meta_json = self.get_meta_json()
            if not ret:
                log.error("Get meta_json failed, metadata json is %s", meta_json)
                return
            log.info("Get meta_json success, metadata json is %s", meta_json)
            region_id = meta_json.get('region_id', None)
            if not region_id:
                log.error("Get region_id from metadata failed.")
                return
            log.info("Region is %s", region_id)
            flavor = meta_json.get('instance_type', None)
            if not flavor:
                log.error("Get instance_type from metadata failed.")
                return
            log.info("Flavor is %s", flavor)
            npu_count = self.get_npu_count(meta_json)
            log.info("Npu count is %s", npu_count)
            url = self.get_config_url(region_id)
            port_config = self.get_port_config(url)
            self.wait_until_npu_ready(npu_count)
            log.info("Before config uplink udp hash, Port is:")
            self.print_current_port(npu_count)
            log.info("=====================================================")
            self.config_udp_port(port_config, flavor, npu_count, region_id)
            log.info("Config uplink udp hash done. Port is:")
            self.print_current_port(npu_count)
    if __name__ == "__main__":
        configurator = UplinkHashConfig()
        configurator.run()
  4. Add the script execution permission:
    cd /opt/huawei/port_config
    chmod +x uplink_hash_config.py
  5. Run the script once to obtain the latest configuration file of the current site (the script also applies the port configuration):
    python3 uplink_hash_config.py
  6. Add the automatic startup service for hash configuration:
    cd /etc/systemd/system
    vim config-hash.service

    Add the following content:

    [Unit]
    Description=Run uplink hash config
    After=bms-network-config.service
    Requires=bms-network-config.service
    [Service]
    Type=oneshot
    ExecStart=/usr/bin/python3 /opt/huawei/port_config/uplink_hash_config.py
    RemainAfterExit=yes
    User=root
    [Install]
    WantedBy=cloud-init.target

    Configure the service to automatically start upon system startup.

    cd /etc/systemd/system
    systemctl daemon-reload
    systemctl enable config-hash.service
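In uplink_hash_config.py, get_a3_desire_port_map builds the NPU-to-UDP-port map for the built-in A3 clusters (cluster4, cluster5 and cluster6) from 16 hard-coded assignments per group of four UDP ports. Those assignments follow a regular pattern: NPUs 0 to 7 use the interface suffix :1, NPUs 8 to 15 use :2, and the port offset within a group is 3 - (npu_id % 8) // 2. The following compact equivalent is only a sketch written to make that pattern visible; it is not part of the procedure, and the function name is hypothetical.

```python
def desire_port_map(start, end, max_tor_port):
    """Compact equivalent of get_a3_desire_port_map in uplink_hash_config.py."""
    port_map = {}
    tor_ge_id, tor_port, udp_port = 1, 0, start
    while udp_port <= end:
        for npu_id in range(16):
            lane = 1 if npu_id < 8 else 2       # NPUs 0-7 -> ":1", NPUs 8-15 -> ":2"
            offset = 3 - (npu_id % 8) // 2     # 6, 7, 14, 15 -> 0 ... 0, 1, 8, 9 -> 3
            ifname = "400GE{0}/0/{1}:{2}".format(tor_ge_id, tor_port + offset, lane)
            port_map["{0}|{1}".format(npu_id, ifname)] = udp_port + offset
        udp_port += 4
        tor_port += 4
        if tor_port >= max_tor_port:
            tor_port = 0
            tor_ge_id += 1
    return port_map


cluster4 = desire_port_map(5120, 5183, 32)  # same arguments the script uses for cluster4
print(len(cluster4))                        # 256: 16 groups of 16 NPUs
print(cluster4["6|400GE1/0/0:1"])           # 5120
print(cluster4["9|400GE2/0/31:2"])          # 5183
```

Each key combines the NPU ID with the interface name that the script reads from `hccn_tool -i <id> -lldp -g`, which is why a wrongly cabled or unrecognized interface makes the script fall back to auto mode.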

Verification

To verify that driver firmware consistency and UDP port hash are configured, restart the server. For details, see Restarting a Lite Server.

Note: If the firmware does not match the preset version, firmware_check.sh upgrades it during startup and then restarts the server automatically so that the new firmware takes effect.

  1. Verify the driver firmware version.

    View the NPU driver and firmware versions:

    ascend-dmi -c

    If the firmware version matches PRESET_FIRMWARE_VERSION in firmware_check.sh, the configuration has taken effect.

  2. Verify the UDP port hash configuration.

    View the uplink port on the parameter plane network:

    npu_ids=$(npu-smi info -l | awk '/NPU ID/{print $NF}')
    for i in $npu_ids; do hccn_tool -i "$i" -udp -g; done

    In the command output, if udp_port is not Unknown and status is custom, the configuration is successful.
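To check the firmware of every NPU from a script instead of reading the ascend-dmi output by eye, the same fields that firmware_check.sh reads can be parsed from npu-smi. The following is a minimal sketch, assuming Python 3.7 or later and that npu-smi is on the PATH; the parsing mirrors the awk commands in the script, and the helper names are hypothetical.

```python
import subprocess

# Values copied from firmware_check.sh; keep them in sync with the script.
PRESET_SOFTWARE_VERSION = "25.2.1"
PRESET_FIRMWARE_VERSION = "7.7.0.9.220"


def parse_npu_ids(output):
    """Like awk '/NPU ID/{print $NF}': last field of every line containing 'NPU ID'."""
    return [line.split()[-1] for line in output.splitlines() if "NPU ID" in line]


def parse_board_info(output):
    """Like awk -F':' '/... Version/ {print $2}' | tr -d '[:space:]' in firmware_check.sh."""
    versions = {}
    for line in output.splitlines():
        for key in ("Software Version", "Firmware Version"):
            if key in line and ":" in line:
                versions[key] = "".join(line.split(":")[1].split())
    return versions


def run(cmd):
    """Run a command and return its stdout; raises CalledProcessError on a non-zero exit."""
    return subprocess.run(cmd, capture_output=True, text=True, check=True).stdout


def check_all_npus():
    """Return {npu_id: bool} telling whether each NPU reports the preset versions."""
    result = {}
    for npu_id in parse_npu_ids(run(["npu-smi", "info", "-l"])):
        versions = parse_board_info(run(["npu-smi", "info", "-t", "board", "-i", npu_id]))
        result[npu_id] = (versions.get("Software Version") == PRESET_SOFTWARE_VERSION
                          and versions.get("Firmware Version") == PRESET_FIRMWARE_VERSION)
    return result
```

On the server, `print(check_all_npus())` should report True for every NPU once the firmware check has completed.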

Clearing and Saving the Image

  1. Clear the logs about driver firmware consistency and UDP port hash.
    rm -rf /opt/huawei/port_config/uplink_hash_config.log
    rm -rf /opt/huawei/firmware_check/firmware_check.log
    rm -rf /opt/huawei/firmware_check/time*
  2. Clear the traces and create a new image. For details, see Creating the OS of a Lite Server.