Updated on 2024-01-19 GMT+08:00

What Do I Do If Edge Node Management Fails?

Symptom

The edge node cannot be managed on IEF.

Fault Locating

There are many causes for edge node management failures. The most common cause is that the edge node does not meet management requirements of IEF or the network is inaccessible. Follow the steps in the following figure to locate the cause of the edge node management failure.

Figure 1 Fault locating

Edge Node Does Not Meet Management Requirements

Edge Agent can be installed only on edge nodes that meet requirements listed in Table 2.

Table 2 Edge node requirements

Item

Specifications

OS

The language of the operating system must be English.

  • x86_64 architecture

    Ubuntu LTS (Xenial Xerus), Ubuntu LTS (Bionic Beaver), CentOS, EulerOS, RHEL, Kylin, NewStart CGS Linux, NeoKylin, openEuler, Unity Operating System (UOS), Oracle Linux (OL), Huawei Cloud Euler (HCE)

  • Armv7l (Arm32) architecture

    Raspbian GNU/Linux (stretch)

  • AArch64 (Arm64) architecture

    Ubuntu LTS (Bionic Beaver), CentOS, EulerOS, openEuler, Unity Operating System (UOS), Oracle Linux (OL), Huawei Cloud Euler (HCE)

Memory

The edge software requires about 128 MB of memory to run, so more than 256 MB of memory is recommended.

CPU

≥ 1 core

Hard disk

≥ 1 GB

GPU (optional)

The GPU models on the same edge node must be the same.

NOTE:

Currently, NVIDIA Tesla GPUs such as P4, P40, and T4 are supported.

If an edge node is equipped with GPUs, you can choose not to enable its GPUs when registering it on IEF.

If you choose to enable GPUs of an edge node, the GPU driver has to be installed on the edge node before you can manage it on IEF.

Currently, only x86-based GPU nodes can be managed by IEF.

NPU (optional)

Huawei Ascend AI processors

NOTE:

Currently, edge nodes integrated with Huawei Ascend Processors are supported, such as Atlas 300 inference cards, and Atlas 800 inference servers.

If you choose to enable NPUs of an edge node, ensure that the NPU driver has been installed on it. The NPU driver version must be 22.0.4 or later. To view your driver version, go to the driver path, for example, /usr/local/Ascend/driver, and run the cat version.info command. If the driver is not installed, contact the device manufacturer for assistance.

Container engine

The Docker version must be later than 17.06. If Docker 1.23 or later is used, set the docker cgroupfs version to 1. Docker HTTP API v2 is not supported.

(Docker 18.09.0 is not recommended because it has a serious bug. For details, see https://github.com/docker/for-linux/issues/543. If this version has been installed, upgrade it as soon as possible.)

NOTICE:

After Docker is installed, configure the Docker process to start at host startup. This configuration prevents system exceptions caused by Docker startup failures after the host is restarted.

Docker Cgroup Driver must be set to cgroupfs. For details, see How Do I Set Docker Cgroup Driver After Installing Docker on an Edge Node?.

Glibc

The Glibc version must be later than 2.17.

Port

Edge nodes require port 8883, which is the listening port of the built-in MQTT broker on edge nodes. Ensure that this port works properly.

Time synchronization

The time on an edge node must be consistent with the UTC time. Otherwise, the monitoring data and logs of the edge node may be inaccurate. You can select an NTP server for time synchronization. For details, see How Do I Synchronize Time with the NTP Server?
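
The version requirements in the table ("later than 17.06" for Docker, "later than 2.17" for Glibc) are easy to misjudge by eye, because 2.9 is an earlier version than 2.17. The following is a minimal sketch of a strict version comparison, assuming GNU sort with the -V option is available on the node. The function name version_gt and the sample version values are illustrative and are not part of IEF.

```shell
#!/bin/sh
# version_gt A B: succeeds only if version A is strictly later than version B.
# Assumption: GNU sort with -V (version sort) is available on the node.
version_gt() {
    [ "$1" != "$2" ] &&
        [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | tail -n 1)" = "$1" ]
}

# Illustrative values. On a real node they could be read from, for example,
#   docker version --format '{{.Server.Version}}'   and   ldd --version
docker_version="18.06.1"
glibc_version="2.28"

if version_gt "$docker_version" "17.06"; then
    echo "Docker version meets the requirement"
else
    echo "Docker version is too early"
fi

if version_gt "$glibc_version" "2.17"; then
    echo "Glibc version meets the requirement"
else
    echo "Glibc version is too early"
fi
```

A plain string or numeric comparison would rank 2.9 above 2.17, which is why a version-aware sort is used here.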

OS Is Not Supported

Check whether your OS is supported by IEF by referring to Table 2. Do not use a Chinese-language edition of Linux. The language of the OS must be English.

OS Kernel Version Is Too Early

Check whether the OS and kernel version of your edge node meet the requirements described in Table 2.

You can run the following command to check whether your OS kernel version is too early:

sh /opt/edge-installer/conf/script/parse_user_config.sh node_id

In the preceding command, node_id indicates the edge node ID.

If an error is reported, the OS kernel version is too early. Upgrade the kernel version or install an OS listed in Table 2, and then manage the edge node again.

OS Information of the Edge Node Fails to Be Obtained

View the IEF software installation logs. If the os field in the last line of the output is empty, the OS information could not be obtained. In the following example, the os field is correctly set to euleros.

2020-01-11 17:00:46.341 +08:00 DEBUG :0 init logger...
2020-01-11 17:00:46.341 +08:00 INFO config/config.go:45 New file source added for configuration: /opt/edge-installer/conf/config.yaml
2020-01-11 17:00:46.341 +08:00 INFO config/config.go:45 New file source added for configuration: /opt/edge-installer/conf/logging.yaml
2020-01-11 17:00:46.351 +08:00 INFO pkg/installer.go:24 start to install
2020-01-11 17:00:46.386 +08:00 INFO placementclient/placementclient.go:61 http_proxy:ProxyNotSet, https_proxy:ProxyNotSet
2020-01-11 17:00:46.437 +08:00 INFO httpclient/httpsclient.go:182 https_proxy:
2020-01-11 17:00:46.479 +08:00 INFO util/util.go:446 system cert file[/opt/IEF/Cert/system/sys_private_cert_crypto.crt] and system key file[/opt/IEF/Cert/system/sys_private_cert_crypto.key] have been inited
2020-01-11 17:00:46.479 +08:00 INFO pkg/installer.go:46 ------------------install---------------
2020-01-11 17:00:46.479 +08:00 INFO deploy/bootstrap.go:48 install precheck success.
2020-01-11 17:00:46.479 +08:00 INFO deploy/bootstrap.go:54 install preprocess start
2020-01-11 17:00:46.479 +08:00 INFO deploy/deploy.go:39 install preprocess start
2020-01-11 17:00:46.501 +08:00 INFO util/util.go:192 get arch success
2020-01-11 17:00:46.502 +08:00 INFO util/util.go:216 os type is:"euleros"
2020-01-11 17:00:46.502 +08:00 INFO util/util.go:432 installer version [1.0.6]
2020-01-11 17:00:46.516 +08:00 INFO placementclient/placementclient.go:113 body : {"arch":"x86_64","installer_version":"1.0.6","os":"euleros"}
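
Reading the os field out of a long log by eye is error-prone. The sketch below pulls the field from the last line that carries a JSON body, using only grep, tail, and sed. The function name os_field is illustrative. The location of the installation log is not stated here, so the sketch runs against a sample line copied from the excerpt above rather than a real file.

```shell
#!/bin/sh
# os_field: read installer log text on stdin and print the value of the "os"
# field from the last line that contains a JSON body. Prints an empty line if
# the field is empty, and nothing if no such line exists.
os_field() {
    grep 'body : {' | tail -n 1 | sed -n 's/.*"os":"\([^"]*\)".*/\1/p'
}

# Sample line copied from the log excerpt above.
sample='2020-01-11 17:00:46.516 +08:00 INFO placementclient/placementclient.go:113 body : {"arch":"x86_64","installer_version":"1.0.6","os":"euleros"}'

os="$(printf '%s\n' "$sample" | os_field)"
if [ -n "$os" ]; then
    echo "OS detected: $os"
else
    echo "OS information was not obtained"
fi
```

To check a real log, pipe the log file into os_field in place of the sample line.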

NPU Driver Is Not Installed on the Edge Node with an AI Accelerator Card

If you try to register an edge node of the AI accelerator card type, make sure that the edge node supports NPUs and has an NPU driver installed.

Run the following command on your edge node:

ls /dev/davinci_manager /dev/hisi_hdc /dev/davinci*

If these device files do not exist, no NPU driver is installed. Install an NPU driver.
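
The ls command above reports errors for missing files but does not summarize the result. The following sketch lists exactly which device nodes are absent. The function name missing_npu_devices is illustrative, and the check for a numbered node such as davinci0 is an assumption about how the driver names per-chip devices. The pattern /dev/davinci* in the original command is also satisfied by davinci_manager alone.

```shell
#!/bin/sh
# missing_npu_devices [DEV_DIR]: print each expected NPU device node that is
# absent under DEV_DIR (default: /dev). Prints nothing if all are present.
missing_npu_devices() {
    dev_dir="${1:-/dev}"
    for name in davinci_manager hisi_hdc; do
        [ -e "$dev_dir/$name" ] || echo "$dev_dir/$name"
    done
    # Assumption: per-chip nodes are numbered, for example davinci0.
    ls "$dev_dir"/davinci[0-9]* >/dev/null 2>&1 || echo "$dev_dir/davinci<N>"
}

missing="$(missing_npu_devices)"
if [ -z "$missing" ]; then
    echo "NPU device nodes found"
else
    echo "Missing NPU device nodes:"
    echo "$missing"
fi
```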

GPU Driver Is Not Installed on the Edge Node with a GPU

If you choose to enable GPUs of an edge node, the GPU driver has to be installed on the edge node before you can manage it on IEF. Currently, IEF supports only NVIDIA Tesla P4, P40, and T4 GPUs and the GPU drivers that match CUDA Toolkit 8.0 to 11.0.

  1. Install a GPU driver.

    1. Download the GPU driver. The recommended driver link is as follows:

      https://www.nvidia.com/content/DriverDownload-March2009/confirmation.php?url=/tesla/440.33.01/NVIDIA-Linux-x86_64-440.33.01.run&lang=us&type=Tesla

    2. Run the following command to install the GPU driver:

      bash NVIDIA-Linux-x86_64-440.33.01.run

    3. Run the following command to check the GPU driver installation status:

      nvidia-smi

  2. Copy GPU driver files to specific directories.

    1. Log in to the edge node as user root.
    2. Run the following command:

      nvidia-modprobe -c0 -u

    3. Create directories.

      mkdir -p /var/IEF/nvidia/drivers /var/IEF/nvidia/bin /var/IEF/nvidia/lib64

    4. Copy GPU driver files to the directories.
      • For CentOS, run the following commands in sequence to copy the driver files:

        cp /lib/modules/{Kernel version of the current environment}/kernel/drivers/video/nvi* /var/IEF/nvidia/drivers/

        cp /usr/bin/nvidia-* /var/IEF/nvidia/bin/

        cp -rd /usr/lib64/libcuda* /var/IEF/nvidia/lib64/

        cp -rd /usr/lib64/libEG* /var/IEF/nvidia/lib64/

        cp -rd /usr/lib64/libGL* /var/IEF/nvidia/lib64/

        cp -rd /usr/lib64/libnv* /var/IEF/nvidia/lib64/

        cp -rd /usr/lib64/libOpen* /var/IEF/nvidia/lib64/

        cp -rd /usr/lib64/libvdpau_nvidia* /var/IEF/nvidia/lib64/

        cp -rd /usr/lib64/vdpau /var/IEF/nvidia/lib64/

      • For Ubuntu, run the following commands in sequence to copy the driver files:

        cp /lib/modules/{Kernel version of the current environment}/kernel/drivers/video/nvi* /var/IEF/nvidia/drivers/

        cp /usr/bin/nvidia-* /var/IEF/nvidia/bin/

        cp -rd /usr/lib/x86_64-linux-gnu/libcuda* /var/IEF/nvidia/lib64/

        cp -rd /usr/lib/x86_64-linux-gnu/libEG* /var/IEF/nvidia/lib64/

        cp -rd /usr/lib/x86_64-linux-gnu/libGL* /var/IEF/nvidia/lib64/

        cp -rd /usr/lib/x86_64-linux-gnu/libnv* /var/IEF/nvidia/lib64/

        cp -rd /usr/lib/x86_64-linux-gnu/libOpen* /var/IEF/nvidia/lib64/

        cp -rd /usr/lib/x86_64-linux-gnu/libvdpau_nvidia* /var/IEF/nvidia/lib64/

        cp -rd /usr/lib/x86_64-linux-gnu/vdpau /var/IEF/nvidia/lib64/

      You can run the uname -r command to view the kernel version of the current environment, for example, 3.10.0-514.el7.x86_64. Replace {Kernel version of the current environment} in the preceding commands with the actual value.

      # uname -r
      3.10.0-514.el7.x86_64
    5. Run the following command to change the directory permissions:

      chmod -R 755 /var/IEF
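
The CentOS and Ubuntu command lists above differ only in the library directory, and both need the kernel version filled in by hand. The following sketch prints the same cp commands with the kernel version taken from uname -r, so that they can be reviewed before being run. It is a dry run and copies nothing. The function name gpu_copy_commands is illustrative.

```shell
#!/bin/sh
# gpu_copy_commands LIB_DIR: print the cp commands from the steps above with
# the kernel version filled in from uname -r. Nothing is copied.
#   LIB_DIR is /usr/lib64 on CentOS and /usr/lib/x86_64-linux-gnu on Ubuntu.
gpu_copy_commands() {
    lib_dir="$1"
    kernel="$(uname -r)"
    echo "cp /lib/modules/$kernel/kernel/drivers/video/nvi* /var/IEF/nvidia/drivers/"
    echo "cp /usr/bin/nvidia-* /var/IEF/nvidia/bin/"
    # The patterns are quoted so that the shell prints them instead of expanding them.
    for pattern in 'libcuda*' 'libEG*' 'libGL*' 'libnv*' 'libOpen*' 'libvdpau_nvidia*' 'vdpau'; do
        echo "cp -rd $lib_dir/$pattern /var/IEF/nvidia/lib64/"
    done
}

# Example for CentOS. Use /usr/lib/x86_64-linux-gnu for Ubuntu.
gpu_copy_commands /usr/lib64
```

After reviewing the printed commands, run them as user root once the target directories have been created.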

Disk Space Is Full

The IEF software cannot be installed on the edge node if the disk space is full. Run the following commands to check the disk space:

df -h

lsblk

Ensure that the disk usage of the file systems that hold the following directories is not close to 100%. For details about the disk space requirements, see Table 2.

  • /opt/IEF
  • /opt/edge-installer
  • /opt/IEFpack
  • /var/IEF
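
The output of df -h covers every mounted file system, so the relevant rows have to be found by hand. The following sketch reports the usage of the file system behind each directory in the list above. The function name usage_from_df and the 90% warning threshold are illustrative assumptions and are not limits defined by IEF.

```shell
#!/bin/sh
# usage_from_df: read `df -P` output on stdin and print the Capacity column of
# the first data row without the percent sign.
usage_from_df() {
    awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# Assumption: 90 is an arbitrary warning threshold chosen for this sketch.
threshold=90

for dir in /opt/IEF /opt/edge-installer /opt/IEFpack /var/IEF; do
    pct="$(df -P "$dir" 2>/dev/null | usage_from_df)"
    case "$pct" in
        '' | *[!0-9]*)
            echo "$dir: not present or usage unavailable" ;;
        *)
            if [ "$pct" -ge "$threshold" ]; then
                echo "$dir: ${pct}% used, free up disk space"
            else
                echo "$dir: ${pct}% used"
            fi ;;
    esac
done
```

A directory that does not exist yet is reported as not present. In that case, check the file system of its parent directory instead.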

Container Engine Is Not Installed or Started

Run the following command to check whether a container engine is started:

systemctl status docker

  • If the container engine information is not displayed, the container engine is not installed. Install a container engine based on the requirements listed in Table 2.
  • If the container engine is not started, run the following command to start it:

    systemctl restart docker

    Check the container engine status again.

    • If the container engine is started properly (in the active state), manage the edge node again.
    • If the container engine cannot be started, restore or reinstall it.
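
The decision steps above can be sketched as a small script. It assumes that the container engine is Docker managed by systemd, and it uses command -v to tell "not installed" apart from "not started". The function name docker_state and its three result strings are illustrative.

```shell
#!/bin/sh
# docker_state: print one of: not-installed, active, not-started.
docker_state() {
    if ! command -v docker >/dev/null 2>&1; then
        echo "not-installed"
    elif [ "$(systemctl is-active docker 2>/dev/null)" = "active" ]; then
        echo "active"
    else
        echo "not-started"
    fi
}

case "$(docker_state)" in
    active)
        echo "Container engine is running. Manage the edge node again." ;;
    not-started)
        echo "Container engine is not running. Try: systemctl restart docker" ;;
    not-installed)
        echo "Container engine is not installed. Install one that meets Table 2." ;;
esac
```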

Multiple docker0 Bridge Addresses Exist on the Edge Node

Two docker0 bridge addresses may exist on an edge node after a container engine GUI has been used. In this case, the docker0 bridge registration fails when IEF manages the edge node, which causes a management failure. To rectify the fault, delete the redundant docker0 bridge address and manage the edge node again.

You can run the following command to query the docker0 bridge address:

ip addr show | grep docker0

If multiple IP addresses are displayed, multiple docker0 bridge addresses exist. Retain the IP address starting with 172 and delete the redundant docker0 bridge addresses.
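
The following sketch lists the IPv4 addresses bound to docker0 and prints a candidate removal command for each address that does not start with 172. It only prints the commands and deletes nothing. The function names are illustrative, and the suggested ip addr del command is an assumption about how you might remove an address, so review each line before running it.

```shell
#!/bin/sh
# docker0_addrs: read `ip addr show` output on stdin and print every IPv4
# address (in CIDR form) whose interface label is docker0.
docker0_addrs() {
    awk '$1 == "inet" && $NF ~ /^docker0/ { print $2 }'
}

# redundant_docker0_addrs: same input, but keep only addresses that do not
# start with 172, which are the ones to delete.
redundant_docker0_addrs() {
    docker0_addrs | grep -v '^172\.'
}

# Dry run: print one candidate removal command per redundant address.
ip addr show 2>/dev/null | redundant_docker0_addrs | while read -r addr; do
    echo "ip addr del $addr dev docker0"
done
```

If nothing is printed, no redundant docker0 address was found.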

Port 8883 Is Occupied

Run the following command to check whether port 8883 is occupied:

netstat -npl | grep 8883

The IEF core component (edgecore) depends on port 8883 and listens on this port after the edge node is managed. If another process already occupies port 8883, edgecore fails to be installed. Therefore, ensure that port 8883 is not occupied before you manage the edge node.
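
A plain grep 8883 also matches unrelated lines, such as a process whose PID is 8883 or a listener on port 18883. The following sketch matches the port only in the local address column of the netstat output. It assumes that netstat is installed and run as user root so that the PID/Program name column is filled in. The function name listeners_on_port is illustrative.

```shell
#!/bin/sh
# listeners_on_port PORT: read `netstat -npl` output on stdin and print the
# last column (PID/Program name) of each TCP row whose local address ends in
# ":PORT".
listeners_on_port() {
    awk -v port="$1" '$1 ~ /^tcp/ && $4 ~ (":" port "$") { print $NF }'
}

# If netstat is missing, the pipeline yields nothing and the result below is
# not reliable.
owners="$(netstat -npl 2>/dev/null | listeners_on_port 8883)"
if [ -n "$owners" ]; then
    echo "Port 8883 is occupied by: $owners"
else
    echo "No TCP listener found on port 8883"
fi
```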

Port 8883 Is Disabled by the Firewall

Check the firewall status on the edge node.

systemctl status firewalld

firewall-cmd --state

In the command output, not running indicates that the firewall is disabled and running indicates that the firewall is enabled.

If the firewall is enabled, open port 8883 or disable the firewall.

  • To open port 8883, run the following commands:

    firewall-cmd --add-port=8883/tcp --permanent

    systemctl restart firewalld

  • To disable the firewall, run the following commands:

    systemctl disable firewalld

    systemctl stop firewalld

Edge Node Cannot Connect to IEF

Run the following command to check whether the edge node can connect to IEF:

curl -i -k -v https://ief2-edgeaccess.cn-north-4.myhuaweicloud.com:443/

In the preceding command, ief2-edgeaccess.cn-north-4.myhuaweicloud.com indicates the edgeaccess domain name of the service instance. The domain name varies according to the region. For details, see Domain Name of the Professional Service Instance or Domain Name of the Platinum Service instance.

If no command output is displayed, the edge node is disconnected from IEF. Check the network and ensure that the edge node can connect to IEF.

Edge Node Fails to Resolve Domain Names

Ensure that your edge node can resolve the following domain names:

  • Domain name of the region where IEF is located, for example, ief2-placement.cn-north-4.myhuaweicloud.com or ief-placement.cn-east-3.myhuaweicloud.com. You can run the cat /opt/IEF/Cert/user_config command to query the domain name of your region.
  • edgeaccess domain name of the service instance, for example, ief2-edgeaccess.cn-north-4.myhuaweicloud.com. This domain name varies according to the region and service instance. For details, see Domain Name of the Professional Service Instance or Domain Name of the Platinum Service instance.

Run the ping command to check whether the domain name can be resolved. For example:

ping ief2-edgeaccess.cn-north-4.myhuaweicloud.com
ping ief2-placement.cn-north-4.myhuaweicloud.com

If the domain name cannot be resolved, reconfigure a DNS server. The DNS server with IP address 114.114.114.114 is recommended.
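
The ping command tests both name resolution and ICMP reachability, so a ping with no replies does not by itself show that resolution failed. The following sketch checks resolution only. It assumes that the getent command is available on the edge node. The function name resolves is illustrative.

```shell
#!/bin/sh
# resolves NAME: succeeds if the system resolver returns an address for NAME.
# Assumption: getent is available on the node.
resolves() {
    getent hosts "$1" >/dev/null 2>&1
}

# Check every domain name passed on the command line.
for name in "$@"; do
    if resolves "$name"; then
        echo "$name: resolved"
    else
        echo "$name: NOT resolved, check the DNS server configuration"
    fi
done
```

For example, save the sketch as check_dns.sh (a hypothetical file name) and run sh check_dns.sh ief2-edgeaccess.cn-north-4.myhuaweicloud.com ief2-placement.cn-north-4.myhuaweicloud.com, replacing the domain names with those of your region and service instance.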

A Certificate Is Used on Multiple Edge Nodes

The same certificate is loaded on multiple edge nodes, and one of these nodes is in the Running state.

The edge nodes registered on the IEF console must correspond to the actual edge nodes. Do not create only one edge node on the IEF console and load the installation package and certificate downloaded from the IEF console to multiple edge nodes.

Run the following command to check whether the certificate is repeatedly used:

cat /var/IEF/sys/log/edge_core.log | grep websocket

If a message indicating that node node_id has been occupied is displayed, the certificate is repeatedly used.

An Edge Node Is Managed Multiple Times

  • The uninstall operation is not correctly performed before the edge node is managed again. To be specific, the edge node is deleted only on the IEF console, but the IEF software is not uninstalled from the edge node.

    Run the following commands to check whether the IEF components on the edge node are running:

    systemctl status edgecore

    systemctl status edgemonitor

    systemctl status edgelogger

    If the edge node fails to be managed but the preceding components are still running, the components are not correctly uninstalled from the edge node. Run the following command to uninstall the components:

    cd /opt/edge-installer; sudo ./installer -op=uninstall

    The following misoperation may occur:

    To manage the edge node again, a user renames the original /opt directory of the node to /opt_old, creates a new /opt directory, and manages the node based on the guide provided by IEF. When the node management fails, the user performs an uninstall operation. The system prompts that the uninstallation is successful, but the preceding components are still running. This is because the uninstall operation did not remove the IEF components that were originally installed, which are now in the /opt_old directory. In this case, restore the original /opt directory, uninstall the IEF components, and then manage the edge node again. Do not rename the /opt directory as a way to manage an edge node again.

  • After the uninstallation is complete, delete the managed components, generated logs, and downloaded configuration files.
    Figure 2 Files in the /opt directory

    To delete the managed components shown in the red box, run the rm -rf /opt/edge-installer, rm -rf /opt/IEF, rm -rf /opt/IEF_firmware, rm -rf /opt/IEFpack, and rm -rf /opt/material commands.

    To delete logs, run the rm -rf /var/IEF command.

    To delete configuration files, run the rm -rf edge-installer_1.0.10_x86_64.tar.gz ief-node.tar.gz command.

  • IEF components on the edge node are not completely uninstalled.

    If the uninstallation is complete but the management still fails, restart the edge node and try again.

Command for Managing Edge Nodes Is Not Run in the Specified Directory

The installation command is as follows:

cd /opt/edge-installer; sudo ./installer -op=install

The cd /opt/edge-installer command must be run to ensure that the installation command is run in the edge-installer directory.
