What Do I Do If an Edge Node Is Faulty?

Updated on 2022-12-09 GMT+08:00

Symptom

An edge node is in the Faulty state. The fault cause is displayed when you hover the cursor over the node status.

Figure 1 Node fault

Fault Locating

Locate the cause of the edge node fault as follows:

Table 1 Fault locating

| Possible Cause | Solution |
| --- | --- |
| The edge node is shut down. | Edge Node Is Shut Down |
| A container engine fault occurs, for example, the container engine is not started or the container engine service is abnormal. | Local Container Engine of the Edge Node Is Abnormal |
| The node disk space is insufficient. | Container Disk Space of the Edge Node Is Insufficient; /opt/IEF Disk Space of the Edge Node Is Insufficient; /var/IEF/sys/log Disk Space of the Edge Node Is Insufficient |
| The network connection of the edge node is abnormal. | Network Connection of the Edge Node Is Abnormal |
| The GPU driver is abnormal. | GPU Driver Is Abnormal |
| The NPU plug-in is abnormal. | NPU Plug-in Is Abnormal |
| The edgecore component installed on the edge node is abnormal. | edgecore Is Abnormal |
| The edge node enters the recovery mode after being forcibly powered off and then powered on. | System Enters the Recovery Mode |

Edge Node Is Shut Down

When the edge node is shut down, it cannot report its status to IEF. In this case, IEF determines that the edge node is faulty. Therefore, keep the edge node running.

CAUTION:

You are billed for the number of edge applications, not the number of edge nodes. If an edge node is faulty, the edge applications deployed on it still incur charges even if they are in the abnormal state. Therefore, if you temporarily do not need the services, delete the corresponding applications from IEF instead of stopping the edge node.

Local Container Engine of the Edge Node Is Abnormal

The startup and running of the IEF core component (edgecore) depend on the container engine. Therefore, if the container engine is abnormal, the edgecore component cannot be started.

Solution

  1. Run docker version to check whether the container engine is normal. If the container engine is abnormal, run systemctl restart docker to restart it.
  2. Run docker ps to check whether the container engine is available. If the container engine is not available, restart or reinstall it.
CAUTION:

Do not forcibly power off the edge node. Otherwise, data files on the edge node may be lost or damaged, which can cause node faults.
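
The two checks in the preceding steps can be combined into one helper. The following is a minimal sketch, not an IEF tool: the function name check_container_engine and its return codes are hypothetical, and it assumes Docker is the container engine, as in the commands above.

```shell
# Sketch: run "docker version" and "docker ps" in order and report the first failure.
# check_container_engine is a hypothetical helper name, not part of IEF.
check_container_engine() {
    if ! docker version >/dev/null 2>&1; then
        echo "docker version failed: try 'systemctl restart docker'" >&2
        return 1
    fi
    if ! docker ps >/dev/null 2>&1; then
        echo "docker ps failed: restart or reinstall the container engine" >&2
        return 2
    fi
    echo "container engine looks healthy"
}
```

Run check_container_engine on the edge node. A return code of 1 corresponds to step 1 failing, and 2 corresponds to step 2 failing.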

Container Disk Space of the Edge Node Is Insufficient

Solution

  1. Log in to the edge node. Run the following command to check the usage of the disk mounted to the container running on the edge node:

    df -h

  2. Delete unnecessary files to release the disk space.

    rm {File name}

/opt/IEF Disk Space of the Edge Node Is Insufficient

Solution

  1. Log in to the edge node. Run the following command to check the usage of the disk space allocated to /opt/IEF:

    df -h

  2. Delete unnecessary files to release the disk space.

    rm {File name}

/var/IEF/sys/log Disk Space of the Edge Node Is Insufficient

Solution

  1. Log in to the edge node. Run the following command to check the usage of the disk space allocated to /var/IEF/sys/log:

    df -h

  2. Delete unnecessary files to release the disk space.

    rm {File name}
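
Reading df -h output by eye works for one path, but the disk checks above cover several directories. The following sketch reports only the paths whose file system is nearly full. The function names and the IEF_DISK_THRESHOLD variable are hypothetical, and the default of 90 percent is an assumption rather than a limit documented by IEF.

```shell
# Print the usage percentage (without the % sign) of the file system holding the given path.
disk_usage_pct() {
    df -P "$1" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# Print each existing path whose file system usage is at or above the threshold.
# IEF_DISK_THRESHOLD is a hypothetical variable; 90 is an assumed default.
report_full_disks() {
    threshold="${IEF_DISK_THRESHOLD:-90}"
    for path in "$@"; do
        [ -e "$path" ] || continue
        pct=$(disk_usage_pct "$path")
        if [ "$pct" -ge "$threshold" ]; then
            echo "$path: ${pct}% used"
        fi
    done
}
```

For example, report_full_disks /opt/IEF /var/IEF/sys/log prints a line for each of the two directories that needs cleanup and prints nothing if both have enough free space.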

Network Connection of the Edge Node Is Abnormal

Identification Method

  1. Run the following command on the edge node to obtain the address (domain name) for accessing IEF:

    cat /opt/IEF/Edge-core/conf/edge.yaml | grep ws-url

    Information similar to the following is displayed:

    ws-url: wss://ief2-edgeaccess.cn-north-4.myhuaweicloud.com:443/

    In the preceding command output, ief2-edgeaccess.cn-north-4.myhuaweicloud.com indicates the required address. The address varies according to the region. The address format of a platinum service instance is 1fc0704e-229c-4210-9802-75f66aeffe3d.cn-north-4.huaweiief.com. You can also view the address, that is, Access Domain, on the IEF console.

    Figure 2 Viewing the cloud access domain name
  2. Run the curl command to check whether the edge node can connect to IEF.

    curl -i -v -k https://ief2-edgeaccess.cn-north-4.myhuaweicloud.com

    • If no command output is displayed, the network between the edge node and IEF is disconnected.
    • If information similar to the following is displayed, the network connection is normal:
      * About to connect() to ief2-edgeaccess.cn-north-4.myhuaweicloud.com port 443 (#0)
      *   Trying 49.4.115.239...
      * Connected to ief2-edgeaccess.cn-north-4.myhuaweicloud.com (*.*.*.*) port 443 (#0)
      * Initializing NSS with certpath: sql:/etc/pki/nssdb
      * skipping SSL peer certificate verification
      * NSS: client certificate not found (nickname not specified)
      * SSL connection using TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256
      * Server certificate:
      * subject: OID.1.1.1.4=42701fe87611496e80c824778c9857ca,OID.1.1.1.3=op_svc_ief_container1:88125631e95e4d3fbdfa7e6ced0f9dd4,OID.1.1.1.2=cn-north-4:42701fe87611496e80c824778c9857ca:op_cfe_kubelet,OID.1.1.1.1=op_svc_ief_container1,CN=paas.placement.certs.secret OSS3.0 CA,OU=OSS & Service Tools Dept,O="Huawei Technologies Co., Ltd",L=ShenZhen,ST=GuangDong,C=CN
      * start date: Apr 29 16:00:00 2019 GMT
      * expire date: Apr 29 16:00:00 2049 GMT
      * common name: paas.placement.certs.secret OSS3.0 CA
      > GET / HTTP/1.1
      .....
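
The two identification steps can be chained so that the address does not have to be copied by hand. The following is a minimal sketch: the function names ief_host_from_ws_url and ief_reachable are hypothetical, and the 10-second timeout is an assumption.

```shell
# Extract the host name from a "ws-url: wss://host:port/" line read on stdin.
ief_host_from_ws_url() {
    sed -n 's#.*://\([^:/]*\).*#\1#p'
}

# Return 0 if an HTTPS connection to the host completes within 10 seconds.
# -k skips certificate verification, as in the curl command above.
ief_reachable() {
    curl -k -s -o /dev/null --max-time 10 "https://$1"
}
```

On an edge node, this could be used as follows: host=$(grep ws-url /opt/IEF/Edge-core/conf/edge.yaml | ief_host_from_ws_url), followed by ief_reachable "$host" && echo connected. A nonzero return code from ief_reachable suggests that the network between the edge node and IEF is disconnected.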

Possible Causes and Solutions

  1. The domain name resolution is abnormal.

    Run the following command to check whether the domain name can be resolved:

    ping ief2-edgeaccess.cn-north-4.myhuaweicloud.com

    If the domain name cannot be resolved into an IP address, run the following command to check whether the DNS server configuration was modified:

    cat /etc/resolv.conf

    Solution:

    • Configure a correct DNS server. The DNS server with IP address 114.114.114.114 is recommended.
    • Obtain the correct IP address resolved from the domain name, and add it to the hosts file (/etc/hosts) as a temporary workaround.
  2. A proxy problem occurs.

    If the proxy mode is used, check whether the proxy is correctly configured.

    • Check whether a proxy is configured for the edge node.

      Run the following commands:

      env | grep proxy

      env | grep PROXY

    • Check whether a proxy is configured for edgecore.

      Run the following command:

      cat /opt/IEF/Cert/user_config | grep PROXY

    If the proxy mode is not used, run the preceding commands to check that no proxy is configured.

  3. The network connection is not stable.

    Check whether the network connection of the edge node is normal and stable. If the network connection is unstable, the edge node status switches between Faulty and Running.
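
The DNS and proxy checks above can be collected into two small helpers. This is a sketch under stated assumptions: the function names are hypothetical, and proxy_vars matches any environment variable whose name ends in "proxy" in either case (including no_proxy), which is broader than the two grep commands above.

```shell
# List the DNS servers configured in a resolv.conf file (default: /etc/resolv.conf).
resolver_list() {
    awk '$1 == "nameserver" { print $2 }' "${1:-/etc/resolv.conf}"
}

# List proxy-related environment variables in either case, for example http_proxy or HTTPS_PROXY.
# Returns nonzero if none is set.
proxy_vars() {
    env | grep -i '^[a-z_]*proxy='
}
```

If resolver_list prints an unexpected server, the DNS configuration was probably modified. If proxy_vars prints anything while the proxy mode is not used, unset those variables before retrying.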

GPU Driver Is Abnormal

Solution

  1. Install a GPU driver.

    Currently, IEF supports only NVIDIA Tesla P4, P40, and T4 GPUs and the GPU drivers that match CUDA Toolkit 8.0 to 11.0.

    1. Download the GPU driver. The recommended driver link is as follows:

      https://www.nvidia.com/content/DriverDownload-March2009/confirmation.php?url=/tesla/440.33.01/NVIDIA-Linux-x86_64-440.33.01.run&lang=us&type=Tesla

    2. Run the following command to install the GPU driver:

      bash NVIDIA-Linux-x86_64-440.33.01.run

    3. Run the following command to check the GPU driver installation status:

      nvidia-smi

  2. Copy GPU driver files to specific directories.

    1. Log in to the edge node as user root.
    2. Run the following command:

      nvidia-modprobe -c0 -u

    3. Create directories.

      mkdir -p /var/IEF/nvidia/drivers /var/IEF/nvidia/bin /var/IEF/nvidia/lib64

    4. Copy GPU driver files to the directories.
      • For CentOS, run the following commands in sequence to copy the driver files:

        cp /lib/modules/{Kernel version of the current environment}/kernel/drivers/video/nvi* /var/IEF/nvidia/drivers/

        cp /usr/bin/nvidia-* /var/IEF/nvidia/bin/

        cp -rd /usr/lib64/libcuda* /var/IEF/nvidia/lib64/

        cp -rd /usr/lib64/libEG* /var/IEF/nvidia/lib64/

        cp -rd /usr/lib64/libGL* /var/IEF/nvidia/lib64/

        cp -rd /usr/lib64/libnv* /var/IEF/nvidia/lib64/

        cp -rd /usr/lib64/libOpen* /var/IEF/nvidia/lib64/

        cp -rd /usr/lib64/libvdpau_nvidia* /var/IEF/nvidia/lib64/

        cp -rd /usr/lib64/vdpau /var/IEF/nvidia/lib64/

      • For Ubuntu, run the following commands in sequence to copy the driver files:

        cp /lib/modules/{Kernel version of the current environment}/kernel/drivers/video/nvi* /var/IEF/nvidia/drivers/

        cp /usr/bin/nvidia-* /var/IEF/nvidia/bin/

        cp -rd /usr/lib/x86_64-linux-gnu/libcuda* /var/IEF/nvidia/lib64/

        cp -rd /usr/lib/x86_64-linux-gnu/libEG* /var/IEF/nvidia/lib64/

        cp -rd /usr/lib/x86_64-linux-gnu/libGL* /var/IEF/nvidia/lib64/

        cp -rd /usr/lib/x86_64-linux-gnu/libnv* /var/IEF/nvidia/lib64/

        cp -rd /usr/lib/x86_64-linux-gnu/libOpen* /var/IEF/nvidia/lib64/

        cp -rd /usr/lib/x86_64-linux-gnu/libvdpau_nvidia* /var/IEF/nvidia/lib64/

        cp -rd /usr/lib/x86_64-linux-gnu/vdpau /var/IEF/nvidia/lib64/

      You can run the uname -r command to view the kernel version of the current environment, for example, 3.10.0-514.el7.x86_64. Replace {Kernel version of the current environment} in the preceding commands with the actual value.

      # uname -r
      3.10.0-514.el7.x86_64
    5. Run the following command to change the directory permissions:

      chmod -R 755 /var/IEF
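
The library copy commands for CentOS and Ubuntu differ only in the source directory, so they can be expressed as one loop. The following is a minimal sketch, not an IEF script: copy_nvidia_libs is a hypothetical helper name, it covers only the library files (not the kernel module or the /usr/bin/nvidia-* binaries), and it assumes GNU cp, which provides the -d option used above.

```shell
# Copy the NVIDIA library files listed above from a source directory to a destination.
# Patterns that match nothing in the source directory are skipped.
copy_nvidia_libs() {
    src="$1"
    dest="$2"
    mkdir -p "$dest" || return 1
    for pattern in 'libcuda*' 'libEG*' 'libGL*' 'libnv*' 'libOpen*' 'libvdpau_nvidia*' 'vdpau'; do
        for f in "$src"/$pattern; do
            # An unmatched glob stays literal, so check that the entry really exists.
            [ -e "$f" ] || [ -L "$f" ] || continue
            cp -rd "$f" "$dest"/
        done
    done
}
```

On CentOS, the call would be copy_nvidia_libs /usr/lib64 /var/IEF/nvidia/lib64. On Ubuntu, it would be copy_nvidia_libs /usr/lib/x86_64-linux-gnu /var/IEF/nvidia/lib64.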

NPU Plug-in Is Abnormal

  1. Log in to the edge node.
  2. Run the following command to check whether the NPU driver container runs properly:

    docker ps -a |grep npu

  3. If the container is not in the Running status, restart the container.

    docker restart {container_name}

    {container_name} indicates the container name.
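
The check and restart steps above can be scripted as follows. This is a sketch under assumptions: the function names are hypothetical, the NPU containers are assumed to have "npu" in their container names, and a container is treated as running when the Status column of docker ps starts with "Up".

```shell
# Print the names of containers whose name contains "npu" and whose status is not "Up ...".
stopped_npu_containers() {
    docker ps -a --format '{{.Names}} {{.Status}}' |
        awk '$1 ~ /npu/ && $2 != "Up" { print $1 }'
}

# Restart every container reported by stopped_npu_containers.
restart_stopped_npu_containers() {
    for name in $(stopped_npu_containers); do
        echo "restarting $name"
        docker restart "$name"
    done
}
```

Run stopped_npu_containers first to review the list, then restart_stopped_npu_containers if the list is what you expect.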

edgecore Is Abnormal

Check whether the edgecore status is normal.

systemctl status edgecore

If the edgecore component is faulty, the possible causes are as follows:

  • Port 8883 or 1883 is occupied.

    Check whether port 8883 or 1883 of your edge node is occupied. If port 8883 or 1883 is occupied, release the port and run the systemctl restart edgecore command to restore edgecore.

  • The container engine is abnormal.

    Run systemctl status docker to check whether the container engine is normal. If the container engine is abnormal, run systemctl restart docker to restart it.

  • Port 8883 is disabled by the firewall. For details, see Port 8883 Is Disabled by the Firewall.
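
To check whether port 8883 or 1883 is occupied, the listening sockets can be inspected with ss. The following is a minimal sketch: port_in_use is a hypothetical helper name, and it assumes that the ss command from iproute2 is installed on the edge node.

```shell
# Return 0 if a TCP socket is listening on the given port, nonzero otherwise.
# Column 4 of "ss -ltn" output is "Local Address:Port".
port_in_use() {
    ss -ltn | awk -v p=":$1" '
        NR > 1 {
            n = length($4) - length(p) + 1
            if (n > 0 && substr($4, n) == p) found = 1
        }
        END { exit !found }'
}
```

For example, port_in_use 8883 && echo "port 8883 is occupied". Note that edgecore itself listens on these ports when it is running normally, so the check is meaningful only while edgecore is stopped or failing to start.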

System Enters the Recovery Mode

If an edge node is forcibly powered off and then powered on, the system may enter the recovery mode. Check whether the /opt/IEF directory is normal. If any file in this directory is lost, the edge node will be faulty.

The /opt/IEF directory is abnormal if any of the following errors occurs:

  • The systemctl status edgecore command output indicates that the edgecore status is abnormal, and the systemctl restart edgecore command output indicates that the edgecore service does not exist.
  • The systemctl status edgelogger command output indicates that the edgelogger status is abnormal, and the systemctl restart edgelogger command output indicates that the edgelogger service does not exist.
  • The systemctl status edgemonitor command output indicates that the edgemonitor status is abnormal, and the systemctl restart edgemonitor command output indicates that the edgemonitor service does not exist.
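
The three services listed above can be checked in one pass. The following is a minimal sketch: ief_service_report is a hypothetical helper name, and it uses systemctl is-active instead of systemctl status to get a one-word state for each service.

```shell
# Print one "service: state" line for each IEF service named above.
ief_service_report() {
    for svc in edgecore edgelogger edgemonitor; do
        # is-active exits nonzero for anything other than "active"; keep going regardless.
        state=$(systemctl is-active "$svc" 2>/dev/null) || true
        printf '%s: %s\n' "$svc" "${state:-unknown}"
    done
}
```

If a service is reported as not active, run systemctl restart on it as described above. An error saying that the service does not exist indicates that the /opt/IEF directory is abnormal.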

Solution

Start your edge node in normal mode. If an edge node is powered off abnormally, files on the edge node may be damaged or lost. Therefore, do not forcibly power off the edge node. If this fault occurs, submit a service ticket.
