Updated on 2025-08-12 GMT+08:00

Performing In-Depth Diagnosis

Scenario

ECS supports in-depth diagnosis for OSs to help you quickly identify and resolve common problems.

This section lists the Linux distributions that support in-depth diagnosis, describes how to run a diagnosis, and explains the conclusions that a diagnosis can report.

Constraints

  • Cloud Operations Center (COC) needs to be enabled and authorized.

    For IAM users, permissions for COC operations need to be granted. For details, see Configuring Custom Policies for ECS Self-Service O&M.

  • UniAgent must be installed. UniAgent is a unified data collection agent and supports script delivery and execution.

    If UniAgent is not installed on the ECS, diagnosis commands cannot be delivered to the ECS without logging in to it. For details, see Installing UniAgent on an ECS.

  • Only ECSs running Linux support in-depth diagnosis.
  • The following table lists the Linux distributions and versions that support in-depth diagnosis.

    | Distribution | Version | CPU Architecture |
    |---|---|---|
    | Huawei Cloud EulerOS | Huawei Cloud EulerOS 2.0, Huawei Cloud EulerOS 1.1 | x86/Kunpeng |
    | CentOS | CentOS 8.2 64bit, CentOS 8.0 64bit, CentOS 7.9 64bit, CentOS 7.8 64bit, CentOS 7.7 64bit, CentOS 7.6 64bit, CentOS 7.5 64bit, CentOS 7.4 64bit, CentOS 7.3 64bit, CentOS 7.2 64bit | x86 |
    | Ubuntu | Ubuntu 22.04 server 64bit, Ubuntu 20.04 server 64bit | x86 |
    | Debian | Debian 11.1.0 64bit, Debian 10.0.0 64bit | x86 |
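The support table above can be expressed as a small lookup, which may be handy when screening a fleet of ECSs before requesting a diagnosis. The following is a minimal sketch only: the `SUPPORTED` mapping and the `supports_in_depth_diagnosis` function are hypothetical names introduced for illustration, the version strings are shortened forms of the table entries, and none of this is part of any ECS or COC API.

```python
# Hypothetical lookup built from the support table in this section.
# Keys are distribution names; values are (supported versions, supported CPU architectures).
SUPPORTED = {
    "Huawei Cloud EulerOS": ({"2.0", "1.1"}, {"x86", "Kunpeng"}),
    "CentOS": (
        {"8.2", "8.0", "7.9", "7.8", "7.7", "7.6", "7.5", "7.4", "7.3", "7.2"},
        {"x86"},
    ),
    "Ubuntu": ({"22.04", "20.04"}, {"x86"}),
    "Debian": ({"11.1.0", "10.0.0"}, {"x86"}),
}


def supports_in_depth_diagnosis(distribution: str, version: str, arch: str) -> bool:
    """Return True if the distribution, version, and architecture appear in the table."""
    entry = SUPPORTED.get(distribution)
    if entry is None:
        return False  # distribution not listed at all
    versions, architectures = entry
    return version in versions and arch in architectures


print(supports_in_depth_diagnosis("CentOS", "7.9", "x86"))
print(supports_in_depth_diagnosis("Ubuntu", "18.04", "x86"))
```

The lookup requires an exact match on all three values, so a version that is not listed is treated as unsupported.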

Procedure

  1. Log in to the management console and go to the Elastic Cloud Server page.
  2. In the ECS list, locate the target ECS and choose More > View O&M and Monitoring > Perform In-Depth Diagnosis in the Operation column.
  3. (Optional) On the Enable COC and Grant Permissions page, read and agree to the service statement, and click Enable and Authorize.

    This page is displayed if COC is not enabled and authorized.

  4. On the slide-out Perform In-depth Diagnosis panel, select Comprehensive diagnosis for In-depth Diagnosis Scenario.

    UniAgent is required for performing in-depth diagnosis. If a message is displayed indicating that UniAgent is not installed or failed to be installed, install it first by referring to Installing UniAgent on an ECS.

    Figure 1 In-depth diagnosis
  5. Select the checkbox and click OK.

    For details about the diagnosis items and their conclusions, see In-depth Diagnosis Conclusion.

  6. On the Diagnosis Report tab, view the diagnosis details.
    Figure 2 Diagnosis report
  7. In the Diagnosis Details area, click the icon on the left of an abnormal item to view the exception details, and rectify the fault based on the optimization suggestions.
    Figure 3 Diagnosis abnormal items (example)

In-depth Diagnosis Conclusion

| Diagnosis Item ID | Diagnosis Item Name | Conclusion |
|---|---|---|
| guestos.cpu.high_total_usage | Checking High CPU Usage | The overall CPU usage of the system exceeds 80%. |
| guestos.cpu.high_process_usage | Checking Processes with High CPU Usage | A single process uses more than 50% of the total system CPU. |
| guestos.cpu.high_core_usage | Checking High CPU Usage of a Single-Core CPU | The usage of a single CPU core exceeds 85%. |
| guestos.storage.high_inode_usage | Checking Disk Usage | The file system usage or inode usage of some EVS disks in the instance exceeds 80%. As a result, new files may fail to be created in the corresponding partitions. |
| guestos.filesystem.invalid_device | Checking Devices in the fstab File | A device configured in /etc/fstab on the current instance does not exist. As a result, the instance may fail to start. |
| guestos.filesystem.device_mount_failure | Checking the Device Mounting Status in the fstab File | The instance contains EVS disks that are not set to mount automatically in /etc/fstab. As a result, the instance may fail to start. |
| guestos.filesystem.invalid_format | Checking the fstab File Format | The format of the fstab file is incorrect. As a result, the instance may fail to start. |
| guestos.network.firewall_status_check | Checking the System Firewall Status | The firewall (iptables) of the current instance is enabled. If the firewall has rules that block external access, remote access to the instance may fail. |
| guestos.memory.oom_events | Checking OS OOM | An out-of-memory (OOM) event occurred in the guest OS of the current instance. |
| guestos.ssh.incorrect_file_permission | Checking SSH Public or Private Key Access Permission | The permissions of the public or private key file that the SSH service of the current instance depends on are incorrect. As a result, the instance cannot be accessed through SSH. |
| guestos.ssh.missing_critical_file | Checking SSH Key Files | A key file or directory required by the SSH service of the current instance is missing. As a result, the instance cannot be accessed through SSH. |
| guestos.memory.high_total_usage | Checking High Memory Usage | The memory usage of the instance exceeds 80%. |
| guestos.ssh.forbidden_root_login | Checking SSH Login Using User root | The SSH service of the current instance does not allow the root user to log in. As a result, the instance cannot be accessed through SSH as the root user. |
| guestos.system.port_listenning | Checking the Listening Status of Common Service Ports | No service is listening on port 22. Log in to the instance, check whether the service that uses the port is running properly, and rectify the fault as required. |
| guestos.system.unreasonable_file_limits | Checking Limits Settings | Some values in /etc/security/limits.conf of the current instance are greater than the preset values. As a result, remote login to the instance may fail. |
| guestos.memory.unreasonable_hugepage_config | Checking Memory Huge Page Configurations | The huge page memory specified by the kernel parameter vm.nr_hugepages of the current instance is too large. As a result, remote login to the instance may fail. |
| guestos.network.wrong_nat_config | Checking the Kernel Parameters of the NAT Gateway Environment | The kernel parameters related to NAT gateway access are incorrectly configured for the current instance. As a result, users cannot connect to the instance through SSH, and HTTP-based access to the instance is abnormal. Check and change the values of net.ipv4.tcp_tw_recycle and net.ipv4.tcp_timestamps in /etc/sysctl.conf. |
| guestos.network.wrong_tcp_sack | Checking tcp_sack Configuration | tcp_sack is not enabled for the current instance, which may affect the network performance of the instance. |
| guestos.system.wrong_selinux_status | Checking SELinux Status | SELinux is enabled for the instance. As a result, an error is reported when you remotely connect to the instance using SSH. Disable SELinux temporarily or permanently based on service requirements. |
| guestos.system.missing_critical_user | Checking Settings of Key System Users | A key system account of the current instance does not exist. As a result, you may fail to log in to the instance. |
| guestos.network.disabled_multi_queue | Checking Whether NIC Multi-Queue Is Enabled | NIC multi-queue is disabled, which may degrade network performance. Enable this feature as required. |
| guestos.system.critical_file_exists | Checking the Existence of Key System Files | Some key system files in the system directory of the instance are missing. As a result, login to the instance may fail. |
| guestos.system.critical_service_exists | Checking the Startup Status of Key System Processes | Key system processes (such as the SSH process) of the instance are not running. As a result, access to the instance may fail. |
| guestos.system.critical_file_format_invalid | Checking the Format of Key System Files | The file that stores the system accounts of the current instance is in an incorrect format (not UNIX format). As a result, login to the instance may fail. |
| guestos.network.nic_dropped | Checking NIC Packet Loss | The NIC of the current instance is dropping packets. As a result, some service requests have high latency or fail. |
| guestos.cpu.res_interrupts | Checking High IPI Rescheduling Interrupt Rate | There are a large number of IPI rescheduling interrupts on the current instance, which may cause extra overhead and degrade service performance. Check the number of IPI rescheduling interrupts based on service requirements. |
| guestos.cpu.tlb_interrupts | Checking Excessive TLB Interrupts | There are a large number of TLB interrupts on the current instance, which may cause extra overhead and degrade service performance. Check the number of TLB interrupts based on service requirements. |
| guestos.cpu.syscall_high_usage | Checking High CPU Usage for the System Kernel Space | The kernel space (sys) of the current instance consumes a large amount of CPU. This is usually caused by an application making an excessive number of system calls, which may affect the CPU available to the application. |
| guestos.cpu.irq_not_balanced | Imbalanced Inter-CPU Interrupt Usage | Interrupt handling is unevenly distributed across the CPU cores of the current instance and is concentrated on one or more cores. As a result, the SI usage of a single core is high, and services may be affected by a single-core CPU bottleneck. Check the interrupt distribution. |
| guestos.storage.high_latency | Checking High Storage Latency | The storage latency of a disk on the current instance is too high, which may cause service freezing and high response latency. Check whether the disk specifications meet the storage I/O requirements of your services. |
| guestos.network.too_much_close_wait_connections | Excessive Connections in CLOSE_WAIT State | There are too many connections in the CLOSE_WAIT state on the current instance, which may cause new requests to fail. Check the connections. |
| guestos.network.nf_conntrack_table_full | Checking Conntrack Table Overflow | The conntrack table of the current instance is full. As a result, new connections may be dropped, causing service request failures. |
| guestos.network.too_much_new_connections | Checking Excessive New Connections | There are a large number of new connections on the current instance, which may increase the latency of new requests or cause new requests to fail. |
| guestos.network.possible_ddos_attack | Checking Suspected DDoS Attack | The current instance may be at risk of a DDoS attack. Check the instance. |
| guestos.storage.io_bottleneck | Checking That Storage I/O Reaches the Upper Limit of Disk QoS | The disk IOPS or bandwidth of the current instance reaches the upper limit of the disk QoS, which may cause service freezing or failure. |
| guestos.network.socket_listenning_queue_overflow | Checking Socket Listening Queue Overflow | The socket listening queue of the current instance overflows and packets are dropped. As a result, new connections may fail to be established. |
| guestos.network.udp_buffer_overflow | Checking Packet Loss Caused by UDP Buffer Overflow | Packets are dropped on the current instance because UDP burst traffic overflows the buffer. |
| guestos.network.too_much_time_wait_connections | Checking Excessive Connections in the TIME_WAIT State | Too many connections in the TIME_WAIT state may cause new requests to fail and services to become unavailable. |
| guestos.network.too_much_fin_wait2_connections | Checking Excessive Connections in FIN_WAIT2 State | Too many connections in the FIN_WAIT2 state may occupy a large number of local ports. |
| guestos.network.too_much_established_connections | Checking Excessive Connections in ESTABLISHED State | Too many connections in the ESTABLISHED state on the current instance occupy a large number of local ports and a large amount of memory. Check whether the connections are normal based on service requirements. |
| guestos.system.file_descriptor_not_enough | Checking Too Few File Handles | Too few file handles are configured for the current instance. When the number of file handles used by services reaches the upper limit, the system becomes unavailable. |
| guestos.network.local_port_range_too_small | Checking Too Small Local Port Range | The local port range configured for the current instance is too small. When a large number of concurrent requests are sent to other services, the error message "Cannot assign requested address" may be displayed. As a result, new connections fail to be created. |
| guestos.network.qdisc_queue_overflow | Checking Packet Loss Due to QDisc Sending Queue Overflow | Packets are dropped because the QDisc sending queue overflows. |
| guestos.memory.swap_check | Checking Service swap | Services on the current instance are using swap. As a result, service performance deteriorates, and services with high performance requirements are greatly affected. |
| guestos.memory.transparent_hugepage_check | Checking Transparent Huge Page Configuration | Transparent huge pages are enabled for the current instance. Determine whether to keep them enabled based on service performance. |
| guestos.memory.buffer_cache_too_high | Checking High Memory Buffer/Cache Usage | The memory buffer/cache usage of the current instance is high. As a result, the free memory may be insufficient. When applications allocate memory (for example, by calling malloc), cache reclamation is frequently triggered, degrading service performance. |
| guestos.memory.process_used_too_high | Checking High Memory Usage of Service Processes | The memory usage of service processes on the current instance is high. The available memory may be insufficient, which degrades service performance. |
| guestos.network.traffic_exceed | Checking Network Traffic Over-Upper-Limit | The network traffic of the current instance exceeds the upper limit of the current network QoS, which may affect service performance. |
| guestos.network.socket_tcp_buffer_overflow | Checking TCP Buffer Overflow | The number of pages used by sockets of the current instance is close to the upper limit of the TCP buffer. As a result, packet loss may occur due to TCP buffer overflow. |
| guestos.network.socket_udp_buffer_overflow | Checking UDP Buffer Overflow | The number of pages used by sockets of the current instance is close to the upper limit of the UDP buffer. As a result, packet loss may occur due to UDP buffer overflow. |
| guestos.gpu.gpu_status | GPU Status | The GPU status returned by the nvidia-smi command is abnormal. |
| guestos.gpu.gpu_card_lost | GPU Card Drop | GPUs are disconnected. |
| guestos.gpu.core_temp_too_high | GPU Core Overtemperature | The GPU core temperature is too high. |
| guestos.gpu.mem_temp_too_high | GPU Memory Overtemperature | The GPU memory temperature is too high. |
| guestos.gpu.fan_error | Abnormal GPU Fan | The GPU fan is abnormal and an error is reported. |
| guestos.gpu.power_error | Abnormal GPU Power Consumption | The GPU power consumption is abnormal and an error is reported. |
| guestos.gpu.memory_usage_too_high | High GPU Memory Usage | Excessive use of GPU memory may cause program crashes. |
| guestos.gpu.gpu_usage_too_high | High GPU Computing Power Usage | The GPU computing power usage is too high, which may result in insufficient computing power. |
| guestos.gpu.pcie_link_error | GPU Bandwidth Exception | The GPU bandwidth is abnormal. There may be a hardware error. |
| guestos.gpu.pstate_low | Poor GPU Performance | The GPU is not running at full performance. |
| guestos.gpu.ecc_mode | Disabled ECC Mode | ECC mode is disabled, and ECC errors cannot be identified. |
| guestos.gpu.volatile_ecc_error | Too Many Volatile ECC Errors for GPU | The number of volatile ECC errors for the target GPU exceeds the threshold of 4. |
| guestos.gpu.aggregate_ecc_error | Too Many Aggregate ECC Errors for GPU | The number of aggregate ECC errors for the target GPU exceeds the threshold of 4. |
| guestos.gpu.retired_pages_count_error | DRAM ECC Page Retirement Error | The number of pages retired due to DRAM ECC errors exceeds the threshold of 60. |
| guestos.gpu.retired_pages_pending_error | Pending Page Retirement for DRAM ECC | There are pages pending retirement due to GPU DRAM ECC errors. |
| guestos.gpu.xid_error | GPU XID Error | The GPU has an XID error. |
| guestos.gpu.kernel_version_error | Inconsistent GPU Kernel Versions | The kernel version that the installed GPU driver was built for is inconsistent with the current kernel version. |
| guestos.gpu.nouveau_error | nouveau Driver Not Disabled | The nouveau driver is not disabled. |
| guestos.gpu.cuda_tips | Non-installation of CUDA | CUDA is not installed. |
| guestos.gpu.fabricmanager_error | Non-installation of fabricmanager | fabricmanager is not installed. |
| guestos.gpu.sram_ecc_too_many_error | Too Many SRAM ECC Errors | There are too many SRAM ECC errors. |
| guestos.gpu.remapped_dram_ecc_error | Row Remapping Fails Due to Excessive DRAM ECC | DRAM ECC errors cause row remapping failures. |
| guestos.gpu.dram_ecc_pending_error | Pending Row Remapping Error for DRAM ECC | There is a pending row remapping error for DRAM ECC. |
| guestos.gpu.volatile_dram_correctable_too_many_error | Too Many Correctable ECC Errors for DRAM | The number of correctable ECC errors for DRAM is greater than 1,000. |
| guestos.gpu.volatile_dram_uncorrectable_too_many_error | Too Many Uncorrectable ECC Errors for DRAM | The number of uncorrectable ECC errors for DRAM is greater than 60. |
| guestos.system.missing_initramfs | initramfs File Check | The instance does not have the initramfs file required for system startup. As a result, the instance cannot start and the system cannot be accessed. |
| guestos.system.missing_initramfs_module | initramfs File Key Driver Configuration Check | The initramfs of the instance does not contain the virtio_scsi driver configuration, which is required for starting QingTian instances. With some flavors, this can result in startup errors. |
| guestos.system.missing_grubcfg | GRUB Configuration File Check | The GRUB configuration file required for system startup is missing. |
| guestos.system.missing_vmlinuz | vmlinuz File Check | The vmlinuz file (the kernel image), which is needed for system startup, could not be found. |
| guestos.system.conflict_ntp_service | Time Synchronization Service Check | The instance runs both the chronyd and ntpd services. The two services conflict in some scenarios, so the clock stability of the instance cannot be ensured. |
| guestos.system.ntp_service_status_abnormal | Status Check for the Time Synchronization Service | The time synchronization service of the instance is not in the running state. |
| guestos.system.ntp_service_enablestatus_abnormal | Automatic Startup Configuration Check for the Time Synchronization Service | The time synchronization service of the instance is not enabled to start automatically. |
| guestos.filesystem.duplicate_fs | Check for Duplicate File Systems in the fstab File | The fstab file of the instance contains duplicate file system mounting configurations. |
| guestos.filesystem.fstab_uuid_status | fstab File UUID Check | UUIDs are not used to identify devices in the fstab file of the instance. |
| guestos.filesystem.fstab_duplicate_mount | Check for Duplicate Mount Points in the fstab File | There are duplicate mount points in the fstab file of the instance. |
| guestos.filesystem.mount_path_mismatch | fstab Consistency Check | The mapping between file systems and mount directories in the fstab file is inconsistent with the actual mounts. |
| guestos.filesystem.fstype_mismatch | Check for the Consistency Between the fstype of the fstab File and the fstype of the Actual File System | The fstype in the fstab file is inconsistent with the fstype of the actual file system. |
| guestos.filesystem.blkid_duplicate_uuid | The Same UUID for Multiple File Systems of an Instance | Multiple file systems in the instance have the same UUID. |
| guestos.filesystem.ext4_not_clean | Abnormal ext4 File System | An ext4 file system in the instance is abnormal. |
| guestos.network.static_ip_not_work | Invalid Static IP Address of Instance NIC | The static IP address configuration of the instance NIC does not take effect. |
| guestos.network.dhclient_not_work | Abnormal Resident Process of the DHCP Client | The resident process of the DHCP client is abnormal. |
| guestos.network.network_service_abnormal | Instance Network Service Exception | The network service of the instance is abnormal. |
| guestos.system.serial_port_log_not_configured | Serial Port Log Output Check | Serial port log output is not configured for the instance. |
| guestos.system.page_allocation_failure | Memory Page Allocation Failure | Failed to allocate the memory page for printing instance logs. |
| guestos.system.fork_failed | Log Printing Process Creation Failure | Threads cannot be created on the instance. |
| guestos.system.too_many_open_files | Failed to Open a New File During Instance Log Printing | The instance cannot open new file handles. |
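Several of the fstab-related items (guestos.filesystem.invalid_format, guestos.filesystem.fstab_duplicate_mount, guestos.filesystem.duplicate_fs, and guestos.filesystem.fstab_uuid_status) describe conditions that can be reproduced by hand on fstab content. The following is a minimal sketch of such checks, not the logic the diagnosis service actually runs: the `check_fstab` function, the `BLOCK_PREFIXES` tuple, the finding labels, and the sample entries are all hypothetical, and the rules are simplified assumptions based on the field layout described in fstab(5).

```python
from collections import Counter

# Prefixes assumed to identify a real block device in the first fstab field.
BLOCK_PREFIXES = ("/dev/", "UUID=", "LABEL=")


def check_fstab(text: str) -> dict:
    """Run a few illustrative checks on fstab content and return the findings."""
    entries = []        # (device, mount_point, fstype) for well-formed lines
    invalid_lines = []  # 1-based line numbers with an unexpected field count
    for number, line in enumerate(text.splitlines(), start=1):
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue  # blank lines and comments are allowed
        fields = stripped.split()
        # fstab(5) defines six fields; the last two may be omitted.
        if not 4 <= len(fields) <= 6:
            invalid_lines.append(number)
            continue
        entries.append((fields[0], fields[1], fields[2]))

    # Swap entries have no real mount point, so leave them out of this count.
    mounts = Counter(mp for _, mp, fstype in entries if fstype != "swap")
    devices = Counter(dev for dev, _, _ in entries if dev.startswith(BLOCK_PREFIXES))
    return {
        "invalid_format_lines": invalid_lines,
        "duplicate_mount_points": sorted(mp for mp, n in mounts.items() if n > 1),
        "duplicate_devices": sorted(dev for dev, n in devices.items() if n > 1),
        "devices_without_uuid": [dev for dev, _, _ in entries if dev.startswith("/dev/")],
    }


# Made-up fstab content that triggers each check once.
sample = "\n".join([
    "# /etc/fstab",
    "UUID=1111-aaaa / ext4 defaults 1 1",
    "/dev/vdb1 /data ext4 defaults 0 2",
    "/dev/vdc1 /data xfs defaults 0 2",
    "UUID=1111-aaaa /mnt/copy ext4 defaults 0 0",
    "broken line",
])
for label, finding in check_fstab(sample).items():
    print(label, finding)
```

The sketch inspects only the text of the file. Items such as guestos.filesystem.invalid_device or guestos.filesystem.mount_path_mismatch also depend on the devices and mounts that actually exist on the instance, so they cannot be evaluated this way.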