Updated on 2023-03-30 GMT+08:00

How Do I Troubleshoot a Ping Failure or Packet Loss Using a Link Test?

Symptom

When you accessed other resources from an ECS, network freezing occurred. The ping command output showed that packet loss occurred or the network delay was long.

This section uses Tracert and MTR as examples to describe how to troubleshoot packet loss or long delay.

Possible Causes

Packet loss or long delay may be caused by link congestion, link node faults, high server load, or incorrect system settings.

After verifying that the issue was not caused by the ECS, use Tracert or MTR for further fault locating.

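MTR combines the functions of ping and traceroute: it continuously probes every node on the path and reports the packet loss rate and response time of each node.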

You can choose to use Tracert or MTR depending on the ECS OS:

Using Tracert in Windows

Tracert shows the path that packets take to reach the destination server and the round-trip time to each node on that path. It is similar to the ping command but provides more detail: the entire path the packets take, the IP address of each node on the path, and the round-trip time to each node.

  1. Log in to the Windows ECS.
  2. Open the cmd window and run the following command to trace the route to the destination:

    tracert IP address or website

    For example, tracert www.example.com

    The command output shows that:

    • The maximum number of hops is 30 by default. The first column shows the sequence number of each hop.
    • Tracert sends three probe packets to each hop. The second, third, and fourth columns show the round-trip time of each of the three probes. The last column shows the IP address of the node that responded at that hop.
    • If * * * Request timed out is reported, the node at that hop did not respond to any of the three probes. A single silent hop may only mean that the node does not answer ICMP. If all subsequent hops also time out, troubleshoot the affected link and node.
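
    The columns described above can also be read programmatically. The following is a minimal Python sketch, not part of Tracert itself, that parses saved tracert output and lists the hops where all three probes timed out. The sample text and its addresses are hypothetical placeholders in the usual Windows tracert layout, and a value such as <1 ms is treated as 1 ms for simplicity.

```python
# Hypothetical sample in the usual Windows tracert layout (placeholder addresses).
SAMPLE_TRACERT = """\
Tracing route to www.example.com over a maximum of 30 hops

  1    <1 ms    <1 ms    <1 ms  192.0.2.1
  2     3 ms     2 ms     3 ms  198.51.100.7
  3     *        *        *     Request timed out.
  4    12 ms     *       11 ms  203.0.113.9
"""


def parse_tracert(text):
    """Return a list of (hop, [rtt_ms or None, ...], node) tuples."""
    hops = []
    for line in text.splitlines():
        tokens = line.split()
        if not tokens or not tokens[0].isdigit():
            continue  # skip the banner and blank lines
        hop, rest = int(tokens[0]), tokens[1:]
        probes = []
        while len(probes) < 3 and rest:
            if rest[0] == "*":  # this probe got no reply
                probes.append(None)
                rest = rest[1:]
            elif len(rest) >= 2 and rest[1] == "ms":
                probes.append(int(rest[0].lstrip("<")))  # "<1" is read as 1
                rest = rest[2:]
            else:
                break
        hops.append((hop, probes, " ".join(rest)))
    return hops


def timed_out_hops(hops):
    """Hop numbers where every probe timed out."""
    return [hop for hop, probes, _ in hops
            if probes and all(p is None for p in probes)]


if __name__ == "__main__":
    print(timed_out_hops(parse_tracert(SAMPLE_TRACERT)))
```

    Hop 4 in the sample loses only one probe, so it is not reported; only hops that are completely silent are listed.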

Using WinMTR in Windows

  1. Log in to the Windows ECS.
  2. Download the WinMTR installation package from the official website.
  3. Decompress the WinMTR installation package.
  4. Double-click WinMTR.exe to start the tool.
  5. In the WinMTR window, enter the IP address or domain name of the destination server in Host and click Start.

  6. Wait for WinMTR to run for a period of time and click Stop to stop the test.

    The test results are as follows:

    • Hostname: IP address or domain name of each node that the packets pass through to the destination server
    • Nr: sequence number of the node (hop) on the path
    • Loss%: packet loss rate of a node
    • Sent: number of sent packets
    • Recv: number of received responses
    • Best: shortest response time
    • Avrg: average response time
    • Worst: longest response time
    • Last: last response time

Using MTR in Linux

Installing MTR

MTR is preinstalled on most Linux distributions. If MTR is not installed on your Linux ECS, run the following command to install it:

  • CentOS
    yum install mtr
  • Ubuntu
    sudo apt-get install mtr

MTR parameters

  • -h/--help: help menu
  • -v/--version: MTR version
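  • -r/--report: report mode, which runs the specified number of cycles, prints the results of all traces, and exits
  • -p/--split: split mode, which prints the results of each trace
  • -c/--report-cycles: number of packets sent to each node (10 by default in report mode)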
  • -s/--psize: size of a packet
  • -n/--no-dns: no domain name resolution performed for IP addresses
  • -a/--address: source IP address used for sending packets. Set this parameter if the host has multiple IP addresses.
  • -4: IPv4
  • -6: IPv6
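
As an illustration of how these parameters combine, the following Python sketch assembles an MTR command line from the options listed above. The function name build_mtr_command and its arguments are hypothetical helpers introduced for this example, not part of MTR. The sketch only builds and prints the command, so it runs even on a machine without MTR installed.

```python
import shlex


def build_mtr_command(target, cycles=10, no_dns=True, packet_size=None, ip_version=None):
    """Build an argv list for a report-mode MTR run (hypothetical helper)."""
    argv = ["mtr", "--report", "--report-cycles", str(cycles)]
    if no_dns:
        argv.append("--no-dns")  # show IP addresses only, skip reverse DNS lookups
    if packet_size is not None:
        argv += ["--psize", str(packet_size)]
    if ip_version in (4, 6):
        argv.append(f"-{ip_version}")  # force IPv4 or IPv6
    argv.append(target)
    return argv


if __name__ == "__main__":
    # 119.xx.xx.xx is the masked placeholder used in this section, not a real address.
    print(shlex.join(build_mtr_command("119.xx.xx.xx", cycles=20)))
```

On an ECS where MTR is installed, the returned list can be passed to subprocess.run(argv, capture_output=True, text=True) to collect the report as text.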

The following uses the link between the local server and the destination server with IP address 119.xx.xx.xx as an example.

Run the following command to obtain the MTR diagnosis results in a report:

mtr 119.xx.xx.xx --report

Information similar to the following is displayed:

[root@ecs-0609 ~]# mtr 119.xx.xx.xx --report
Start: Thu Aug 22 15:41:22 2019
HOST: ecs-652                     Loss%   Snt   Last   Avg  Best  Wrst StDev
  1.|-- 100.70.0.1                 0.0%    10    3.0   3.4   2.8   7.5   1.3
  2.|-- 10.242.7.174               0.0%    10   52.4  51.5  34.2  58.9   6.3
  3.|-- 10.242.7.237               0.0%    10    3.2   5.0   2.7  20.8   5.5
  4.|-- 10.230.2.146               0.0%    10    1.0   1.0   1.0   1.1   0.0
  5.|-- 192.168.21.1               0.0%    10    3.5   4.2   2.8  11.6   2.5
  6.|-- 10.242.7.238               0.0%    10   35.3  34.5   6.0  56.4  22.6
  7.|-- 10.242.7.173               0.0%    10    3.3   4.7   3.1  14.7   3.6
  8.|-- ???                       100.0    10    0.0   0.0   0.0   0.0   0.0

The parameters in the preceding command output are described as follows:

  • HOST: IP address or domain name of the node
  • Loss%: packet loss rate
  • Snt: number of packets sent
  • Last: last response time
  • Avg: average response time
  • Best: shortest response time
  • Wrst: longest response time
  • StDev: standard deviation of the response time. A larger value indicates a larger variation in the response time of the packets on the node.
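
To work with these columns in a script, the report can be split into one record per hop. The following Python sketch is one possible way to do this, assuming the report layout shown above. The Hop class and parse_report function are hypothetical names introduced for this example, and the sample rows are copied from the preceding output.

```python
from dataclasses import dataclass


@dataclass
class Hop:
    index: int
    host: str
    loss_pct: float
    sent: int
    last: float
    avg: float
    best: float
    worst: float
    stdev: float


# Rows copied from the report shown in this section.
REPORT = """\
Start: Thu Aug 22 15:41:22 2019
HOST: ecs-652                     Loss%   Snt   Last   Avg  Best  Wrst StDev
  1.|-- 100.70.0.1                 0.0%    10    3.0   3.4   2.8   7.5   1.3
  2.|-- 10.242.7.174               0.0%    10   52.4  51.5  34.2  58.9   6.3
  6.|-- 10.242.7.238               0.0%    10   35.3  34.5   6.0  56.4  22.6
  8.|-- ???                       100.0    10    0.0   0.0   0.0   0.0   0.0
"""


def parse_report(text):
    """Parse the hop rows of an MTR report into Hop records."""
    hops = []
    for line in text.splitlines():
        tokens = line.split()
        # Hop rows start with "<n>.|--"; the Start and HOST lines do not.
        if len(tokens) != 9 or not tokens[0].endswith(".|--"):
            continue
        index = int(tokens[0].split(".")[0])
        loss = float(tokens[2].rstrip("%"))  # "0.0%" and "100.0" both occur
        last, avg, best, worst, stdev = map(float, tokens[4:])
        hops.append(Hop(index, tokens[1], loss, int(tokens[3]),
                        last, avg, best, worst, stdev))
    return hops


if __name__ == "__main__":
    hops = parse_report(REPORT)
    jittery = max(hops, key=lambda h: h.stdev)
    print(f"largest StDev: hop {jittery.index} ({jittery.host}), {jittery.stdev} ms")
    for hop in hops:
        if hop.loss_pct > 0:
            print(f"hop {hop.index} ({hop.host}): {hop.loss_pct}% loss")
```

The sketch treats each whitespace-separated field as one column, so it assumes that node names contain no spaces.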

Handling WinMTR and MTR Reports

The following figure is an example of analyzing the reports of WinMTR and MTR.

  • Local network of the server (area A): the local area network and local ISP network
    • If a node in the local network malfunctions, check the local network.
    • If the local ISP network malfunctions, report the issue to the local carrier.
  • Carrier backbone network (area B): If an error occurs in this area, identify the carrier to which the faulty node belongs based on the node IP address and report the issue to the carrier.
  • Local network on the destination end (area C): the network of the provider to which the destination server belongs
    • If packet loss occurs on the destination server, the network configuration of the destination server may be incorrect. Check the firewall configuration on the destination server.
    • If packet loss occurs on certain nodes with several hops close to the destination server, the network of the provider to which the destination server belongs may be faulty.

Common Link Faults

  • Incorrect destination server configurations
    As shown in the following example, the packet loss rate on the last hop (the destination server) is 100%, which means that no responses are received from the destination server. The fault might be caused by incorrect network configuration on the destination server. In such a case, check the firewall configuration on the destination server.
    Host                                        Loss%   Snt   Last   Avg  Best  Wrst StDev 
    1. ???
    2. ???
    3. 1XX.X.X.X                                0.0%     10  521.3  90.1   2.7 521.3 211.3
    4. 11X.X.X.X                                0.0%     10    2.9   4.7   1.6  10.6   3.9
    5. 2X.X.X.X                                 80.0%    10    3.0   3.0   3.0   3.0   0.0
    6. 2X.XX.XX.XX                              0.0%     10    1.7   7.2   1.6  34.9  13.6
    7. 1XX.1XX.XX.X                             0.0%     10    5.2   5.2   5.1   5.2   0.0
    8. 2XX.XX.XX.XX                             0.0%     10    5.3   5.2   5.1   5.3   0.1
    9. 1XX.1XX.XX.X                             100.0%   10    0.0   0.0   0.0   0.0   0.0
  • ICMP rate limit
    As shown in the following example, packet loss occurs on the fifth hop but not on any subsequent node. Packets that reach the later nodes must pass through the fifth node, so the loss is caused by an ICMP rate limit on that node rather than by a link fault. This issue does not affect data transmission to the destination server and can be ignored.
    Host                                        Loss%   Snt   Last   Avg  Best  Wrst StDev
    1. 1XX.XX.XX.XX                             0.0%    10    0.3   0.6   0.3   1.2   0.3
    2. 1XX.XX.XX.XX                             0.0%    10    0.4   1.0   0.4   6.1   1.8
    3. 1XX.XX.XX.XX                             0.0%    10    0.8   2.7   0.8  19.0   5.7
    4. 1XX.XX.XX.XX                             0.0%    10    6.7   6.8   6.7   6.9   0.1
    5. 1XX.XX.XX.XX                             60.0%    0   27.2  25.3  23.1  26.4   2.9
    6. 1XX.XX.XX.XX                             0.0%    10   39.1  39.4  39.1  39.7   0.2
    7. 1XX.XX.XX.XX                             0.0%    10   39.6  40.4  39.4  46.9   2.3
    8. 1XX.XX.XX.XX                             0.0%    10   39.6  40.5  39.5  46.7   2.2
  • Loop
    As shown in the following example, the same node address repeats from the fifth hop onward, which means that the data packets are forwarded in a loop and cannot reach the destination server. This fault is caused by incorrect routing configuration on the nodes of the carrier. Contact the carrier to rectify the fault.
    Host                                        Loss%   Snt   Last   Avg  Best  Wrst StDev
    1. 1XX.XX.XX.XX                             0.0%    10    0.3   0.6   0.3   1.2   0.3
    2. 1XX.XX.XX.XX                             0.0%    10    0.4   1.0   0.4   6.1   1.8
    3. 1XX.XX.XX.XX                             0.0%    10    0.8   2.7   0.8  19.0   5.7
    4. 1XX.XX.XX.XX                             0.0%    10    6.7   6.8   6.7   6.9   0.1
    5. 1XX.XX.XX.65                             0.0%    10    0.0   0.0   0.0   0.0   0.0
    6. 1XX.XX.XX.65                             0.0%    10    0.0   0.0   0.0   0.0   0.0
    7. 1XX.XX.XX.65                             0.0%    10    0.0   0.0   0.0   0.0   0.0
    8. 1XX.XX.XX.65                             0.0%    10    0.0   0.0   0.0   0.0   0.0
    9. ???                                      0.0%    10    0.0   0.0   0.0   0.0   0.0
  • Link interruption
    As shown in the following example, no response is received after the data packets pass the fourth hop. This is generally caused by a link interruption between the affected nodes. You are advised to confirm the fault by running a reverse link test (from the destination server back to the source), and then contact the carrier to which the affected nodes belong.
    Host                                        Loss%   Snt   Last   Avg  Best  Wrst StDev
    1. 1XX.XX.XX.XX                             0.0%    10    0.3   0.6   0.3   1.2   0.3
    2. 1XX.XX.XX.XX                             0.0%    10    0.4   1.0   0.4   6.1   1.8
    3. 1XX.XX.XX.XX                             0.0%    10    0.8   2.7   0.8  19.0   5.7
    4. 1XX.XX.XX.XX                             0.0%    10    6.7   6.8   6.7   6.9   0.1
    5. 1XX.XX.XX.XX                             0.0%    10    0.0   0.0   0.0   0.0   0.0
    6. 1XX.XX.XX.XX                             0.0%    10    0.0   0.0   0.0   0.0   0.0
    7. 1XX.XX.XX.XX                             0.0%    10    0.0   0.0   0.0   0.0   0.0
    8. 1XX.XX.XX.XX                             0.0%    10    0.0   0.0   0.0   0.0   0.0
    9. 1XX.XX.XX.XX                             0.0%    10    0.0   0.0   0.0   0.0   0.0
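
Some of the fault patterns above can be checked with simple rules. The following Python sketch is a rough heuristic under stated assumptions, not a definitive diagnosis tool. It takes a list of (node, loss rate) pairs in path order and flags three patterns: a repeated node address on consecutive hops (loop), 100% loss on the last hop (destination not responding), and loss on an intermediate hop when at least one later hop shows no loss (likely ICMP rate limit). The function name diagnose and the node-N labels are hypothetical. Distinct labels are used because the masked addresses in the examples above would otherwise all look identical. Link interruption is not covered.

```python
def diagnose(hops):
    """hops: list of (node, loss_pct) in path order. Returns a list of findings."""
    findings = []

    # Loop: the same responding address appears on two consecutive hops.
    for i in range(1, len(hops)):
        node = hops[i][0]
        if node != "???" and node == hops[i - 1][0]:
            findings.append(f"possible routing loop starting at hop {i} ({node})")
            break

    # Destination: the last hop never answers.
    if hops[-1][1] >= 100.0:
        findings.append("100% loss on the last hop: check the destination firewall")

    # ICMP rate limit: loss on a hop, but some later hop shows no loss.
    for hop_no, (node, loss) in enumerate(hops[:-1], start=1):
        later_losses = [later for _, later in hops[hop_no:]]
        if loss > 0 and any(later == 0 for later in later_losses):
            findings.append(f"hop {hop_no}: loss not seen on later hops, "
                            "likely ICMP rate limiting")
    return findings


if __name__ == "__main__":
    # Loss rates taken from the ICMP rate limit example; node labels are placeholders.
    losses = [0, 0, 0, 0, 60, 0, 0, 0]
    rate_limited = [(f"node-{n}", loss) for n, loss in enumerate(losses, start=1)]
    for finding in diagnose(rate_limited):
        print(finding)
```

Treat the findings as hints only. Confirm them against the full report and, where possible, against a test in the reverse direction.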