Updated on 2024-09-04 GMT+08:00

What Should I Do If the Number of ARP Entries Exceeds the Upper Limit?

Symptom

The number of ARP cache entries exceeds the upper limit, causing abnormal access between containers. For example, CoreDNS fails to resolve domain names.

Possible Causes

The number of ARP entries cached in the containers on the node exceeds the upper limit.

Fault Locating

  • If the OS kernel version of a node is later than 4.3, "neighbor table overflow" is displayed in the dmesg log. For details, see GitHub.
    # dmesg -T
    [Tue May 30 18:35:55 2023] neighbour: arp_cache: neighbor table overflow!
    [Tue May 30 18:35:55 2023] neighbour: arp_cache: neighbor table overflow!
    [Tue May 30 18:35:55 2023] neighbour: arp_cache: neighbor table overflow!
    [Tue May 30 18:35:55 2023] neighbour: arp_cache: neighbor table overflow!
    [Tue May 30 18:35:55 2023] neighbour: arp_cache: neighbor table overflow!
    [Tue May 30 18:35:55 2023] neighbour: arp_cache: neighbor table overflow!
    [Tue May 30 18:35:55 2023] neighbour: arp_cache: neighbor table overflow!
    [Tue May 30 18:35:55 2023] neighbour: arp_cache: neighbor table overflow!
    [Tue May 30 18:35:55 2023] neighbour: arp_cache: neighbor table overflow!
    [Tue May 30 18:35:58 2023] print_fib4_table_status: 7 callbacks suppressed
    [Tue May 30 18:35:59 2023] print_fib4_table_status: 23 callbacks suppressed
    [Tue May 30 18:36:00 2023] print_fib4_table_status: 16 callbacks suppressed
    [Tue May 30 18:36:03 2023] print_fib4_table_status: 7 callbacks suppressed
    [Tue May 30 18:36:04 2023] print_fib4_table_status: 17 callbacks suppressed
    [Tue May 30 18:37:38 2023] net_ratelimit: 7966 callbacks suppressed
    [Tue May 30 18:37:38 2023] neighbour: arp_cache: neighbor table overflow!
  • If the OS kernel version of a node is earlier than 4.3, "neighbor table overflow" is not displayed. If "callbacks suppressed" is displayed instead, the number of ARP entries may have exceeded the upper limit.
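The check above can be sketched as a small shell helper. This is a minimal illustration, not part of the original procedure: the function name count_overflow is hypothetical, and reading the kernel log may require root privileges on the node.

```shell
#!/bin/sh
# Count "neighbor table overflow" lines in kernel log text read from stdin.
# grep -c prints 0 and returns a non-zero status when nothing matches,
# so "|| true" keeps the helper from failing in that case.
count_overflow() {
    grep -c 'neighbor table overflow' || true
}

# Usage on a node:
#   dmesg -T | count_overflow
#
# To compare the number of IPv4 neighbor entries with the limit:
#   ip -4 neigh show | wc -l      # entries visible in the current network namespace only
#   sysctl -n net.ipv4.neigh.default.gc_thresh3
```

A non-zero count indicates that the overflow message has been logged. Because the limit is shared by the node and its containers, the entry count from a single network namespace can be well below the limit even when the table is full.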

Solution

The maximum number of non-permanent ARP entries allowed on a node is determined by the net.ipv4.neigh.default.gc_thresh3 kernel parameter. This parameter is not isolated by namespace, so the node and the containers running on it share one ARP table size limit. For nodes running containers, set this parameter to 163790.

How to calculate the kernel parameter

  • In CCE Turbo clusters and clusters using the container tunnel networks

    net.ipv4.neigh.default.gc_thresh3 = Number of containers on a single node x Number of available IP addresses on the container subnet (If a CCE Turbo cluster has multiple container subnets, use the largest number of available IP addresses among the container subnets and the maximum number of containers that can be deployed on a single node.)

    For example, if the container subnet is 192.168.0.1/20 (4,096 IP addresses) and at most 35 containers can be deployed on a single node, set net.ipv4.neigh.default.gc_thresh3 to 143360 (4096 x 35).

  • Clusters using the VPC networks

    net.ipv4.neigh.default.gc_thresh3 = Number of containers on a single node squared

    For example, if the subnet mask length of a node is 25, 128 container IP addresses are available on the node, so you can set net.ipv4.neigh.default.gc_thresh3 to 16384 (128 x 128).

The preceding formulas give the values needed in the following extreme scenarios:

1. All containers on a node proactively access all IP addresses in the container CIDR block. For example, a gateway container needs to access all other containers in the same cluster.

2. All available IP addresses in a container CIDR block are used up.
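The two formulas can be reproduced with shell arithmetic. This is a minimal sketch: the function names subnet_size, thresh_tunnel, and thresh_vpc are hypothetical, and the inputs are the example values used above.

```shell
#!/bin/sh
# Number of addresses in an IPv4 subnet with the given prefix length.
subnet_size() {
    echo $(( 1 << (32 - $1) ))
}

# CCE Turbo / container tunnel network:
# containers per node x addresses on the container subnet.
# $1 = maximum containers per node, $2 = container subnet prefix length
thresh_tunnel() {
    echo $(( $1 * $(subnet_size "$2") ))
}

# VPC network: square of the container addresses available on a node.
# $1 = subnet mask length of the node
thresh_vpc() {
    n=$(subnet_size "$1")
    echo $(( n * n ))
}

thresh_tunnel 35 20   # prints 143360 (35 x 4096)
thresh_vpc 25         # prints 16384 (128 x 128)
```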

  1. In 88-k8s.conf, change the value of net.ipv4.neigh.default.gc_thresh3 to 163790.

    vi /etc/sysctl.d/88-k8s.conf

    The net.ipv4.neigh.default.gc_thresh1 and net.ipv4.neigh.default.gc_thresh2 parameters cannot be modified.

  2. Run the following command to reload the configuration file:

    sysctl -p /etc/sysctl.d/88-k8s.conf

  3. Check whether the configuration takes effect.

    sysctl -a | grep gc_thresh3
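If the file needs to be changed without an interactive editor, the edit in step 1 can be scripted. The following is a minimal sketch under stated assumptions: the helper name set_sysctl_key is hypothetical, GNU sed is assumed for the -i option, and the commands must be run as root on the node.

```shell
#!/bin/sh
# Set "key = value" in a sysctl configuration file: replace the existing
# line for the key, or append a new line if the key is absent.
# $1 = file, $2 = key, $3 = value
set_sysctl_key() {
    # Escape dots so the key is matched literally in the regular expressions.
    key_re=$(printf '%s' "$2" | sed 's/\./\\./g')
    if grep -Eq "^[[:space:]]*${key_re}[[:space:]]*=" "$1"; then
        sed -i -E "s|^[[:space:]]*${key_re}[[:space:]]*=.*|$2 = $3|" "$1"
    else
        printf '%s = %s\n' "$2" "$3" >> "$1"
    fi
}

# Usage on a node, following the three steps above:
#   set_sysctl_key /etc/sysctl.d/88-k8s.conf net.ipv4.neigh.default.gc_thresh3 163790
#   sysctl -p /etc/sysctl.d/88-k8s.conf
#   sysctl -n net.ipv4.neigh.default.gc_thresh3
```

The helper leaves all other lines in the file untouched, so gc_thresh1 and gc_thresh2 keep their existing values.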