Before modifying a kernel parameter, it is important to have a full understanding of its meaning and function. Exercise caution when making changes, because incorrect settings can lead to unexpected system errors and instability.
You need to make sure that:
- You have fully understood the functions and impacts of kernel parameters. This helps you set kernel parameters correctly.
- The parameter values you enter are valid and within the expected range. Otherwise, the modification will not take effect.
You can optimize the following kernel parameters for Nginx ingresses and apply the settings through initContainers:
By default, CCE enables kernel parameter tuning for NGINX Ingress Controller add-on versions 2.2.75, 2.6.26, 3.0.1, and later.
- Increase the size of the connection queue.
In a high-concurrency scenario, the connection queue may overflow if it is too small, resulting in the failure to establish some connections. The size of the connection queue for the process listener socket is determined by the net.core.somaxconn kernel parameter. By modifying this parameter, you can increase the size of the Nginx ingress connection queue.
When a process uses the listen system call to listen on a port, it passes in the backlog parameter, which sets the size of the socket connection queue. The value of backlog cannot exceed that of somaxconn. The Go standard library uses the somaxconn value as the default queue size when listening. Nginx, however, does not read somaxconn when listening on a socket. Instead, it reads the value from nginx.conf. In the listening port configuration in nginx.conf, you can set the backlog parameter to specify the connection queue size for the Nginx listening port. The following shows an example configuration:
server {
    listen 80 backlog=1024;
    ...
}
If backlog is not specified, Nginx uses the default value 511. As a result, the connection queue for the Nginx listening port is limited to 511 by default, even if the value of somaxconn is greater than 511. This can lead to connection queue overflow in high-concurrency scenarios.
The NGINX Ingress Controller can automatically read and use the value of somaxconn as the backlog value, which is then written to the generated nginx.conf file. This means that the connection queue size for an Nginx ingress is determined solely by somaxconn, and the default size in CCE is 4096. In a high-concurrency scenario, it is recommended that you run the following command to set somaxconn to 65535:
sysctl -w net.core.somaxconn=65535
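The relationship between backlog and somaxconn described above can be illustrated with a minimal shell sketch. The function name effective_backlog is hypothetical and exists only for this illustration. It models the rule that the requested backlog is capped at the value of somaxconn.

```shell
# effective_backlog REQUESTED SOMAXCONN
# Hypothetical helper: prints the queue size that would actually apply,
# that is, the requested backlog capped at net.core.somaxconn.
effective_backlog() {
    requested=$1
    somaxconn=$2
    if [ "$requested" -gt "$somaxconn" ]; then
        echo "$somaxconn"
    else
        echo "$requested"
    fi
}

# Nginx default backlog (511) with the CCE default somaxconn (4096)
effective_backlog 511 4096      # prints 511

# A requested backlog of 65535 is capped by somaxconn=4096
effective_backlog 65535 4096    # prints 4096
```

The first call shows why raising somaxconn alone does not help a stock Nginx configuration: the smaller of the two values applies.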
- Expand the range of source ports.
In a high-concurrency scenario, an Nginx ingress uses a large number of source ports to establish connections with upstream servers. Source ports are randomly selected from the range defined by the net.ipv4.ip_local_port_range kernel parameter. If this range is small, source ports can be exhausted quickly, leading to abnormal connections. The default source port range for pods created in CCE is 32768 to 60999. To expand the range to 1024 to 65535, run the following command:
sysctl -w net.ipv4.ip_local_port_range="1024 65535"
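To see how much headroom the wider range provides, the number of ports in a range can be computed with a short shell sketch. The function name port_range_size is hypothetical. The result is only a rough ceiling on concurrent connections to a single upstream IP address and port.

```shell
# port_range_size LOW HIGH
# Hypothetical helper: prints how many source ports the inclusive
# range LOW..HIGH contains.
port_range_size() {
    low=$1
    high=$2
    echo $(( high - low + 1 ))
}

# Default range for pods created in CCE
port_range_size 32768 60999    # prints 28232

# Expanded range
port_range_size 1024 65535     # prints 64512
```

On a running pod, the current range can be passed in directly, for example: port_range_size $(cat /proc/sys/net/ipv4/ip_local_port_range)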
- Adjust TIME_WAIT.
You can enable TIME_WAIT reuse for Nginx ingresses (controlled by the net.ipv4.tcp_tw_reuse kernel parameter) so that connections in the TIME_WAIT state can be reused for new TCP connections. Additionally, reducing net.ipv4.tcp_fin_timeout, which applies to connections in the FIN_WAIT2 state, and net.netfilter.nf_conntrack_tcp_timeout_time_wait, which applies to connection tracking entries in the TIME_WAIT state, releases the resources they occupy more quickly. To reduce these two timeouts, run the following commands:
sysctl -w net.ipv4.tcp_fin_timeout=15
sysctl -w net.netfilter.nf_conntrack_tcp_timeout_time_wait=30
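To judge whether TIME_WAIT and FIN_WAIT2 connections are actually piling up, it can help to count sockets by state before and after the change. The following is a minimal sketch that assumes the ss tool from iproute2 is available in the container. The function name count_tcp_states is hypothetical, and it expects input in the format printed by ss -tan (one header line, then one socket per line with the state in the first column).

```shell
# count_tcp_states
# Hypothetical helper: reads `ss -tan` style output on stdin and prints
# "<state> <count>" for every TCP state found, sorted by state name.
count_tcp_states() {
    # NR > 1 skips the header line; $1 is the state column.
    awk 'NR > 1 { count[$1]++ } END { for (s in count) print s, count[s] }' | sort
}

# Typical use inside the controller container (assumes ss is installed):
#   ss -tan | count_tcp_states
```

A TIME-WAIT count that stays high relative to the source port range suggests that the timeouts above are worth tuning.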
Add initContainers to NGINX Ingress Controller pods and configure the preceding kernel parameters. The following shows an example:
...
initContainers:
- name: setsysctl
  image: ***    # By default, CCE uses the nginx-ingress image of the community.
  securityContext:
    runAsUser: 0
    runAsGroup: 0
    capabilities:
      add:
      - SYS_ADMIN
      drop:
      - ALL
  command:
  - sh
  - -c
  - |
    # Skip pods on the host network (pod IP equals host IP) so that
    # the kernel parameters of the node itself are not changed.
    if [ "$POD_IP" != "$HOST_IP" ]; then
      # /proc/sys must be writable before sysctl can change values.
      mount -o remount,rw /proc/sys
      if [ $? -eq 0 ]; then
        sysctl -w net.core.somaxconn=65535
        sysctl -w net.ipv4.ip_local_port_range="1024 65535"
        sysctl -w net.ipv4.tcp_fin_timeout=15
        sysctl -w net.netfilter.nf_conntrack_tcp_timeout_time_wait=30
      else
        echo "Failed to remount /proc/sys as read-write. Skipping sysctl commands."
      fi
    fi
  env:
  - name: POD_IP
    valueFrom:
      fieldRef:
        apiVersion: v1
        fieldPath: status.podIP
  - name: HOST_IP
    valueFrom:
      fieldRef:
        apiVersion: v1
        fieldPath: status.hostIP
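After the pod starts, it is worth confirming that the initContainer actually applied the values, because the script above skips the sysctl commands silently when the remount fails. The following is a minimal sketch of such a check. The function names read_sysctl and check_sysctl are hypothetical, and the optional ROOT argument exists only so that the helpers can be tried against a directory other than /proc/sys.

```shell
# read_sysctl KEY [ROOT]
# Hypothetical helper: prints the value of a kernel parameter by reading
# its file under ROOT (default /proc/sys). Runs of whitespace are
# collapsed to single spaces, so "1024<TAB>65535" becomes "1024 65535".
read_sysctl() {
    key=$1
    root=${2:-/proc/sys}
    path="$root/$(echo "$key" | tr . /)"
    [ -r "$path" ] || return 2
    tr -s '[:space:]' ' ' < "$path" | sed 's/ $//'
}

# check_sysctl KEY EXPECTED [ROOT]
# Prints OK or MISMATCH for one parameter.
# Returns 0 on match, 1 on mismatch, 2 if the parameter cannot be read.
check_sysctl() {
    actual=$(read_sysctl "$1" "$3") || return 2
    if [ "$actual" = "$2" ]; then
        echo "OK $1=$actual"
    else
        echo "MISMATCH $1: expected '$2', got '$actual'"
        return 1
    fi
}
```

Inside the controller container, the values from the example can then be checked with calls such as check_sysctl net.core.somaxconn 65535 and check_sysctl net.ipv4.ip_local_port_range "1024 65535".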