Updated on 2024-06-19 GMT+08:00

Performing Environment Checks

OSs and Kernel Versions

For details, see Installing the Media.

  1. Check for the basic software package.

    On UOS, this check must be disabled manually. After the deployment starts, open the /cpaas/conf/check_list.json file, find the line containing "type": "os", and add "enable": false, to that entry.

  2. XFS fragmentation
  3. If you encounter a kmem issue, refer to the following:

    https://access.redhat.com/solutions/532663

    https://github.com/opencontainers/runc/issues/1725

    https://github.com/kubernetes/kubernetes/issues/61937

    https://github.com/kubernetes/kubernetes/issues/61937#issuecomment-567042968

    https://kubeovn.github.io/docs/v1.12.x/en/start/prepare/
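The check_list.json edit in step 1 can be scripted. The following is a minimal bash sketch. It assumes GNU sed and that the "type": "os" member sits on its own line inside its JSON object, and disable_os_check is a hypothetical helper name. Inserting the key on the line before the match keeps the JSON valid whether or not the "type" line ends with a comma, and sed keeps a .bak copy of the original file.

```shell
#!/usr/bin/env bash
# Hypothetical helper: disable the "os" entry in check_list.json.
# Assumes GNU sed and that "type": "os" is on its own line.
disable_os_check() {
  local file="${1:-/cpaas/conf/check_list.json}"
  # Insert "enable": false, directly before the matching line; keep a .bak copy.
  sed -i.bak '/"type": *"os"/i\
"enable": false,' "$file"
}
```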

GRUB Startup Parameters

  1. Solve the kmem issue.
    1. Edit /etc/default/grub (in CentOS, Red Hat, or tlinux) or /boot/efi/EFI/kylin/grub.cfg (in Kylin OS).

      In the line containing GRUB_CMDLINE_LINUX=, add cgroup.memory=nokmem after crashkernel and run grub2-mkconfig -o /boot/grub2/grub.cfg.

    2. Restart the system.

      If the added parameter can be found in /proc/cmdline, the modification was successful.

    https://github.com/opencontainers/runc/issues/1725

    https://github.com/kubernetes/kubernetes/issues/61937

    https://github.com/kubernetes/kubernetes/issues/61937#issuecomment-567042968

  2. Disable huge pages.
    1. Edit /etc/default/grub (in CentOS, Red Hat, or tlinux) or /boot/efi/EFI/kylin/grub.cfg (Kylin OS) and add transparent_hugepage=never to GRUB_CMDLINE_LINUX.
    2. Run grub2-mkconfig -o /boot/grub2/grub.cfg.
    3. Restart the server and verify the result: /sys/kernel/mm/transparent_hugepage/enabled should show [never].

    On Arm architectures, if transparent huge pages are not disabled, Redis performance will be severely affected.
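The GRUB steps above can be sketched in bash as follows. This assumes GNU sed and an /etc/default/grub file whose GRUB_CMDLINE_LINUX value is double-quoted and ends the line; the helper names are hypothetical. add_grub_param appends the parameter at the end of the value rather than directly after crashkernel, and it does not handle the Kylin grub.cfg path.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for the GRUB parameters above.

# Append PARAM to GRUB_CMDLINE_LINUX unless it is already present.
# Assumes the value is double-quoted and the line ends with the closing quote.
add_grub_param() {
  local param="$1" file="${2:-/etc/default/grub}"
  if grep -q "^GRUB_CMDLINE_LINUX=.*[\" ]${param}[\" ]" "$file"; then
    return 0
  fi
  sed -i.bak "/^GRUB_CMDLINE_LINUX=/ s/\"\$/ ${param}\"/" "$file"
}

# After the reboot: succeeds if PARAM is a whole word on the kernel command line.
has_kernel_param() {
  local param="$1" cmdline="${2:-/proc/cmdline}"
  tr ' ' '\n' < "$cmdline" | grep -qxF -- "$param"
}

# Succeeds if transparent huge pages are disabled ("[never]" is selected).
thp_disabled() {
  grep -q '\[never\]' "${1:-/sys/kernel/mm/transparent_hugepage/enabled}"
}
```

For example, run add_grub_param cgroup.memory=nokmem and add_grub_param transparent_hugepage=never, then grub2-mkconfig -o /boot/grub2/grub.cfg, and reboot. Afterwards, has_kernel_param cgroup.memory=nokmem && thp_disabled should succeed.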

Kernel Modules

Network kernel module requirements:

  • If Red Hat is used and the kernel version is earlier than 4.18.0, or another OS is used and the kernel version is earlier than 4.19.0, check for the nf_conntrack_ipv4 module. If IPv6 is enabled, also check for nf_conntrack_ipv6.
  • If Kube-OVN is used, check for the geneve and openvswitch modules.
  • Check for the ip_vs, ip_vs_rr, ip_vs_wrr, and ip_vs_sh modules.

The following CentOS 7 example loads the iptable_nat module at boot. Run the command as root:

cat <<EOF > /etc/modules-load.d/cpaas.conf
iptable_nat
EOF

Restart the server and run lsmod | grep iptable_nat. If the iptable_nat module is listed, the configuration has taken effect.
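The module rules above can be turned into a list and compared with lsmod. This is a minimal bash sketch that assumes GNU sort (for -V). The function names and the yes/no argument convention are hypothetical, and nf_conntrack_ipv6 is assumed to follow the same kernel-version rule as nf_conntrack_ipv4. Modules compiled into the kernel do not appear in lsmod, so they would be reported as missing.

```shell
#!/usr/bin/env bash
# Succeeds if version $1 sorts strictly before version $2 (GNU sort -V).
version_lt() {
  [ "$1" != "$2" ] &&
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$1" ]
}

# Hypothetical helper: print the modules listed on this page, one per line.
# Arguments: kernel release (uname -r), is_redhat, ipv6, kube_ovn (each yes/no).
required_modules() {
  local kver="${1%%-*}" redhat="$2" ipv6="$3" kubeovn="$4" threshold=4.19.0
  [ "$redhat" = yes ] && threshold=4.18.0
  if version_lt "$kver" "$threshold"; then
    echo nf_conntrack_ipv4
    # Assumption: nf_conntrack_ipv6 follows the same kernel-version rule.
    [ "$ipv6" = yes ] && echo nf_conntrack_ipv6
  fi
  if [ "$kubeovn" = yes ]; then
    echo geneve
    echo openvswitch
  fi
  printf '%s\n' ip_vs ip_vs_rr ip_vs_wrr ip_vs_sh
}

# Print each module from the remaining arguments that is absent from a saved
# `lsmod` listing ($1).
missing_modules() {
  local listing="$1" m
  shift
  for m in "$@"; do
    awk -v m="$m" 'NR > 1 && $1 == m { found = 1 } END { exit !found }' "$listing" || echo "$m"
  done
}
```

Usage: lsmod > /tmp/lsmod.txt, then missing_modules /tmp/lsmod.txt $(required_modules "$(uname -r)" yes yes no). The listing is saved to a file because awk reads it once per module.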

User Permissions

root

You can log in to the system over SSH as a non-root user and run su - to gain root access.

sshd Configuration

  • Every node in the global cluster must allow remote login over SSH.
  • The values of UseDNS and UsePAM in /etc/ssh/sshd_config must be no.

If the user is not root, configure the /etc/sudoers file so that the user can run sudo without entering a password.

If reverse DNS resolution is not configured, SSH logins may time out. Setting UseDNS to no avoids this lookup.
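A minimal bash sketch for checking the two sshd_config keywords; sshd_option and check_sshd are hypothetical helpers. It reads only the main file (Include and Match blocks are ignored) and treats a missing keyword as a failure, which is stricter than the built-in sshd defaults.

```shell
#!/usr/bin/env bash
# Print the (lower-cased) value of the first occurrence of an sshd_config keyword.
# sshd uses the first value it obtains for each keyword; keywords are case-insensitive.
sshd_option() {
  local key="$1" file="${2:-/etc/ssh/sshd_config}"
  awk -v k="$key" 'tolower($1) == tolower(k) { print tolower($2); exit }' "$file"
}

# Hypothetical check: UseDNS and UsePAM must both be explicitly set to "no".
check_sshd() {
  local file="${1:-/etc/ssh/sshd_config}" key rc=0
  for key in UseDNS UsePAM; do
    if [ "$(sshd_option "$key" "$file")" != no ]; then
      echo "$key is not set to no in $file" >&2
      rc=1
    fi
  done
  return "$rc"
}
```

After editing the file, sshd -t validates the configuration before systemctl restart sshd applies it.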

Swap

Disabled

Leaving swap enabled may cause a sharp increase in system I/O, which can make Docker unresponsive.
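A minimal bash sketch for the swap requirement; the helper names are hypothetical and the fstab edit assumes GNU sed. Run swapoff -a first, then comment out the fstab entries so that swap stays off after a reboot.

```shell
#!/usr/bin/env bash
# Succeeds if no swap device or file is active (only the header line in /proc/swaps).
swap_disabled() {
  [ -z "$(tail -n +2 "${1:-/proc/swaps}")" ]
}

# Comment out uncommented fstab entries of type swap; keeps a .bak copy.
comment_swap_in_fstab() {
  sed -i.bak '/^[^#].*[[:space:]]swap[[:space:]]/ s/^/#/' "${1:-/etc/fstab}"
}
```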

Firewall

Disabled

This is a requirement from the official Kubernetes documentation.

SELinux

Disabled

This is a requirement from the official Kubernetes documentation.

Time Synchronization

The time of all servers must be synchronized, and the time difference cannot exceed 10 seconds.

This is a requirement from the official Docker and Kubernetes documentation.
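A small bash sketch of the 10-second rule that compares two Unix timestamps; clock_skew_ok is a hypothetical helper. When the remote time is collected over SSH (for example clock_skew_ok "$(date +%s)" "$(ssh node2 date +%s)", where node2 is a placeholder), the SSH round trip adds to the measured difference.

```shell
#!/usr/bin/env bash
# Succeeds if two Unix timestamps (seconds) differ by at most MAX seconds (default 10).
clock_skew_ok() {
  local a="$1" b="$2" max="${3:-10}" diff
  diff=$(( a - b ))
  (( diff < 0 )) && diff=$(( -diff ))   # absolute value
  (( diff <= max ))
}
```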

Time Zone

The time zones of all servers must be the same.

The time zone should be Asia/Shanghai.
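A bash sketch for checking the time zone. It assumes /etc/localtime is a symlink into a zoneinfo directory, as timedatectl set-timezone Asia/Shanghai creates on systemd hosts; timezone_ok is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Succeeds if the /etc/localtime symlink points at the expected zoneinfo file.
# Assumes /etc/localtime is a symlink (as set up by timedatectl).
timezone_ok() {
  local link="${1:-/etc/localtime}" want="${2:-Asia/Shanghai}"
  case "$(readlink "$link")" in
    */zoneinfo/"$want") return 0 ;;
    *) return 1 ;;
  esac
}
```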

/etc/sysctl.conf Kernel Parameters

  • vm.max_map_count=262144
  • net.ipv4.ip_forward=1
  • vm.drop_caches=3
  • net.ipv4.tcp_tw_recycle=0
  • net.ipv4.tcp_mtu_probing=1
  • net.ipv4.conf.all.rp_filter=0
  • net.ipv4.conf.eth0.rp_filter=0
  • net.ipv4.conf.default.rp_filter=0
  • ipv6.disable=0

vm.max_map_count must be specified for the server where Elasticsearch runs.

net.ipv4.ip_forward is required by the official Kubernetes documentation.

vm.drop_caches=3 releases the file cache (page cache, dentries, and inodes).

https://serverfault.com/questions/646604/what-causes-syn-to-listen-sockets-dropped

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=4396e46187ca5070219b81773c4e65088dac50cc

https://www.kernel.org/doc/Documentation/networking/ip-sysctl.txt

The rp_filter settings are required for communication between components in two Calico subnets that use different modes.
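The kernel parameters above can be compared with the live values under /proc/sys. The following is a minimal bash sketch in which required_sysctls and check_sysctl are hypothetical helpers. vm.drop_caches is omitted because writing to it is a one-shot action, ipv6.disable is omitted because it is a kernel boot parameter, and eth0 stands for whatever interface the node actually uses. net.ipv4.tcp_tw_recycle was removed in Linux 4.12, so on newer kernels it is reported as not present rather than as a failure.

```shell
#!/usr/bin/env bash
# Required values from the list above (vm.drop_caches and ipv6.disable left out).
required_sysctls() {
  cat <<'EOF'
vm.max_map_count=262144
net.ipv4.ip_forward=1
net.ipv4.tcp_tw_recycle=0
net.ipv4.tcp_mtu_probing=1
net.ipv4.conf.all.rp_filter=0
net.ipv4.conf.eth0.rp_filter=0
net.ipv4.conf.default.rp_filter=0
EOF
}

# Hypothetical checker: reads key=value lines on stdin and compares them with the
# live values under /proc/sys (or under ROOT, given as $1, for testing).
check_sysctl() {
  local root="${1:-/proc/sys}" rc=0 key want path have
  while IFS='=' read -r key want; do
    [ -n "$key" ] || continue
    path="$root/${key//.//}"          # net.ipv4.ip_forward -> net/ipv4/ip_forward
    if [ ! -r "$path" ]; then
      echo "$key: not present on this kernel"
      continue
    fi
    have=$(cat "$path")
    if [ "$have" != "$want" ]; then
      echo "$key: expected $want, found $have"
      rc=1
    fi
  done
  return "$rc"
}
```

Running required_sysctls | check_sysctl prints one line per mismatch and returns non-zero if any value differs.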

Hostname Format

Obtain the hostname of the node. If the hostname is used as the node name, it must have a unique value and cannot exceed 36 characters. The following requirements must be met:

  • Only lowercase letters, digits, hyphens (-), and periods (.) are allowed.
  • The value cannot contain a period followed by a hyphen (.-), two consecutive periods (..), or a hyphen followed by a period (-.).
  • The value must start with a letter or digit.
  • The value must end with a letter or digit.

For details, see https://kubernetes.io/docs/concepts/overview/working-with-objects/names/.
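A bash sketch that applies the hostname rules above; valid_hostname is a hypothetical helper. The character check runs tr in the C locale so that a-z and 0-9 are plain ASCII ranges.

```shell
#!/usr/bin/env bash
# Hypothetical validator for the hostname rules above.
valid_hostname() {
  local h="$1"
  # Length: 1 to 36 characters.
  [ "${#h}" -ge 1 ] && [ "${#h}" -le 36 ] || return 1
  # Only a-z, 0-9, '.', and '-': delete the allowed bytes (C locale) and
  # expect only the sentinel X to remain.
  [ "$(printf '%sX' "$h" | LC_ALL=C tr -d 'a-z0-9.-')" = X ] || return 1
  case "$h" in
    [.-]* | *[.-]) return 1 ;;        # must start and end with a letter or digit
    *..* | *.-* | *-.*) return 1 ;;   # no "..", ".-", or "-."
  esac
  return 0
}
```

Usage: valid_hostname "$(hostname)" || echo "hostname does not meet the requirements"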

/etc/hosts

Hostnames of all servers can be resolved to IP addresses. localhost can be resolved to 127.0.0.1.

The hosts file cannot contain duplicate hostnames.
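A bash sketch for the /etc/hosts rules, assuming that "duplicate hostnames" means the same name appearing on more than one line. Because stock hosts files list localhost for both 127.0.0.1 and ::1, this sketch ignores lines whose address is ::1 or in 127.0.0.0/8 when looking for duplicates. duplicate_hosts and localhost_ok are hypothetical helpers; whether each node hostname resolves can be checked separately with getent hosts "$(hostname)".

```shell
#!/usr/bin/env bash
# Print each hostname that appears on more than one non-loopback line.
duplicate_hosts() {
  local f="${1:-/etc/hosts}"
  awk '{ sub(/#.*/, "") }                      # strip comments
       NF >= 2 && $1 != "::1" && $1 !~ /^127\./ {
         for (i = 2; i <= NF; i++) if (++seen[$i] == 2) print $i
       }' "$f"
}

# Succeeds if localhost is mapped to 127.0.0.1.
localhost_ok() {
  local f="${1:-/etc/hosts}"
  awk '{ sub(/#.*/, "") }
       $1 == "127.0.0.1" { for (i = 2; i <= NF; i++) if ($i == "localhost") ok = 1 }
       END { exit !ok }' "$f"
}
```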

core Files

Run ulimit -c 0 to disable the generation of core files and add 'ulimit -S -c 0' to the /etc/profile file.

When a process in a pod restarts, it may generate core files. These files can take up a significant amount of disk space, which can cause the pod to exit unexpectedly and even affect the entire node.
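A bash sketch of the two core-file steps; disable_core_dumps is a hypothetical helper that appends the profile line only once, so repeated runs are safe.

```shell
#!/usr/bin/env bash
# Disable core dumps for the current shell and for later login shells.
disable_core_dumps() {
  local profile="${1:-/etc/profile}"
  ulimit -c 0
  # Append the line only if it is not already present.
  grep -qxF 'ulimit -S -c 0' "$profile" 2>/dev/null || echo 'ulimit -S -c 0' >> "$profile"
}
```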

Requirements for /etc/resolv.conf

If a search domain is configured, resolving Service (svc) names may fail. To avoid this, remove the search line.

The /etc/resolv.conf file must exist and must contain at least one nameserver entry. Nameserver IP addresses starting with 127 are not allowed.
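A bash sketch of the /etc/resolv.conf checks; check_resolv_conf is a hypothetical helper. A search line is reported as a warning rather than a failure, since the text above presents it as a likely cause of errors rather than a hard requirement. Note that a systemd-resolved stub resolver (127.0.0.53) fails the 127 rule.

```shell
#!/usr/bin/env bash
# Hypothetical check of /etc/resolv.conf against the requirements above.
check_resolv_conf() {
  local f="${1:-/etc/resolv.conf}" rc=0
  [ -f "$f" ] || { echo "$f does not exist"; return 1; }
  if ! grep -Eq '^[[:space:]]*nameserver[[:space:]]+' "$f"; then
    echo "no nameserver entry"
    rc=1
  fi
  if grep -Eq '^[[:space:]]*nameserver[[:space:]]+127\.' "$f"; then
    echo "a nameserver address starts with 127"
    rc=1
  fi
  # A search line is reported but does not fail the check.
  if grep -Eq '^[[:space:]]*search([[:space:]]|$)' "$f"; then
    echo "warning: search line present"
  fi
  return "$rc"
}
```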

DefaultTasks

Run systemctl show --property=DefaultTasksMax. If the returned value is not infinity or a large number such as 18446744073709551615, you need to change the value of DefaultTasksMax.

In the /etc/systemd/system.conf file, set DefaultTasksMax=infinity.

Impact: When the global cluster also serves as a business cluster in an all-in-one or standard deployment architecture, this limit can restrict the number of customer services, and pods may become abnormal when a large number of services are started.
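A bash sketch for the DefaultTasksMax check and change, assuming GNU sed; tasksmax_ok and set_tasksmax_infinity are hypothetical helpers. The value is compared as a string because 18446744073709551615 exceeds the signed 64-bit range of bash integers. systemctl daemon-reexec (or a reboot) makes systemd re-read system.conf.

```shell
#!/usr/bin/env bash
# $1: output of `systemctl show --property=DefaultTasksMax`.
tasksmax_ok() {
  case "${1#DefaultTasksMax=}" in
    infinity | 18446744073709551615) return 0 ;;
    *) return 1 ;;
  esac
}

# Set DefaultTasksMax=infinity, replacing a commented or existing line if present.
set_tasksmax_infinity() {
  local f="${1:-/etc/systemd/system.conf}"
  if grep -Eq '^#?DefaultTasksMax=' "$f"; then
    sed -i.bak -E 's/^#?DefaultTasksMax=.*/DefaultTasksMax=infinity/' "$f"
  else
    echo 'DefaultTasksMax=infinity' >> "$f"
  fi
}
```

Usage: tasksmax_ok "$(systemctl show --property=DefaultTasksMax)" || set_tasksmax_infinity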

AppArmor

  1. Disable AppArmor:

    systemctl stop apparmor.service && systemctl disable apparmor.service;

  2. Open /etc/default/grub and add apparmor=0 to GRUB_CMDLINE_LINUX.

On UOS, if the container runtime is containerd 1.6.4 or later, AppArmor must be disabled to avoid deployment errors.

Software Package and System Tools

  • The following system tools must exist on the host:

    ip, ss, tar, swapoff, modprobe, sysctl, md5sum, and either SCP or SFTP

  • bc must be installed for EulerOS.
  • To deploy topolvm and rook, lvm2 must be installed.
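A bash sketch for the tool checks, built on command -v; missing_tools and check_host_tools are hypothetical helpers. swapoff, modprobe, and sysctl usually live in /usr/sbin, so run the check as root or with /usr/sbin in PATH. On EulerOS, missing_tools bc covers the extra requirement, and missing_tools lvm covers lvm2 (which provides the lvm command) for TopoLVM and Rook.

```shell
#!/usr/bin/env bash
# Print each named tool that cannot be found in PATH.
missing_tools() {
  local t
  for t in "$@"; do
    command -v "$t" >/dev/null 2>&1 || echo "$t"
  done
}

# Tools this page requires on every host; scp and sftp are alternatives.
check_host_tools() {
  local missing
  missing=$(missing_tools ip ss tar swapoff modprobe sysctl md5sum)
  if ! command -v scp >/dev/null 2>&1 && ! command -v sftp >/dev/null 2>&1; then
    missing="${missing:+$missing$'\n'}scp or sftp"
  fi
  [ -z "$missing" ] || { printf '%s\n' "$missing"; return 1; }
}
```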

Software Package Removal

Kylin OS comes with runC pre-installed, which conflicts with the runC deployed by the platform. Therefore, remove runC before deployment.

/tmp Access

The account must have permission to run ls and cat in the /tmp directory.

GPU Devices

If GPU devices are used, check that the devices exist on the node.

CPU Cores

The number of CPU cores must be at least 2.

Memory Size

The memory size must be at least 2 GiB.
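A bash sketch for the CPU and memory minimums; the helper names are hypothetical. MemTotal in /proc/meminfo is somewhat lower than the installed RAM because the kernel reserves part of it, so a node with exactly 2 GiB may fall just short under this strict comparison.

```shell
#!/usr/bin/env bash
# Print MemTotal from /proc/meminfo (reported in kB, i.e. KiB).
mem_total_kib() {
  awk '$1 == "MemTotal:" { print $2 }' "${1:-/proc/meminfo}"
}

# Succeeds if CORES >= 2 and MEM_KIB >= 2 GiB (2 * 1024 * 1024 KiB).
resources_ok() {
  local cores="$1" mem_kib="$2"
  [ "$cores" -ge 2 ] && [ "$mem_kib" -ge $((2 * 1024 * 1024)) ]
}
```

Usage: resources_ok "$(nproc)" "$(mem_total_kib)"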

Kubelet Service Checks

The /etc/systemd/system/kubelet.service file must not exist.

Default Route

The server must have a default route or a route to 0.0.0.0.
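A bash sketch for the default-route check that reads the output of ip route (or ip -6 route) on stdin; has_default_route is a hypothetical helper. Usage: ip route | has_default_route

```shell
#!/usr/bin/env bash
# Succeeds if the routing table read on stdin contains a default route.
has_default_route() {
  grep -Eq '^(default|0\.0\.0\.0/0|::/0)([[:space:]]|$)'
}
```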

Whether Ports Are Occupied

Check whether the following listening ports are occupied:

  • All nodes: 10249, 10250, and 10256
  • Master node ports: 2379, 2380, 6443, 10249, 10250, 10251, 10252, and 10256
  • Kube-OVN ports: 6641 and 6642
  • Calico port: 179
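A bash sketch of the port check, assuming the column layout of ss -ltn (local address and port in the fourth column); occupied_ports is a hypothetical helper and only inspects TCP listeners.

```shell
#!/usr/bin/env bash
# stdin: output of `ss -ltn`; arguments: ports to check.
# Prints each requested port that already has a listener.
occupied_ports() {
  awk -v want="$*" '
    BEGIN { n = split(want, w, " "); for (i = 1; i <= n; i++) check[w[i]] = 1 }
    NR > 1 {
      port = $4
      sub(/.*:/, "", port)               # keep the text after the last colon
      if ((port in check) && !(port in seen)) { print port; seen[port] = 1 }
    }'
}
```

For example, ss -ltn | occupied_ports 10249 10250 10256 prints the ports already in use; on master nodes add 2379 2380 6443 10251 10252, for Kube-OVN add 6641 6642, and for Calico add 179.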

Network Interfaces

The network interfaces configured for the cluster and the node must exist on the node.

Hardware Architecture

The hardware architecture (x86 or Arm) of the node must be the same as that of the cluster.

IP Address

The IP address must be valid and exist. If IPv6 is enabled, the IPv6 address must also be valid and exist.

The node IP address cannot be any of the following:

  • A loopback address: 127.0.0.1
  • A multicast address: 224.0.0.0 to 239.255.255.255, or an IPv6 address starting with FF
  • A link-local address: the 169.254.0.0/16 or fe80::/10 address block
  • An all-zero or broadcast address: 0:0:0:0:0:0:0:0 (::) or 255.255.255.255
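A bash sketch of the IPv4 part of these rules; valid_node_ipv4 is a hypothetical helper. It treats the whole 127.0.0.0/8 block as loopback and does not cover the IPv6 rules (::, addresses starting with FF, and fe80::/10).

```shell
#!/usr/bin/env bash
# Hypothetical IPv4-only check against the excluded address types listed above.
valid_node_ipv4() {
  local ip="$1" o n first second
  local -a octets
  # Four dot-separated groups of 1 to 3 digits, each at most 255.
  [[ "$ip" =~ ^[0-9]{1,3}(\.[0-9]{1,3}){3}$ ]] || return 1
  IFS=. read -r -a octets <<< "$ip"
  for o in "${octets[@]}"; do
    (( 10#$o <= 255 )) || return 1
  done
  first=$((10#${octets[0]}))
  second=$((10#${octets[1]}))
  n=$(( (first << 24) | (second << 16) | (10#${octets[2]} << 8) | 10#${octets[3]} ))
  (( first == 127 )) && return 1                  # loopback (127.0.0.0/8)
  (( first >= 224 && first <= 239 )) && return 1  # multicast
  (( first == 169 && second == 254 )) && return 1 # link-local (169.254.0.0/16)
  (( n == 0 || n == 0xFFFFFFFF )) && return 1     # all-zero or broadcast
  return 0
}
```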

IP Address Segment

The IP addresses in the 172.16.x.x to 172.32.x.x range required by Docker must not be in use.

If the IP addresses within the CIDR block are already in use and cannot be changed, modify the Docker configuration files on all nodes by adding the bip parameter to prevent Docker from using those occupied IP addresses.

Node Access

The node and its SSH port must be reachable.

Whether the Node Can Access the Platform

The node can access the platform through the platform address.

Whether the Node Can Access the Platform Image Repository

The node can access the image repository of the platform.

Whether the Node IP Address Is in Any Configured CIDR Blocks

The node IP address is not within any of the configured CIDR blocks, including the default subnet CIDR block, container CIDR block, service CIDR block, or join CIDR block.

CIDR Block Checks

  • When the default subnet is underlay, the system skips checking whether the node IP address is within the default subnet CIDR block, service CIDR block, or join CIDR block.
  • When the default subnet is a non-underlay network, the node IP address (including IPv6) must not be within any of the configured CIDR blocks, which include the default subnet CIDR block, container CIDR block, service CIDR block, or join CIDR block.
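A bash sketch of the IPv4 CIDR containment test; the helper names and the CIDR values in the usage example are hypothetical. The caller decides which blocks to pass: per the rules above, for an underlay default subnet the default subnet, service, and join CIDR blocks would be left out.

```shell
#!/usr/bin/env bash
# Convert a dotted-quad IPv4 address to a 32-bit integer.
ipv4_to_int() {
  local a b c d
  IFS=. read -r a b c d <<< "$1"
  echo $(( (10#$a << 24) | (10#$b << 16) | (10#$c << 8) | 10#$d ))
}

# Succeeds if IPv4 address $1 lies inside CIDR block $2 (for example 10.96.0.0/12).
ip_in_cidr() {
  local ip="$1" net="${2%/*}" bits="${2#*/}" mask
  mask=$(( (0xFFFFFFFF << (32 - bits)) & 0xFFFFFFFF ))
  (( ($(ipv4_to_int "$ip") & mask) == ($(ipv4_to_int "$net") & mask) ))
}

# Print every CIDR block (remaining arguments) that contains the node IP ($1).
node_ip_conflicts() {
  local ip="$1" cidr
  shift
  for cidr in "$@"; do
    if ip_in_cidr "$ip" "$cidr"; then
      echo "$cidr"
    fi
  done
}
```

Usage: node_ip_conflicts 192.168.10.5 10.3.0.0/16 10.4.0.0/16 100.64.0.0/16 prints nothing when the node IP address is outside all blocks.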

Master Port Access

Check connectivity from the host to ports 6443, 2379, and 2380 on every master node.

Checks for the pki Directory

The /var/lib/kubelet/pki directory on the target host must be empty or must not exist.
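A bash sketch of the pki directory rule; pki_dir_ok is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Succeeds if the directory is absent, or exists and is empty.
pki_dir_ok() {
  local d="${1:-/var/lib/kubelet/pki}"
  [ -e "$d" ] || return 0
  [ -d "$d" ] && [ -z "$(ls -A "$d")" ]
}
```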

Checks for the cri Directory Space

Check the available size of the specified directory (/var/lib/containerd or /var/lib).

Check Timeout

By default, adding a node on the UI takes about 110 seconds.

Checks for Directories That Cannot Exist

The /var/log/pods directory must not exist.

/usr/bin Service

If docker, containerd, or runc exists on the host, it must be located in /usr/bin.
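A bash sketch of the /usr/bin rule using type -aP, which lists every match in PATH; misplaced_binaries is a hypothetical helper. Directories are compared by physical path so that /bin on merged-/usr systems counts as /usr/bin. Binaries outside PATH are not found by this sketch.

```shell
#!/usr/bin/env bash
# Succeeds if two directories resolve to the same physical path
# (for example /bin and /usr/bin on merged-/usr systems).
same_dir() {
  [ "$(cd "$1" 2>/dev/null && pwd -P)" = "$(cd "$2" 2>/dev/null && pwd -P)" ]
}

# Print every copy of docker, containerd, or runc (or the named programs)
# found in PATH outside /usr/bin.
misplaced_binaries() {
  local name path
  [ "$#" -gt 0 ] || set -- docker containerd runc
  for name in "$@"; do
    while IFS= read -r path; do
      same_dir "${path%/*}" /usr/bin || echo "$path"
    done < <(type -aP "$name")
  done
}
```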