What Should I Do If the Data Disk Usage Is High Because a Large Volume of Data Is Written Into the Log File?
Symptom
On nodes that use containerd as the container runtime, service containers that continuously write a large volume of data to their log files can fill the /var/lib/containerd directory and slow down container creation and deletion on the node. As a result, disk usage becomes high, pods may be evicted, and the node may become abnormal.
Possible Causes
If such service containers write their logs to standard output (stdout), the kubelet dumps the logs. The kubelet also maintains the lifecycle of all containers on the node.
The kubelet becomes overloaded if a node runs too many service containers and they write a large volume of data to the log files. If the load exceeds a certain threshold, the kubelet dumps the logs to disk, which further increases disk usage and affects operations such as container creation and deletion on the node.
Solution
Typically, for a node with 8 vCPUs, 16 GiB of memory, and a 100 GiB data disk, the standard output log rate of a single container should not exceed 512 KB/s, and the combined standard output log rate of all containers on the node should not exceed 5 MB/s. If containers generate logs faster than this, resolve the issue in either of the following ways:
- Do not schedule containers that generate large volumes of logs on the same node. For example, configure anti-affinity policies for pods running such containers or reduce the maximum number of pods on a single node.
- Write logs to separate storage. For example, attach an extra data disk or mount a dynamically provisioned storage volume when creating a node so that logs are written to files on it.
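The thresholds and the anti-affinity option above can be sketched in Python. This is a minimal illustration, not a CCE tool: it assumes KB and MB are binary units (1024-based), that per-container log rates have already been measured by some other means, and that the high-volume pods carry a hypothetical label `log-volume=high`. The `affinity` fields used are the standard Kubernetes pod-spec fields.

```python
import json

# Guidance values for an 8 vCPU / 16 GiB / 100 GiB data disk node.
# Assumption: KB and MB are interpreted as 1024-based units.
PER_CONTAINER_LIMIT = 512 * 1024        # bytes per second
NODE_LIMIT = 5 * 1024 * 1024            # bytes per second


def check_log_rates(rates_bytes_per_s):
    """Return (noisy_containers, node_total, node_over_limit).

    rates_bytes_per_s maps a container name to its measured standard
    output log rate in bytes per second. Measuring the rates is out of
    scope for this sketch.
    """
    noisy = sorted(name for name, rate in rates_bytes_per_s.items()
                   if rate > PER_CONTAINER_LIMIT)
    total = sum(rates_bytes_per_s.values())
    return noisy, total, total > NODE_LIMIT


def anti_affinity_for(label_key, label_value):
    """Build a pod-spec `affinity` fragment that keeps pods with the
    given label off nodes already running a pod with the same label."""
    return {
        "podAntiAffinity": {
            "requiredDuringSchedulingIgnoredDuringExecution": [
                {
                    "labelSelector": {
                        "matchLabels": {label_key: label_value}
                    },
                    # The node hostname is the topology domain, so at most
                    # one matching pod is scheduled per node.
                    "topologyKey": "kubernetes.io/hostname",
                }
            ]
        }
    }


# Hypothetical measurements and label, for illustration only.
noisy, total, over = check_log_rates({"app-a": 600 * 1024, "app-b": 100 * 1024})
affinity = anti_affinity_for("log-volume", "high")
print(noisy, total, over)
print(json.dumps({"affinity": affinity}, indent=2))
```

The printed JSON fragment could be merged into the pod template of the workloads identified as noisy. A required anti-affinity rule leaves pods unschedulable when no eligible node remains, so a preferred rule may suit smaller clusters better.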