Handling Service Overload
Overview
High CPU usage and full disks indicate overloaded Kafka services.
- High CPU usage degrades system performance and increases the risk of hardware damage.
- If a disk is full, the Kafka logs stored on it go offline, and the partition replicas on that disk can no longer be read or written, reducing partition availability and fault tolerance. Leadership for the affected partitions then switches to other brokers, increasing their load.
Causes of high CPU usage
- Too many data processing threads are configured (num.io.threads, num.network.threads, and num.replica.fetchers).
- Improper partition planning: a single broker carries all of the production and consumption traffic.
Causes of a full disk
- The current disk space no longer meets the needs of the rapidly growing service data volume.
- Unbalanced disk usage across brokers: produced messages all land in one partition, filling up the disk that hosts it.
- The time to live (TTL) set for a topic is too long, so old data occupies too much disk space.
Solution
Handling high CPU usage:
- Optimize the configuration of the num.io.threads, num.network.threads, and num.replica.fetchers thread parameters (a configuration sketch follows this list).
- Set num.io.threads and num.network.threads to multiples of the number of disks, without exceeding the number of CPU cores.
- Set num.replica.fetchers to a value less than or equal to 5.
- Plan topic partitions properly: set the number of partitions to a multiple of the number of brokers (see the topic creation sketch after this list).
- Attach a random suffix to each message key so that messages are evenly distributed across partitions (see the producer sketch after this list).
In actual scenarios, attaching a random suffix to each message key breaks global message ordering. Decide whether a suffix is appropriate for your service.
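The following server.properties sketch illustrates the thread settings above. The values assume a hypothetical broker with 16 CPU cores and 4 data disks; adapt them to your hardware.

# Sketch only: values assume 16 CPU cores and 4 data disks.
# I/O and network threads: multiples of the disk count, within the CPU core count.
num.io.threads=8
num.network.threads=4
# Replica fetcher threads: keep at 5 or less.
num.replica.fetchers=2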
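As an illustration of the partition-count guideline, the following Java sketch creates a topic whose partition count is a multiple of the broker count. The broker address, topic name, and counts are assumptions for a hypothetical 3-broker cluster.

import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.NewTopic;

Properties props = new Properties();
props.put("bootstrap.servers", "localhost:9092");
try (AdminClient admin = AdminClient.create(props)) {
    // 6 partitions: a multiple of the 3 brokers; replication factor 2.
    NewTopic topic = new NewTopic("example-topic", 6, (short) 2);
    admin.createTopics(Collections.singleton(topic)).all().get();
}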
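The following minimal producer sketch shows the key-suffix approach. The broker address, topic name, key, and suffix range are placeholders; as noted above, this sacrifices per-key message ordering.

import java.util.Properties;
import java.util.concurrent.ThreadLocalRandom;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.Producer;
import org.apache.kafka.clients.producer.ProducerRecord;

Properties props = new Properties();
props.put("bootstrap.servers", "localhost:9092");
props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
Producer<String, String> producer = new KafkaProducer<>(props);
// Append a random suffix so the default partitioner spreads keys across partitions.
String key = "orderId-1001-" + ThreadLocalRandom.current().nextInt(10);
producer.send(new ProducerRecord<>("example-topic", key, "message payload"));
producer.close();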
Handling a full disk:
- Increase the disk space.
- Migrate partitions from the full disk to other disks on the broker.
- Set a proper TTL for topics so that old data occupies less disk space (see the retention sketch after the compression example below).
- If CPU resources are sufficient, compress the data with a compression algorithm.
Common compression algorithms supported by Kafka include gzip, Snappy, LZ4, and ZSTD. Consider both the compression ratio and the compression time when selecting an algorithm; generally, an algorithm with a higher compression ratio takes more time. For systems with high performance requirements, select a fast algorithm such as LZ4. For systems that require a high compression ratio, select an algorithm such as gzip.
Configure the compression.type parameter on producers to specify a compression algorithm.
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.Producer;

Properties props = new Properties();
props.put("bootstrap.servers", "localhost:9092");
props.put("acks", "all");
props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
// Enable GZIP compression for all messages sent by this producer.
props.put("compression.type", "gzip");
Producer<String, String> producer = new KafkaProducer<>(props);
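As a sketch of adjusting a topic's TTL, the following Java snippet sets the retention.ms topic configuration through the AdminClient. The broker address, topic name, and 3-day retention value are assumptions.

import java.util.Collection;
import java.util.Collections;
import java.util.Map;
import java.util.Properties;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AlterConfigOp;
import org.apache.kafka.clients.admin.ConfigEntry;
import org.apache.kafka.common.config.ConfigResource;

Properties props = new Properties();
props.put("bootstrap.servers", "localhost:9092");
try (AdminClient admin = AdminClient.create(props)) {
    ConfigResource topic = new ConfigResource(ConfigResource.Type.TOPIC, "example-topic");
    // Retain messages for 3 days (259200000 ms) instead of the broker default.
    AlterConfigOp op = new AlterConfigOp(
            new ConfigEntry("retention.ms", "259200000"), AlterConfigOp.OpType.SET);
    Map<ConfigResource, Collection<AlterConfigOp>> updates =
            Collections.singletonMap(topic, Collections.singletonList(op));
    admin.incrementalAlterConfigs(updates).all().get();
}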