Updated on 2025-05-22 GMT+08:00

Optimizing Kafka Performance

Kafka performance optimization

Client configuration optimization

Producer configuration suggestions

For details, see Setting Parameters for Kafka Clients.

Consumer configuration suggestions

Parameter | Recommended Value | Description
--- | --- | ---
max.poll.records | 500 | Maximum number of messages a consumer can pull in a single poll. The default value is 500. If each message takes a long time to process, use a smaller value so that a whole batch can be processed within the time specified by max.poll.interval.ms.
max.poll.interval.ms | 300000 | Maximum interval between two consecutive polls. The default value is 300,000 ms (300s). If a consumer does not poll again within this interval, it is considered abnormal and its partitions are reassigned to other consumers in the group.
fetch.min.bytes | Adjusted based on services | Minimum amount of data returned by each fetch request. The default value is 1 (byte). A larger value increases throughput but also increases latency.
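The relationship between max.poll.records and max.poll.interval.ms can be checked with simple arithmetic before deploying a consumer. The following is a minimal sketch, not a client implementation: the helper name build_consumer_config and the per-message processing times are hypothetical, and the resulting dictionary only mirrors the parameter names in the table so that it can be passed to whichever Kafka client you use.

```python
def build_consumer_config(per_message_seconds,
                          max_poll_interval_ms=300_000,
                          max_poll_records=500):
    """Return consumer settings whose batch fits inside max.poll.interval.ms.

    per_message_seconds is an assumed average processing time per message.
    """
    budget_seconds = max_poll_interval_ms / 1000
    # Largest batch that can still be processed before the next poll is due.
    affordable = int(budget_seconds // per_message_seconds)
    if affordable < 1:
        raise ValueError("one message takes longer than max.poll.interval.ms")
    return {
        "max.poll.records": min(max_poll_records, affordable),
        "max.poll.interval.ms": max_poll_interval_ms,
        "fetch.min.bytes": 1,  # default; raise it to trade latency for throughput
    }


fast = build_consumer_config(per_message_seconds=0.1)
slow = build_consumer_config(per_message_seconds=2.0)
print(fast["max.poll.records"])  # 500: the default batch fits in 300s
print(slow["max.poll.records"])  # 150: 300s / 2s per message
```

In practice, leave some headroom below the computed value, because processing time per message usually varies.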

Performance metrics

Kafka provides the following performance metrics to help analyze message accumulation, partition data skew, and traffic imbalance.

Metric ID | Metric Name | Description
--- | --- | ---
broker_disk_usage | Disk Capacity Usage | Disk usage of the Kafka VM
broker_cpu_core_load | Average Load per CPU Core | Average load of each CPU core of the Kafka VM
broker_memory_usage | Memory Usage | Memory usage of the Kafka VM
broker_cpu_usage | CPU Usage | CPU usage of the Kafka VM
group_msgs | Accumulated Messages | Total number of accumulated messages in all consumer groups of a Kafka instance
topic_messages_remained | Available Messages | Number of remaining messages that can be retrieved from the specified topic in the consumer group
broker_messages_in_rate | Message Creation Rate | Number of messages produced per second on the Kafka node
broker_connections | Connections | Number of connections on the Kafka node

Data partition optimization

Kafka divides each topic into multiple partitions for distributed message storage. Each partition has one or more replicas distributed on different brokers. Each replica stores a full copy of the partition's data, and messages are synchronized among replicas. The following figure shows the relationships between topics, partitions, replicas, and brokers.

Service data may be distributed unevenly across brokers and partitions, which degrades Kafka cluster performance and lowers resource utilization.

Causes of uneven service data:

  • The traffic of some topics is much heavier than that of others.
  • Producers specify partitions when sending messages, leaving the unspecified partitions empty.
  • Producers specify message keys, so messages with the same key are always sent to the same partition.
  • A partition allocation policy re-implemented by the service is flawed.
  • There are new Kafka brokers with no partitions allocated.
  • Cluster changes lead to switches and migration of leader replicas, causing data on some brokers to increase.
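To illustrate how message keys can cause skew, the sketch below assigns keyed messages to partitions and reports how far the busiest partition is from the average. Everything in it is an assumption for illustration: the workload (one hot key carrying 70% of the traffic), the partition count, and the CRC32 hash, which is only a stand-in and is not the hash used by Kafka's default partitioner.

```python
import zlib
from collections import Counter


def partition_for(key: bytes, num_partitions: int) -> int:
    # Stand-in hash (CRC32) for illustration; Kafka clients use a different hash.
    return zlib.crc32(key) % num_partitions


def skew_ratio(counts, num_partitions):
    """Load of the busiest partition divided by the mean load (1.0 = even)."""
    loads = [counts.get(p, 0) for p in range(num_partitions)]
    mean = sum(loads) / num_partitions
    return max(loads) / mean if mean else 0.0


NUM_PARTITIONS = 6
# Hypothetical workload: one hot key carries 7,000 of 10,000 messages.
keys = [b"tenant-hot"] * 7000 + [f"tenant-{i}".encode() for i in range(3000)]

counts = Counter(partition_for(k, NUM_PARTITIONS) for k in keys)
ratio = skew_ratio(counts, NUM_PARTITIONS)

print(dict(sorted(counts.items())))
print(f"busiest partition carries {ratio:.1f}x the average load")
```

A ratio close to 1.0 indicates an even distribution. A much larger ratio suggests that key design or partition selection should be reviewed.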

Data compression

If CPU resources are sufficient, compress message data.

Common compression algorithms include ZIP, gzip, Snappy, and LZ4. Consider both the compression ratio and the compression time when selecting an algorithm. Generally, an algorithm with a higher compression ratio takes more time.

Compression Mode | Compression Ratio | Client CPU Usage | Server CPU Usage | Disk Usage | Broker Bandwidth Usage
--- | --- | --- | --- | --- | ---
gzip | Medium | Medium | Low | Medium | Low
LZ4 | Medium | Medium | Medium | Medium | Medium
zstd | High | Medium | Low | Low | Low
Snappy | Low | High | High | High | High

For high TPS, use LZ4. To reduce network I/O or CPU usage, use zstd. gzip is not recommended because it is compute-intensive. Compressing messages in batches rather than one by one also helps achieve higher TPS.
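The benefit of batch compression can be seen with a small experiment. The sketch below compresses the same set of hypothetical JSON messages one by one and as a single batch. gzip is used here only because it ships with the Python standard library, not because it is the recommended algorithm; the payloads and message count are made up for illustration. In Kafka producers, the algorithm is selected with compression.type, and batch size is influenced by batch.size and linger.ms.

```python
import gzip
import json

# Hypothetical payloads: 200 small JSON events that share the same field names.
messages = [
    json.dumps({"order_id": i, "status": "CREATED", "region": "cn-north"}).encode()
    for i in range(200)
]

raw_size = sum(len(m) for m in messages)

# Compress each message on its own: every message pays the fixed gzip
# overhead and cannot reuse patterns found in other messages.
per_message_size = sum(len(gzip.compress(m)) for m in messages)

# Compress the whole batch once: repeated field names and values are
# shared across messages, so the output is much smaller.
batch_size = len(gzip.compress(b"\n".join(messages)))

print("raw bytes:           ", raw_size)
print("compressed one by one:", per_message_size)
print("compressed as a batch:", batch_size)
```

The same effect applies to the other algorithms in the table: larger batches give the compressor more repeated data to work with, at the cost of slightly higher send latency.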