Updated on 2023-09-15 GMT+08:00

Handling Service Overload

Introduction

High CPU usage and full disks indicate overloaded Kafka services.

  • High CPU usage degrades system performance and increases the risk of hardware damage.
  • If a disk is full, the Kafka log directory stored on it goes offline. The partition replicas on that disk can then no longer be read or written, which reduces partition availability and fault tolerance. Partition leadership switches to another broker, adding load to that broker.

Causes of high CPU usage

  • Too many data operation threads are configured through num.io.threads, num.network.threads, and num.replica.fetchers.
  • Partitions are improperly assigned, so one broker carries all production and consumption services.

Causes of full disk

  • Current disk space no longer meets the needs of the rapidly increasing service data volume.
  • Disk usage on the broker is unbalanced. All produced messages go to one partition and fill the disk that hosts it.
  • The time to live (TTL) set for a topic is too long. Old data takes too much disk space.

Solution

Handling high CPU usage:

  • Optimize the thread parameters num.io.threads, num.network.threads, and num.replica.fetchers.
    • Set num.io.threads and num.network.threads to multiples of the number of disks. Neither value should exceed the number of CPU cores.
    • Set num.replica.fetchers to 5 or fewer.
  • Set topic partitions properly. Set the number of partitions to a multiple of the number of brokers.
  • Attach a random suffix to each message key so that messages can be evenly distributed in partitions.

    In actual scenarios, attaching a random suffix to each message key compromises message ordering, because messages that shared a key may land in different partitions. Decide whether a suffix is acceptable for your service.
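The random-suffix approach above can be sketched as follows. This is a minimal illustration, not part of the Kafka client API: the class name SaltedKeys, the methods withRandomSuffix and stripSuffix, and the "-" separator are all hypothetical, and the bucket count of 8 is an arbitrary example value.

```java
import java.util.concurrent.ThreadLocalRandom;

/** Hypothetical helper for spreading a hot key across partitions. */
final class SaltedKeys {
    private SaltedKeys() {}

    /** Appends "-<n>" to the key, where n is a random integer in [0, buckets). */
    static String withRandomSuffix(String key, int buckets) {
        if (buckets <= 0) {
            throw new IllegalArgumentException("buckets must be positive");
        }
        int suffix = ThreadLocalRandom.current().nextInt(buckets);
        return key + "-" + suffix;
    }

    /** Recovers the original key by dropping the last "-<n>" segment. */
    static String stripSuffix(String saltedKey) {
        int idx = saltedKey.lastIndexOf('-');
        return idx < 0 ? saltedKey : saltedKey.substring(0, idx);
    }

    public static void main(String[] args) {
        // The same business key now maps to up to 8 distinct message keys.
        for (int i = 0; i < 5; i++) {
            System.out.println(withRandomSuffix("order-42", 8));
        }
    }
}
```

A producer would pass the salted value as the record key, and a consumer that needs the business key would call stripSuffix on it. A smaller bucket count keeps more of the original ordering at the cost of a less even distribution.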

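The thread and partition rules above are simple arithmetic, sketched here under stated assumptions. The class SizingHints and its methods are hypothetical. The guidance only gives constraints (a multiple of the disk count, at most the CPU core count), so choosing the largest value that satisfies them, and falling back to the core count when there are more disks than cores, are assumptions made for illustration.

```java
/** Hypothetical sizing helper; the values it returns are starting points, not tuned settings. */
final class SizingHints {
    /** Upper bound for num.replica.fetchers taken from the guidance above. */
    static final int MAX_REPLICA_FETCHERS = 5;

    private SizingHints() {}

    /** Largest multiple of the disk count that does not exceed the CPU core count. */
    static int threadCount(int disks, int cpuCores) {
        if (disks <= 0 || cpuCores <= 0) {
            throw new IllegalArgumentException("disks and cpuCores must be positive");
        }
        if (cpuCores < disks) {
            return cpuCores; // assumption: no multiple fits, so cap at the core count
        }
        return (cpuCores / disks) * disks;
    }

    /** Rounds the desired partition count up to the next multiple of the broker count. */
    static int partitionCount(int desired, int brokers) {
        if (desired <= 0 || brokers <= 0) {
            throw new IllegalArgumentException("desired and brokers must be positive");
        }
        return ((desired + brokers - 1) / brokers) * brokers;
    }

    public static void main(String[] args) {
        System.out.println(threadCount(3, 8));     // prints 6
        System.out.println(partitionCount(10, 3)); // prints 12
    }
}
```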
Handling full disk:

  • Increase the disk space.
  • Migrate partitions from the full disk to other disks on the broker.
  • Set a proper TTL for topics to reduce the amount of old data.
  • If CPU resources are sufficient, compress the data with compression algorithms.

    Common compression algorithms include ZIP, gzip, Snappy, and LZ4. Consider both the compression ratio and the compression time when selecting an algorithm. Generally, an algorithm with a higher compression ratio takes more time. For systems with high performance requirements, select a fast algorithm, such as LZ4. For systems that need a high compression ratio, select an algorithm such as gzip.

    Configure the compression.type parameter on producers to specify a compression algorithm. Kafka producers accept gzip, snappy, lz4, and zstd as values.

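    import java.util.Properties;
    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.Producer;

    Properties props = new Properties();
    // Replace with the connection address of your Kafka instance.
    props.put("bootstrap.servers", "localhost:9092");
    props.put("acks", "all");
    props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
    props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
    // Compress message batches with gzip before they are sent.
    props.put("compression.type", "gzip");

    Producer<String, String> producer = new KafkaProducer<>(props);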
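To get a rough feel for the trade-off between compression ratio and compression time before changing producer settings, you can compress a sample of your own messages locally. The sketch below uses only the JDK's java.util.zip.GZIPOutputStream and does not go through the Kafka producer, so the numbers are indicative only. The class GzipRatio and the sample JSON payload are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPOutputStream;

/** Hypothetical helper that measures how well a sample payload compresses with gzip. */
final class GzipRatio {
    private GzipRatio() {}

    /** Returns the gzip-compressed form of the input. */
    static byte[] gzip(byte[] input) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (GZIPOutputStream out = new GZIPOutputStream(buffer)) {
            out.write(input);
        }
        return buffer.toByteArray();
    }

    /** Compressed size divided by original size; lower means better compression. */
    static double ratio(byte[] input) throws IOException {
        return (double) gzip(input).length / input.length;
    }

    public static void main(String[] args) throws IOException {
        // Build a batch of similar messages, as a producer batch would contain.
        StringBuilder batch = new StringBuilder();
        for (int i = 0; i < 1000; i++) {
            batch.append("{\"orderId\":").append(i).append(",\"status\":\"PAID\"}\n");
        }
        byte[] raw = batch.toString().getBytes(StandardCharsets.UTF_8);

        long start = System.nanoTime();
        byte[] packed = gzip(raw);
        long micros = (System.nanoTime() - start) / 1_000;

        System.out.printf("raw=%d bytes, gzip=%d bytes, took %d us%n",
                raw.length, packed.length, micros);
    }
}
```

Repetitive payloads such as JSON with fixed field names usually compress well, while data that is already compressed or encrypted gains little and still costs CPU time.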