Updated on 2024-12-18 GMT+08:00

Producer Occasionally Fails to Send Data and the Log Displays "Too many open files in system"

Symptom

When a Producer sends data to Kafka, the client occasionally fails to send the data.

Figure 1 Producer fails to send data.

Possible Causes

  1. The Kafka service is abnormal.
  2. The network is abnormal.
  3. The Kafka topic is abnormal.

Cause Analysis

  1. Check the Kafka service status:
    • MRS Manager: Log in to MRS Manager and choose Services > Kafka. Check the Kafka status. In this case, the status is Good and the monitoring metrics are displayed correctly.
    • FusionInsight Manager: Log in to FusionInsight Manager and choose Cluster > Services > Kafka. Check the Kafka status. In this case, the status is Good and the monitoring metrics are displayed correctly.
  2. Find the name of the failing topic in the SparkStreaming log.

    Run the following Kafka command to obtain the partition assignment and replica synchronization information of the topic, and check the returned result.

    kafka-topics.sh --describe --zookeeper <zk_host:port/chroot>

    As shown in Figure 2, the topic status is normal and every partition has a valid leader.

    Figure 2 Topic status
  3. Run the telnet command to check whether the Kafka service can be reached.

    telnet <Kafka service IP address> <Kafka service port>

    If the telnet connection fails, check the network security group and ACL rules.

  4. Log in to the Kafka Broker node using SSH.

    Run the cd /var/log/Bigdata/kafka/broker command to go to the log directory.

    Check server.log. It contains the error message shown in the following figure.

    Figure 3 Log exception
  5. Run the lsof command to check the handle usage of the Kafka process on the current node. The output shows that the number of handles used by the Kafka process reaches 470,000.
    Figure 4 Handles
  6. Check the service code. New Producer objects are created frequently and are not closed properly.
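
As a complement to the lsof check in step 5, the descriptor count of a process can also be read programmatically. The following is a minimal sketch, assuming a Linux node where the entries under /proc/<pid>/fd correspond to the open file descriptors of that process and where the caller has permission to read that directory. The class name FdCounter and the method countOpenFds are hypothetical names introduced for illustration. The number may be lower than the lsof line count, because lsof also lists entries such as memory-mapped files.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.stream.Stream;

/** Hypothetical helper: counts open file descriptors of a process on Linux. */
final class FdCounter {

    private FdCounter() {
    }

    /**
     * Counts the entries under /proc/<pid>/fd.
     * Assumption: Linux procfs is mounted and the directory is readable.
     *
     * @param pid process ID as text, or "self" for the current JVM
     */
    static long countOpenFds(String pid) {
        Path fdDir = Paths.get("/proc", pid, "fd");
        // Files.list keeps a directory handle open, so close the stream promptly.
        try (Stream<Path> entries = Files.list(fdDir)) {
            return entries.count();
        } catch (IOException e) {
            throw new UncheckedIOException("cannot read " + fdDir, e);
        }
    }
}
```

Calling FdCounter.countOpenFds with the process ID of the producer application at intervals shows whether its descriptor count keeps growing, which is the pattern expected from a handle leak.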

Solution

  1. Stop the current application so that the number of handles on the server stops increasing sharply and no longer affects the normal running of services.
  2. Optimize the application code to resolve the handle leakage problem.

    Suggestion: Use a single Producer object globally. When it is no longer needed, call its close() method to release the handles.
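
The suggestion above can be sketched as a small holder that creates one shared instance on first use and closes it exactly once. This is an illustrative sketch, not code from the affected application: the class SharedProducerHolder is a hypothetical helper, and it is written against AutoCloseable so that it compiles without the Kafka client library.

```java
import java.util.function.Supplier;

/**
 * Hypothetical helper: holds one lazily created instance for the whole
 * application and closes it at most once. Not part of the Kafka client API.
 */
final class SharedProducerHolder<T extends AutoCloseable> {
    private final Supplier<T> factory;
    private T instance;      // guarded by "this"
    private boolean closed;  // guarded by "this"

    SharedProducerHolder(Supplier<T> factory) {
        this.factory = factory;
    }

    /** Returns the single shared instance, creating it on the first call. */
    synchronized T get() {
        if (closed) {
            throw new IllegalStateException("holder already closed");
        }
        if (instance == null) {
            instance = factory.get();
        }
        return instance;
    }

    /** Closes the shared instance if it was created; repeated calls do nothing. */
    synchronized void close() {
        if (closed) {
            return;
        }
        closed = true;
        if (instance != null) {
            try {
                instance.close();
            } catch (Exception e) {
                throw new IllegalStateException("failed to close shared instance", e);
            }
        }
    }
}
```

With the Kafka Java client, the factory could be () -> new KafkaProducer<String, String>(props), where props stands for the application's existing producer configuration, and close() could be called from a JVM shutdown hook. Every sending thread then calls get() instead of constructing its own Producer. KafkaProducer is documented as thread safe, so sharing one instance across threads is the intended usage.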