Updated on 2023-05-19 GMT+08:00

Why Is the Flink Job Abnormal Due to Heartbeat Timeout Between JobManager and TaskManager?

Symptom

The heartbeat between JobManager and TaskManager times out. As a result, the Flink job becomes abnormal.

Figure 1 Error information

Possible Causes

  1. The network is intermittently disconnected or the cluster load is high, so heartbeat messages are delayed or lost.
  2. Full GC occurs frequently, possibly because of a memory leak in the job code. Long Full GC pauses can prevent the JVM from sending or answering heartbeats in time.
    Figure 2 Full GC
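
To confirm the second cause from inside the job's JVM, the standard java.lang.management API can report how often each garbage collector has run and how long it has spent. The following is a minimal sketch, not part of Flink or DLI: the class name GcCheck and its methods are hypothetical, and where to call it from (for example, a log statement in your own job code) is left to you. The totals cover all collections, not only Full GC, so look at the per-collector lines for the old-generation collector (its name depends on the GC algorithm in use, for example "PS MarkSweep" or "G1 Old Generation").

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

/** Hypothetical helper (not a Flink or DLI API): summarizes GC activity of the current JVM. */
final class GcCheck {

    /** Total number of collections reported by all collectors in this JVM. */
    static long totalGcCount() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long count = gc.getCollectionCount(); // -1 means the value is not available
            if (count > 0) {
                total += count;
            }
        }
        return total;
    }

    /** Approximate accumulated collection time in milliseconds across all collectors. */
    static long totalGcTimeMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long millis = gc.getCollectionTime(); // -1 means the value is not available
            if (millis > 0) {
                total += millis;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        // Print one line per collector so the old-generation collector can be inspected.
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.printf("%s: count=%d, timeMs=%d%n",
                    gc.getName(), gc.getCollectionCount(), gc.getCollectionTime());
        }
        System.out.printf("total: count=%d, timeMs=%d%n",
                totalGcCount(), totalGcTimeMillis());
    }
}
```

Sampling these values twice, some time apart, shows how much GC work happened in between. A collection time that grows by an amount close to the sampling interval suggests the JVM is spending most of its time in GC.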

Handling Procedure

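  • Check whether the network is intermittently disconnected and whether the cluster load is high.
  • If Full GC occurs frequently, check the job code to determine whether there is a memory leak.
  • Allocate more resources (for example, memory) to each TaskManager.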
  • Contact technical support to modify the cluster heartbeat configuration.
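
When deciding between looking for a memory leak and allocating more memory to each TaskManager, it can help to see how full the heap is. The following is a rough sketch using the standard java.lang.management API. The class name HeapCheck is hypothetical and is not a Flink or DLI API.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;

/** Hypothetical helper (not a Flink or DLI API): reports heap usage of the current JVM. */
final class HeapCheck {

    /** Returns used heap divided by maximum heap, or -1.0 if the maximum is undefined. */
    static double heapUsedRatio() {
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        long max = heap.getMax(); // -1 if the JVM does not define a maximum
        if (max <= 0) {
            return -1.0;
        }
        return (double) heap.getUsed() / max;
    }

    public static void main(String[] args) {
        double ratio = heapUsedRatio();
        if (ratio < 0) {
            System.out.println("Maximum heap size is undefined for this JVM.");
        } else {
            System.out.printf("Heap used: %.1f%% of maximum%n", ratio * 100);
        }
    }
}
```

As a rule of thumb rather than a fixed threshold: if the ratio stays high even right after collections and keeps climbing over the lifetime of the job, a leak is more likely. If it is high but stable, the TaskManager may simply need more memory.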