Updated on 2024-09-13 GMT+08:00

Basic Metrics: Flink Metrics

This section describes the categories, names, and meanings of Flink metrics reported to AOM.

Table 1 Flink metrics

| Category | Metric | Description | Unit |
|---|---|---|---|
| CPU | flink_jobmanager_Status_JVM_CPU_Load | CPU load of the JVM in JobManager | N/A |
| CPU | flink_jobmanager_Status_JVM_CPU_Time | CPU time used by the JVM in JobManager | N/A |
| CPU | flink_jobmanager_Status_ProcessTree_CPU_Usage | CPU usage of the JVM in JobManager | N/A |
| CPU | flink_taskmanager_Status_JVM_CPU_Load | CPU load of the JVM in TaskManager | N/A |
| CPU | flink_taskmanager_Status_JVM_CPU_Time | CPU time used by the JVM in TaskManager | N/A |
| CPU | flink_taskmanager_Status_ProcessTree_CPU_Usage | CPU usage of the JVM in TaskManager | N/A |
| Memory | flink_jobmanager_Status_JVM_Memory_Heap_Used | Used heap memory of JobManager | Bytes |
| Memory | flink_jobmanager_Status_JVM_Memory_Heap_Committed | Heap memory guaranteed to be available to the JVM in JobManager | Bytes |
| Memory | flink_jobmanager_Status_JVM_Memory_Heap_Max | Maximum heap memory that can be used for memory management in JobManager | Bytes |
| Memory | flink_jobmanager_Status_JVM_Memory_NonHeap_Used | Used non-heap memory of JobManager | Bytes |
| Memory | flink_jobmanager_Status_JVM_Memory_NonHeap_Committed | Non-heap memory guaranteed to be available to the JVM in JobManager | Bytes |
| Memory | flink_jobmanager_Status_JVM_Memory_NonHeap_Max | Maximum non-heap memory that can be used for memory management in JobManager | Bytes |
| Memory | flink_jobmanager_Status_JVM_Memory_Metaspace_Used | Used memory of the JobManager metaspace memory pool | Bytes |
| Memory | flink_jobmanager_Status_JVM_Memory_Metaspace_Committed | Memory guaranteed to be available to the JVM in the JobManager metaspace memory pool | Bytes |
| Memory | flink_jobmanager_Status_JVM_Memory_Metaspace_Max | Maximum memory that can be used in the JobManager metaspace memory pool | Bytes |
| Memory | flink_jobmanager_Status_JVM_Memory_Direct_Count | Number of buffers in the direct buffer pool of JobManager | N/A |
| Memory | flink_jobmanager_Status_JVM_Memory_Direct_MemoryUsed | Memory used by the JVM for the direct buffer pool in JobManager | Bytes |
| Memory | flink_jobmanager_Status_JVM_Memory_Direct_TotalCapacity | Total capacity of all buffers in the direct buffer pool of JobManager | Bytes |
| Memory | flink_jobmanager_Status_JVM_Memory_Mapped_Count | Number of buffers in the mapped buffer pool of JobManager | N/A |
| Memory | flink_jobmanager_Status_JVM_Memory_Mapped_MemoryUsed | Memory used by the JVM for the mapped buffer pool in JobManager | Bytes |
| Memory | flink_jobmanager_Status_JVM_Memory_Mapped_TotalCapacity | Total capacity of all buffers in the mapped buffer pool of JobManager | Bytes |
| Memory | flink_jobmanager_Status_Flink_Memory_Managed_Used | Used managed memory of JobManager | Bytes |
| Memory | flink_jobmanager_Status_Flink_Memory_Managed_Total | Total managed memory of JobManager | Bytes |
| Memory | flink_taskmanager_Status_JVM_Memory_Heap_Used | Used heap memory of TaskManager | Bytes |
| Memory | flink_taskmanager_Status_JVM_Memory_Heap_Committed | Heap memory guaranteed to be available to the JVM in TaskManager | Bytes |
| Memory | flink_taskmanager_Status_JVM_Memory_Heap_Max | Maximum heap memory that can be used for memory management in TaskManager | Bytes |
| Memory | flink_taskmanager_Status_JVM_Memory_NonHeap_Used | Used non-heap memory of TaskManager | Bytes |
| Memory | flink_taskmanager_Status_JVM_Memory_NonHeap_Committed | Non-heap memory guaranteed to be available to the JVM in TaskManager | Bytes |
| Memory | flink_taskmanager_Status_JVM_Memory_NonHeap_Max | Maximum non-heap memory that can be used for memory management in TaskManager | Bytes |
| Memory | flink_taskmanager_Status_JVM_Memory_Metaspace_Used | Used memory of the TaskManager metaspace memory pool | Bytes |
| Memory | flink_taskmanager_Status_JVM_Memory_Metaspace_Committed | Memory guaranteed to be available to the JVM in the TaskManager metaspace memory pool | Bytes |
| Memory | flink_taskmanager_Status_JVM_Memory_Metaspace_Max | Maximum memory that can be used in the TaskManager metaspace memory pool | Bytes |
| Memory | flink_taskmanager_Status_JVM_Memory_Direct_Count | Number of buffers in the direct buffer pool of TaskManager | N/A |
| Memory | flink_taskmanager_Status_JVM_Memory_Direct_MemoryUsed | Memory used by the JVM for the direct buffer pool in TaskManager | Bytes |
| Memory | flink_taskmanager_Status_JVM_Memory_Direct_TotalCapacity | Total capacity of all buffers in the direct buffer pool of TaskManager | Bytes |
| Memory | flink_taskmanager_Status_JVM_Memory_Mapped_Count | Number of buffers in the mapped buffer pool of TaskManager | N/A |
| Memory | flink_taskmanager_Status_JVM_Memory_Mapped_MemoryUsed | Memory used by the JVM for the mapped buffer pool in TaskManager | Bytes |
| Memory | flink_taskmanager_Status_JVM_Memory_Mapped_TotalCapacity | Total capacity of all buffers in the mapped buffer pool of TaskManager | Bytes |
| Memory | flink_taskmanager_Status_Flink_Memory_Managed_Used | Used managed memory of TaskManager | Bytes |
| Memory | flink_taskmanager_Status_Flink_Memory_Managed_Total | Total managed memory of TaskManager | Bytes |
| Memory | flink_taskmanager_Status_ProcessTree_Memory_RSS | Memory used by the whole TaskManager process in the Linux system | Bytes |
| Threads | flink_jobmanager_Status_JVM_Threads_Count | Total number of active threads in JobManager | Count |
| Threads | flink_taskmanager_Status_JVM_Threads_Count | Total number of active threads in TaskManager | Count |
| Garbage collection | flink_jobmanager_Status_JVM_GarbageCollector_ConcurrentMarkSweep_Count | Number of garbage collections (GCs) performed by the Concurrent Mark Sweep (CMS) collector in JobManager | Count |
| Garbage collection | flink_jobmanager_Status_JVM_GarbageCollector_ConcurrentMarkSweep_Time | Total time spent on GC by the CMS collector in JobManager | ms |
| Garbage collection | flink_jobmanager_Status_JVM_GarbageCollector_ParNew_Count | Number of GCs performed by the ParNew collector in JobManager | Count |
| Garbage collection | flink_jobmanager_Status_JVM_GarbageCollector_ParNew_Time | Total time spent on GC by the ParNew collector in JobManager | ms |
| Garbage collection | flink_taskmanager_Status_JVM_GarbageCollector_ConcurrentMarkSweep_Count | Number of GCs performed by the CMS collector in TaskManager | Count |
| Garbage collection | flink_taskmanager_Status_JVM_GarbageCollector_ConcurrentMarkSweep_Time | Total time spent on GC by the CMS collector in TaskManager | ms |
| Garbage collection | flink_taskmanager_Status_JVM_GarbageCollector_ParNew_Count | Number of GCs performed by the ParNew collector in TaskManager | Count |
| Garbage collection | flink_taskmanager_Status_JVM_GarbageCollector_ParNew_Time | Total time spent on GC by the ParNew collector in TaskManager | ms |
| Class loader | flink_jobmanager_Status_JVM_ClassLoader_ClassesLoaded | Total number of classes that JobManager has loaded since the JVM started | N/A |
| Class loader | flink_jobmanager_Status_JVM_ClassLoader_ClassesUnloaded | Total number of classes that JobManager has unloaded since the JVM started | N/A |
| Class loader | flink_taskmanager_Status_JVM_ClassLoader_ClassesLoaded | Total number of classes that TaskManager has loaded since the JVM started | N/A |
| Class loader | flink_taskmanager_Status_JVM_ClassLoader_ClassesUnloaded | Total number of classes that TaskManager has unloaded since the JVM started | N/A |
| Network | flink_taskmanager_Status_Network_AvailableMemorySegments | Number of unused memory segments of TaskManager | N/A |
| Network | flink_taskmanager_Status_Network_TotalMemorySegments | Total number of allocated memory segments of TaskManager | N/A |
| Default shuffle service | flink_taskmanager_Status_Shuffle_Netty_AvailableMemorySegments | Number of unused memory segments of TaskManager | N/A |
| Default shuffle service | flink_taskmanager_Status_Shuffle_Netty_UsedMemorySegments | Number of used memory segments of TaskManager | N/A |
| Default shuffle service | flink_taskmanager_Status_Shuffle_Netty_TotalMemorySegments | Number of allocated memory segments of TaskManager | N/A |
| Default shuffle service | flink_taskmanager_Status_Shuffle_Netty_AvailableMemory | Unused memory of TaskManager | Bytes |
| Default shuffle service | flink_taskmanager_Status_Shuffle_Netty_UsedMemory | Used memory of TaskManager | Bytes |
| Default shuffle service | flink_taskmanager_Status_Shuffle_Netty_TotalMemory | Allocated memory of TaskManager | Bytes |
| Availability | flink_jobmanager_job_numRestarts | Total number of restarts since job submission | Count |
| Checkpoint | flink_jobmanager_job_lastCheckpointDuration | Time taken to complete the latest checkpoint | ms |
| Checkpoint | flink_jobmanager_job_lastCheckpointSize | Size of the latest checkpoint. If incremental checkpointing or the changelog is enabled, this value may differ from lastCheckpointFullSize. | Bytes |
| Checkpoint | flink_jobmanager_job_numberOfInProgressCheckpoints | Number of checkpoints that are in progress | Count |
| Checkpoint | flink_jobmanager_job_numberOfCompletedCheckpoints | Number of checkpoints that have been completed | Count |
| Checkpoint | flink_jobmanager_job_numberOfFailedCheckpoints | Number of failed checkpoints | Count |
| Checkpoint | flink_jobmanager_job_totalNumberOfCheckpoints | Total number of checkpoints | Count |
| I/O | flink_taskmanager_job_task_numBytesOut | Total number of bytes output by a task | Bytes |
| I/O | flink_taskmanager_job_task_numBytesOutPerSecond | Number of bytes output by a task per second | Bytes/s |
| I/O | flink_taskmanager_job_task_isBackPressured | Whether the task is backpressured | N/A |
| I/O | flink_taskmanager_job_task_numRecordsIn | Total number of records received by a task | Count |
| I/O | flink_taskmanager_job_task_numRecordsInPerSecond | Number of records received by a task per second | Records/s |
| I/O | flink_taskmanager_job_task_numBytesIn | Total number of bytes received by a task | Bytes |
| I/O | flink_taskmanager_job_task_numBytesInPerSecond | Number of bytes received by a task per second | Bytes/s |
| I/O | flink_taskmanager_job_task_numRecordsOut | Total number of records sent by a task | Count |
| I/O | flink_taskmanager_job_task_numRecordsOutPerSecond | Number of records sent by a task per second | Records/s |
| I/O | flink_taskmanager_job_task_operator_numRecordsIn | Total number of records received by an operator | Count |
| I/O | flink_taskmanager_job_task_operator_numRecordsInPerSecond | Number of records received by an operator per second | Records/s |
| I/O | flink_taskmanager_job_task_operator_numRecordsOut | Total number of records sent by an operator | Count |
| I/O | flink_taskmanager_job_task_operator_numRecordsOutPerSecond | Number of records sent by an operator per second | Records/s |
| I/O | flink_taskmanager_job_task_operator_sourceIdleTime | Time during which the source has not processed any records | ms |
| I/O | flink_taskmanager_job_task_operator_source_numRecordsIn | Total number of records input to the source | Count |
| I/O | flink_taskmanager_job_task_operator_sink_numRecordsOut | Total number of records output from the sink | Count |
| I/O | flink_taskmanager_job_task_operator_source_numRecordsInPerSecond | Number of records input to the source per second | Records/s |
| I/O | flink_taskmanager_job_task_operator_sink_numRecordsOutPerSecond | Number of records output from the sink per second | Records/s |
| Kafka connector | flink_taskmanager_job_task_operator_currentEmitEventTimeLag | Interval between the data event time and the time when the data leaves the source | ms |
| Kafka connector | flink_taskmanager_job_task_operator_currentFetchEventTimeLag | Interval between the data event time and the time when the data enters the source | ms |
| Kafka connector | flink_taskmanager_job_task_operator_pendingRecords | Number of data records that have not been pulled by the source | Count |
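
Several metrics in Table 1 are easier to interpret when combined into ratios, for example used heap against maximum heap, or failed checkpoints against all checkpoints. The following Python sketch shows one possible way to do that. It is an illustration only: the helper names `ratio` and `derive_indicators`, the keys of the returned dictionary, and the sample values are all hypothetical, and the sketch assumes the latest value of each metric has already been retrieved from AOM into a plain mapping (how to query AOM is not covered here).

```python
from typing import Dict, Mapping, Optional


def ratio(numerator: float, denominator: float) -> Optional[float]:
    """Return numerator / denominator, or None if the denominator is not positive.

    A non-positive denominator can occur, for example, when a maximum is
    undefined or no checkpoint has been triggered yet.
    """
    if denominator <= 0:
        return None
    return numerator / denominator


def derive_indicators(sample: Mapping[str, float]) -> Dict[str, Optional[float]]:
    """Combine raw metric values from Table 1 into a few ratios.

    `sample` is a hypothetical mapping of metric name to its latest value.
    """
    # Share of the maximum heap that TaskManager currently uses.
    heap_usage = ratio(
        sample["flink_taskmanager_Status_JVM_Memory_Heap_Used"],
        sample["flink_taskmanager_Status_JVM_Memory_Heap_Max"],
    )

    # Share of network memory segments in use (total minus unused).
    available = sample["flink_taskmanager_Status_Network_AvailableMemorySegments"]
    total = sample["flink_taskmanager_Status_Network_TotalMemorySegments"]
    segment_usage = ratio(total - available, total)

    # Share of checkpoints that failed.
    checkpoint_failure = ratio(
        sample["flink_jobmanager_job_numberOfFailedCheckpoints"],
        sample["flink_jobmanager_job_totalNumberOfCheckpoints"],
    )

    return {
        "heap_usage": heap_usage,
        "network_segment_usage": segment_usage,
        "checkpoint_failure_ratio": checkpoint_failure,
    }


# Made-up values, for illustration only.
example = {
    "flink_taskmanager_Status_JVM_Memory_Heap_Used": 512 * 1024 * 1024,
    "flink_taskmanager_Status_JVM_Memory_Heap_Max": 2048 * 1024 * 1024,
    "flink_taskmanager_Status_Network_AvailableMemorySegments": 3000,
    "flink_taskmanager_Status_Network_TotalMemorySegments": 4000,
    "flink_jobmanager_job_numberOfFailedCheckpoints": 2,
    "flink_jobmanager_job_totalNumberOfCheckpoints": 50,
}

indicators = derive_indicators(example)
for name, value in indicators.items():
    print(name, value)
```

Returning `None` instead of raising an error keeps the sketch usable for jobs that have just started, when totals may still be zero. Any alerting thresholds applied to these ratios depend on the workload and are not defined by this document.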