Updated on 2022-12-14 GMT+08:00

Memory Configuration Optimization

Scenarios

The computing of Flink depends on memory. If the memory is insufficient, the performance of Flink will be greatly deteriorated. One solution is to monitor garbage collection (GC) to evaluate the memory usage. If the memory becomes the performance bottleneck, optimize the memory usage according to the actual situation.

If Full GC is frequently reported in the Container GC on the Yarn that monitors the node processes, the GC needs to be optimized.

In the env.java.opts configuration item of the conf/flink-conf.yaml file on the client, add the -Xloggc:<LOG_DIR>/gc.log -XX:+PrintGCDetails -XX:-OmitStackTraceInFastThrow -XX:+PrintGCTimeStamps -XX:+PrintGCDateStamps -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=20 -XX:GCLogFileSize=20M parameter. The GC log is configured by default.

Procedure

  • Optimize GC.

    Adjust the ratio of tenured generation memory to young generation memory. In the conf/flink-conf.yaml configuration file on the client, add the -XX:NewRatio parameter to the env.java.opts configuration item. For example, -XX:NewRatio=2 indicates that ratio of tenured generation memory to young generation memory is 2:1, that is, the young generation memory occupies one third and tenured generation memory occupies two thirds.

  • When developing Flink applications, optimize the partitioning or grouping operation of DataStream.
    • If partitioning causes data skew, partitions need to be optimized.
    • Do not perform concurrent operations, because some operations, WindowAll for example, to DataStream do not support parallelism.
    • Do not use set keyBy to string type.