What If Checkpoint Is Executed Slowly in RocksDBStateBackend Mode When the Data Amount Is Large

Question

What to do if checkpoint is executed slowly in RocksDBStateBackend mode when the data amount is large?

Cause Analysis

Customized windows are used and the window state is ListState. There are many values under the same key. In the case of a new value, the merge operation of RocksDB is used. When calculation is triggered, all values under the key are read.

The RocksDB mode is merge() > merge() ... > merge() > read(). Data reading in this mode consumes much time, as shown in Figure 1.
The source operator sends a large amount of data in a short period of time and the data keys are the same. The window operator fails to process data fast enough and barriers accumulate in the cache. The time consumed for snapshot preparation is too long and the window operator cannot report snapshot completion to CheckpointCoordinator in the specified time. Therefore, CheckpointCoordinator determines snapshot preparation failure, as shown in Figure 2.

Figure 1 Time monitoring

Figure 2 Relationship

Answer

Flink introduces the third-party software package RocksDB, whose defect causes the problem. You are advised to set checkpoint to FsStateBackend mode.

Set checkpoint to FsStateBackend mode in the application code as follows:

env.setStateBackend(new FsStateBackend("hdfs://hacluster/flink/checkpoint/"));

Parent topic: FAQ

Previous topic: What If the Page Is Displayed Abnormally on Internet Explorer 10/11

Next topic: What If yarn-session Start Fails When blob.storage.directory Is Set to /home

Feedback

Was this page helpful?

Helpful Not helpful

Provide feedback

Thank you very much for your feedback. We will continue working to improve the documentation.

The system is busy. Please try again later.