Summarization
Updated on 2022-12-14 GMT+08:00
Avoiding Data Skew
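If data skew occurs (that is, a few keys or partitions receive a disproportionately large share of the data), task execution times vary widely even when no garbage collection is taking place. To reduce data skew:
- Redefine keys. Use finer-grained keys so that records are spread more evenly and each task processes a smaller share of the data.
- Modify the degree of parallelism (DOP).
- Call the rebalance operation to redistribute data evenly across partitions.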
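The following plain-Java sketch shows why finer-grained keys and rebalancing help. It is a simplified model, not Flink code. The SkewDemo class and its methods are hypothetical. Partitioning by `hashCode() % partitions` is an assumption: Flink's keyBy actually hashes keys into key groups, but a single hot key still ends up on a single partition. The sketch compares three strategies:

- partitioning by the original key;
- "salting" the key (appending a suffix so one hot key becomes several finer-grained keys);
- round-robin distribution, which is how DataStream#rebalance() spreads records.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class SkewDemo {

    // Simplified hash partitioner (assumption, not Flink's key-group hashing).
    static int hashPartition(String key, int partitions) {
        return Math.floorMod(key.hashCode(), partitions);
    }

    // Records per partition when partitioning by the original key.
    static int[] loadsByKey(List<String> keys, int partitions) {
        int[] loads = new int[partitions];
        for (String key : keys) {
            loads[hashPartition(key, partitions)]++;
        }
        return loads;
    }

    // Records per partition after salting: each record's key gets a suffix, so one hot
    // key becomes saltBuckets finer-grained keys that can land on different partitions.
    static int[] loadsBySaltedKey(List<String> keys, int partitions, int saltBuckets) {
        int[] loads = new int[partitions];
        for (int i = 0; i < keys.size(); i++) {
            String salted = keys.get(i) + "#" + (i % saltBuckets);
            loads[hashPartition(salted, partitions)]++;
        }
        return loads;
    }

    // Records per partition with round-robin distribution (the rebalance strategy).
    static int[] loadsRoundRobin(List<String> keys, int partitions) {
        int[] loads = new int[partitions];
        for (int i = 0; i < keys.size(); i++) {
            loads[i % partitions]++;
        }
        return loads;
    }

    // Skewed input: hotCount copies of "hot" followed by coldCount distinct keys.
    static List<String> skewedKeys(int hotCount, int coldCount) {
        List<String> keys = new ArrayList<>();
        for (int i = 0; i < hotCount; i++) keys.add("hot");
        for (int i = 0; i < coldCount; i++) keys.add("k" + i);
        return keys;
    }

    static int max(int[] loads) {
        return Arrays.stream(loads).max().orElse(0);
    }

    public static void main(String[] args) {
        List<String> keys = skewedKeys(900, 100);
        int partitions = 4;
        System.out.println("by key:      " + Arrays.toString(loadsByKey(keys, partitions)));
        System.out.println("salted key:  " + Arrays.toString(loadsBySaltedKey(keys, partitions, 4)));
        System.out.println("round-robin: " + Arrays.toString(loadsRoundRobin(keys, partitions)));
    }
}
```

Partitioning by the original key places all 900 "hot" records on one partition. The salted and round-robin variants spread them out. Both salting and round-robin break the key-to-partition mapping, so a keyed aggregation needs a second step that merges the partial results per original key.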
Setting Timeout Interval for the Buffer
- During task execution, data is exchanged between tasks over the network in buffers. You can configure the setBufferTimeout parameter to specify the timeout interval for these buffers.
- If setBufferTimeout is set to -1, a buffer is flushed only when it is full, which maximizes throughput. If it is set to 0, the buffer is flushed after every record, which minimizes latency. If it is set to a value greater than 0, the buffer is flushed when it is full or when the timeout (in milliseconds) expires, whichever comes first.
The following is an example:
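// Set the default buffer timeout for the whole job
env.setBufferTimeout(timeoutMillis);
// Set the buffer timeout for one operator only
env.generateSequence(1, 10).map(new MyMapper()).setBufferTimeout(timeoutMillis);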
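To make the throughput/latency trade-off more concrete, the following plain-Java sketch models a single output buffer under the three kinds of setBufferTimeout values. It is a simplified model, not Flink code. The BufferTimeoutDemo class and its simulate method are hypothetical. The buffer capacity is counted in records for simplicity, whereas Flink sizes network buffers in bytes. A positive timeout is modeled as a periodic flush every timeoutMs milliseconds, and anything left in the buffer is flushed when the input ends.

```java
public class BufferTimeoutDemo {

    /** Outcome of one simulated run: number of buffer flushes and worst-case buffering delay. */
    static final class Result {
        final int flushes;
        final long maxLatencyMs;

        Result(int flushes, long maxLatencyMs) {
            this.flushes = flushes;
            this.maxLatencyMs = maxLatencyMs;
        }

        @Override
        public String toString() {
            return "flushes=" + flushes + ", maxLatencyMs=" + maxLatencyMs;
        }
    }

    /**
     * arrivalsMs: sorted record arrival times; capacity: records per buffer (model assumption);
     * timeoutMs: -1 = flush only when full, 0 = flush after every record, >0 = also flush periodically.
     * Records still buffered at endMs are flushed then (end of input).
     */
    static Result simulate(long[] arrivalsMs, int capacity, long timeoutMs, long endMs) {
        int flushes = 0;
        long maxLatency = 0;
        int buffered = 0;
        long oldest = 0; // arrival time of the oldest record in the buffer
        long nextTick = timeoutMs > 0 ? timeoutMs : Long.MAX_VALUE;

        for (long t : arrivalsMs) {
            // Periodic flushes that are due at or before this arrival.
            while (timeoutMs > 0 && nextTick <= t) {
                if (buffered > 0) {
                    maxLatency = Math.max(maxLatency, nextTick - oldest);
                    buffered = 0;
                    flushes++;
                }
                nextTick += timeoutMs;
            }
            if (buffered == 0) {
                oldest = t;
            }
            buffered++;
            // Flush immediately when the buffer is full or the timeout is 0.
            if (timeoutMs == 0 || buffered >= capacity) {
                maxLatency = Math.max(maxLatency, t - oldest);
                buffered = 0;
                flushes++;
            }
        }
        if (buffered > 0) { // end of input
            maxLatency = Math.max(maxLatency, endMs - oldest);
            flushes++;
        }
        return new Result(flushes, maxLatency);
    }

    public static void main(String[] args) {
        long[] arrivals = new long[100];
        for (int i = 0; i < arrivals.length; i++) {
            arrivals[i] = i * 10L; // one record every 10 ms
        }
        for (long timeout : new long[] {-1, 0, 100}) {
            System.out.println("timeout=" + timeout + " -> " + simulate(arrivals, 32, timeout, 1000));
        }
    }
}
```

In this model, one record arrives every 10 ms into a 32-record buffer, and the input ends at 1000 ms. Under these conditions the three settings behave as follows:

- -1 yields 4 flushes and a worst-case delay of 310 ms.
- 0 yields 100 flushes with no buffering delay.
- 100 yields 10 flushes with a worst-case delay of 100 ms.

Fewer flushes mean less per-send overhead, at the cost of records waiting longer in the buffer.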