Help Center/
MapReduce Service/
User Guide (Kuala Lumpur Region)/
MRS Cluster Component Operation Gudie/
Using Flink/
Flink Performance Tuning/
Optimization DataStream/
Experience Summary
Updated on 2022-08-12 GMT+08:00
Experience Summary
Avoiding Data Skew
If data skew occurs (certain data volume is extremely large), the execution time of tasks is inconsistent even though no GC is performed.
- Redefine keys. Use keys of smaller granularity to optimize the task size.
- Modify the DOP.
- Call the rebalance operation to balance data partitions.
Setting Timeout Interval for the Buffer
- During the execution of tasks, data is exchanged through network. You can set the setBufferTimeout parameter to specify a buffer timeout interval for data exchanging among different servers.
- If setBufferTimeout is set to -1, the refreshing operation is performed when the buffer is full to maximize the throughput. If setBufferTimeout is set to 0, the refreshing operation is performed each time data is received to minimize the delay. If setBufferTimeout is set to a value greater than 0, the refreshing operation is performed after the buffer times out.
The following is an example:
env.setBufferTimeout(timeoutMillis); env.generateSequence(1,10).map(new MyMapper()).setBufferTimeout(timeoutMillis);
Parent topic: Optimization DataStream
Feedback
Was this page helpful?
Provide feedbackThank you very much for your feedback. We will continue working to improve the documentation.See the reply and handling status in My Cloud VOC.
The system is busy. Please try again later.
For any further questions, feel free to contact us through the chatbot.
Chatbot