Joining Big and Small Tables
Scenarios
When you join two Flink streams, one table is often much larger than the other. With this feature, data from the small table is broadcast to every join task, and data from the large table is rebalanced (distributed in round-robin fashion) across the join tasks. This improves Flink SQL usability and job stability.
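The distribution described above can be pictured with a small Python simulation. This is only a toy model on static lists: it ignores streaming state and time, and is not how Flink implements the join. The names `distribute` and `join_task` are hypothetical helpers introduced for illustration.

```python
def distribute(small_rows, large_rows, num_tasks):
    """Copy every small-table row to each task; deal large-table rows round robin."""
    tasks = [{"small": list(small_rows), "large": []} for _ in range(num_tasks)]
    for i, row in enumerate(large_rows):
        tasks[i % num_tasks]["large"].append(row)
    return tasks


def join_task(task):
    """Inner join inside one task on the key (first field of each row)."""
    index = {}
    for key, value in task["small"]:
        index.setdefault(key, []).append(value)
    return [(sv, lv) for key, lv in task["large"] for sv in index.get(key, [])]


# Hypothetical sample data: rows are (join key, payload).
small_a = [(1, "a-one"), (2, "a-two")]
large_b = [(1, "b1"), (2, "b2"), (1, "b3"), (3, "b4"), (2, "b5")]

tasks = distribute(small_a, large_b, num_tasks=3)
joined = [row for task in tasks for row in join_task(task)]
print(sorted(joined))
# [('a-one', 'b1'), ('a-one', 'b3'), ('a-two', 'b2'), ('a-two', 'b5')]
```

Because each task holds a full copy of the small table, a large-table row can be joined on whichever task it lands on, so the large table does not need to be partitioned by the join key.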
How to Use
You can use Flink SQL hints to mark the left or right table in a join as the broadcast table; the other table is then rebalanced. The following example SQL statements treat table A and table C as small tables:
- Use table A as the broadcast table.
  - Join
    SELECT /*+ BROADCAST(A) */ a2, b2 FROM A JOIN B ON a1 = b1
  - Where
    SELECT /*+ BROADCAST(A) */ a2, b2 FROM A, B WHERE a1 = b1
- Use table A and table C as broadcast tables.
  - Join
    SELECT /*+ BROADCAST(A, C) */ a2, b2, c2 FROM A JOIN B ON a1 = b1 JOIN C ON a1 = c1
- Enable this feature with the /*+ BROADCAST(smallTable1, smallTable2) */ hint, which keeps the statement compatible with open-source joins of two streams.
- Switching between open-source joins and this feature is not supported, because this feature broadcasts data to every join task.
- Using a small table as the left table of a LEFT JOIN, or as the right table of a RIGHT JOIN, is not supported.
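One plausible reason for the LEFT JOIN and RIGHT JOIN restriction is a general property of broadcast joins, sketched below in Python. This is an assumption for illustration, not a statement about this feature's internals. When the preserved (outer) side is the broadcast table, each task sees only part of the large table and cannot tell whether a row is truly unmatched. The helper names are hypothetical.

```python
def left_join_task(left_rows, right_rows):
    """LEFT JOIN inside one task: keep every left row, pad misses with None."""
    out = []
    for lk, lv in left_rows:
        matches = [rv for rk, rv in right_rows if rk == lk]
        if matches:
            out.extend((lv, rv) for rv in matches)
        else:
            out.append((lv, None))
    return out


def broadcast_left_join(small_left, large_right, num_tasks):
    """Broadcast the left table; split the right table round robin across tasks."""
    partitions = [large_right[i::num_tasks] for i in range(num_tasks)]
    result = []
    for part in partitions:
        result.extend(left_join_task(small_left, part))
    return result


# Hypothetical sample data: rows are (join key, payload).
small_left = [(1, "a1")]
large_right = [(1, "b1"), (2, "b2")]

print(left_join_task(small_left, large_right))
# [('a1', 'b1')]  <- correct LEFT JOIN result
print(broadcast_left_join(small_left, large_right, num_tasks=2))
# [('a1', 'b1'), ('a1', None)]  <- second task wrongly reports "no match"
```

In this toy model the task that did not receive the matching row emits a spurious null-padded result, which an inner join would never produce.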