Updated on 2024-05-29 GMT+08:00

Joining Big and Small Tables

This section applies to MRS 3.3.0 or later.

Scenarios

There are big tables and small tables when you join two Flink streams. Small table data is broadcasted to every join task, and large table data is rebalanced (distribute the data in a round robin fashion) to join tasks. This way, Flink SQL usability and job stability are improved.

How to Use

You can use Flink SQL Hints to specify the left or right table in a join as a broadcasted table and the other table as a rebalanced table. The following SQL statement examples use table A and table C as small tables:

  • Use table A as the broadcasted table.
    • Join
      SELECT /*+ BROADCAST(A) */ a2, b2 FROM A JOIN B ON a1 = b1
    • Where
      SELECT /*+ BROADCAST(A) */ a2, b2 FROM A, B WHERE a1 = b1
  • Use table A and table C as broadcasted tables.
    SELECT /*+ BROADCAST(A, C) */ a2, b2, c2 FROM A JOIN B ON a1 = b1 JOIN C ON a1 = c1
  • This feature can be used with /*+ BROADCAST(smallTable1, smallTable2) */ to be compatible with the open-source joins of two streams.
  • Switching between open-source joins and this feature is not supported because this feature broadcasts data to each join task.
  • Using a small table as the left table of a LEFT JOIN or a small table as the right table of a RIGHT JOIN is not supported.