更新时间:2024-11-06 GMT+08:00
Flink SQL逻辑开发规则
维表lookup join场景维度表个数不超过五个
Hudi维度表都在TM heap中,当维表过多时heap中保存的维表数据过多,TM会不断GC,导致作业性能下降。
【示例】lookup join维表数5个:
CREATE TABLE table1(id int, param1 string) with(...);
CREATE TABLE table2(id int, param2 string) with(...);
CREATE TABLE table3(id int, param3 string) with(...);
CREATE TABLE table4(id int, param4 string) with(...);
CREATE TABLE table5(id int, param5 string) with(...);
CREATE TABLE orders (
order_id STRING,
price DECIMAL(32,2),
currency STRING,
order_time TIMESTAMP(3),
WATERMARK FOR order_time AS order_time
) WITH (/* ... */);
select
o.*, t1.param1, t2.param2, t3.param3, t4.param4, t5.param5
from
orders AS o
JOIN table1 FOR SYSTEM_TIME AS OF o.proc_time AS t1 ON o.order_id = t1.id
JOIN table2 FOR SYSTEM_TIME AS OF o.proc_time AS t2 ON o.order_id = t2.id
JOIN table3 FOR SYSTEM_TIME AS OF o.proc_time AS t3 ON o.order_id = t3.id
JOIN table4 FOR SYSTEM_TIME AS OF o.proc_time AS t4 ON o.order_id = t4.id
JOIN table5 FOR SYSTEM_TIME AS OF o.proc_time AS t5 ON o.order_id = t5.id;
多流Join场景流表个数不超过三个
当Join表过多时,状态后端压力太大会导致端到端时延增加。
【示例】实时Join维表数3个:
CREATE TABLE table1(id int, param1 string) with(...);
CREATE TABLE table2(id int, param2 string) with(...);
CREATE TABLE table3(id int, param3 string) with(...);
CREATE TABLE orders (
order_id STRING,
price DECIMAL(32,2),
currency STRING,
order_time TIMESTAMP(3),
WATERMARK FOR order_time AS order_time
) WITH (/* ... */);
select
o.*, t1.param1, t2.param2, t3.param3
from
orders AS o
JOIN table1 AS t1 ON o.order_id = t1.id
JOIN table2 AS t2 ON o.order_id = t2.id
JOIN table3 AS t3 ON o.order_id = t3.id;
关联嵌套层级不超过三层
嵌套层级越多,回撤流的的数据量越大。
【示例】关联嵌套3层:
SELECT *
FROM table1 WHERE column1 IN
(
SELECT column1
FROM table2 WHERE column2 IN (
SELECT column2
FROM table3 WHERE column3 = 'value'
)
)
基于Hudi表的lookup join单表数据量不超过1GB
Hudi维度表都在TM heap中,当维表过大时heap中保存的维表数据过多,TM会不断GC导致作业性能下降。
流流关联中不能加入批Source算子
流流关联中不能加入批Source算子,根据业务情况将该Source算子调整为维表算子。
父主题: Flink SQL逻辑开发规范