MERGE INTO
命令功能
通过MERGE INTO命令,根据一张表或子查询的连接条件对另外一张表进行查询,连接条件匹配上的进行UPDATE或DELETE,无法匹配的执行INSERT。这个语法仅需要一次全表扫描就完成了全部同步工作,执行效率要高于INSERT+UPDATE。
命令格式
MERGE INTO tableIdentifier AS target_alias
USING (sub_query | tableIdentifier) AS source_alias
ON <merge_condition>
[ WHEN MATCHED [ AND <condition> ] THEN <matched_action> ]
[ WHEN MATCHED [ AND <condition> ] THEN <matched_action> ]
[ WHEN NOT MATCHED [ AND <condition> ] THEN <not_matched_action> ]
<merge_condition> =A equal bool condition
<matched_action> =
DELETE |
UPDATE SET * |
UPDATE SET column1 = expression1 [, column2 = expression2 ...]
<not_matched_action> =
INSERT * |
INSERT (column1 [, column2 ...]) VALUES (value1 [, value2 ...])
参数描述
参数 |
描述 |
---|---|
tableIdentifier |
在其中执行MergeInto操作的Hudi表的名称。 |
target_alias |
目标表的别名。 |
sub_query |
子查询。 |
source_alias |
源表或源表达式的别名。 |
merge_condition |
将源表或表达式和目标表关联起来的条件 |
condition |
过滤条件,可选。 |
matched_action |
当满足条件时进行Delete或Update操作 |
not_matched_action |
当不满足条件时进行Insert操作 |
注意事项
- merge-on condition当前只支持主键列。
- 当前仅支持对COW表进行部分字段更新,且更新值必须包含预合并列,MOR表需要在Update语法中给出全部字段。
示例
- 部分字段更新
create table h0(id int, comb int, name string, price int) using hudi options(primaryKey = 'id', preCombineField = 'comb') LOCATION 'obs://bucket/path/h0'; create table s0(id int, comb int, name string, price int) using hudi options(primaryKey = 'id', preCombineField = 'comb') LOCATION 'obs://bucket/path/s0'; insert into h0 values(1, 1, 1, 1); insert into s0 values(1, 1, 1, 1); insert into s0 values(2, 2, 2, 2); //写法1 merge into h0 using s0 on h0.id = s0.id when matched then update set h0.id = s0.id, h0.comb = s0.comb, price = s0.price * 2; //写法2 merge into h0 using s0 on h0.id = s0.id when matched then update set id = s0.id, name = h0.name, comb = s0.comb + h0.comb, price = s0.price + h0.price;
- 缺省字段更新和插入
create table h0(id int, comb int, name string, price int, flag boolean) using hudi options(primaryKey = 'id', preCombineField = 'comb') LOCATION 'obs://bucket/path/h0'; create table s0(id int, comb int, name string, price int, flag boolean) using hudi options(primaryKey = 'id', preCombineField = 'comb') LOCATION 'obs://bucket/path/s0'; insert into h0 values(1, 1, 1, 1, false); insert into s0 values(1, 2, 1, 1, true); insert into s0 values(2, 2, 2, 2, false); merge into h0 as target using ( select id, comb, name, price, flag from s0 ) source on target.id = source.id when matched then update set * when not matched then insert *;
- 多条件更新和删除
create table h0(id int, comb int, name string, price int, flag boolean) using hudi options(primaryKey = 'id', preCombineField = 'comb') LOCATION 'obs://bucket/path/h0'; create table s0(id int, comb int, name string, price int, flag boolean) using hudi options(primaryKey = 'id', preCombineField = 'comb') LOCATION 'obs://bucket/path/s0'; insert into h0 values(1, 1, 1, 1, false); insert into h0 values(2, 2, 1, 1, false); insert into s0 values(1, 1, 1, 1, true); insert into s0 values(2, 2, 2, 2, false); insert into s0 values(3, 3, 3, 3, false); merge into h0 using ( select id, comb, name, price, flag from s0 ) source on h0.id = source.id when matched and flag = false then update set id = source.id, comb = h0.comb + source.comb, price = source.price * 2 when matched and flag = true then delete when not matched then insert *;
系统响应
可以检查任务状态是否成功,查看任务日志确认有无异常。