Updated on 2024-11-29 GMT+08:00

MERGE INTO

Function

This command is used to query another table based on the join condition of a table or subquery. If UPDATE or DELETE is executed for the table matching the join condition, and INSERT is executed if the join condition is not met. This command completes the synchronization requiring only one full table scan, delivering higher efficiency than INSERT plus UPDATE.

Syntax

MERGE INTO tableIdentifier AS target_alias

USING (sub_query | tableIdentifier) AS source_alias

ON <merge_condition>

[ WHEN MATCHED [ AND <condition> ] THEN <matched_action> ]

[ WHEN MATCHED [ AND <condition> ] THEN <matched_action> ]

[ WHEN NOT MATCHED [ AND <condition> ] THEN <not_matched_action> ]

<merge_condition> =A equal bool condition

<matched_action> =

DELETE |

UPDATE SET * |

UPDATE SET column1 = expression1 [, column2 = expression2 ...]

<not_matched_action> =

INSERT * |

INSERT (column1 [, column2 ...]) VALUES (value1 [, value2 ...])

Parameter Description

Table 1 Parameters

Parameter

Description

tableIdentifier

Name of the Hudi table.

target_alias

Alias of the target table.

sub_query

Subquery.

source_alias

Alias of the source table or source expression.

merge_condition

Condition for associating the source table or expression with the target table.

condition

Filtering condition. This parameter is optional.

matched_action

DELETE or UPDATE operation to be performed when conditions are met.

not_matched_action

INSERT operation to be performed when conditions are not met.

Precautions

  1. The merge-on condition supports only primary key columns currently.
  2. Currently, only some fields in the COW table can be updated, and the updated values must contain the pre-merged columns. All fields in the MOR table must be provided in the Update statement.

Examples

  • Update some fields.
    create table h0(id int, comb int, name string, price int) using hudi options(primaryKey = 'id', preCombineField = 'comb');
    create table s0(id int, comb int, name string, price int) using hudi options(primaryKey = 'id', preCombineField = 'comb');
    insert into h0 values(1, 1, 1, 1);
    insert into s0 values(1, 1, 1, 1);
    insert into s0 values(2, 2, 2, 2);
    // Method 1
    merge into h0 using s0
    on h0.id = s0.id
    when matched then update set h0.id = s0.id, h0.comb = s0.comb, price = s0.price * 2;
    // Method 2
    merge into h0 using s0
    on h0.id = s0.id
    when matched then update set id = s0.id,
    name = h0.name,
    comb = s0.comb + h0.comb,
    price = s0.price + h0.price;
  • Update and insert default fields.
    create table h0(id int, comb int, name string, price int, flag boolean) using hudi options(primaryKey = 'id', preCombineField = 'comb');
    create table s0(id int, comb int, name string, price int, flag boolean) using hudi options(primaryKey = 'id', preCombineField = 'comb');
    insert into h0 values(1, 1, 1, 1, false);
    insert into s0 values(1, 2, 1, 1, true);
    insert into s0 values(2, 2, 2, 2, false);
    
    merge into h0 as target
    using (
    select id, comb, name, price, flag from s0
    ) source
    on target.id = source.id
    when matched then update set *
    when not matched then insert *;
  • Update and delete data with multiple conditions.
    create table h0(id int, comb int, name string, price int, flag boolean) using hudi options(primaryKey = 'id', preCombineField = 'comb');
    create table s0(id int, comb int, name string, price int, flag boolean) using hudi options(primaryKey = 'id', preCombineField = 'comb');
    insert into h0 values(1, 1, 1, 1, false);
    insert into h0 values(2, 2, 1, 1, false);
    insert into s0 values(1, 1, 1, 1, true);
    insert into s0 values(2, 2, 2, 2, false);
    insert into s0 values(3, 3, 3, 3, false);
    
    merge into h0
    using (
    select id, comb, name, price, flag from s0
    ) source
    on h0.id = source.id
    when matched and flag = false then update set id = source.id, comb = h0.comb + source.comb, price = source.price * 2
    when matched and flag = true then delete
    when not matched then insert *;

System Response

You can view the result in driver logs or on the client.