Updated on 2024-05-29 GMT+08:00

UPSERT: Writing Data

This section describes the basic SQL syntax and usage of the upsert function when ClickHouse data is written.

This section applies only to MRS 3.3.0 or later.

Basic Syntax

  • INSERT VALUES

    UPSERT INTO [database_name.]table [(c1, c2, c3)] VALUES (v11, v12, v13), (v21, v22, v23), ...

  • INSERT SELECT

    UPSERT INTO [database_name.]table [(c1, c2, c3)] SELECT ...

Example

  • Create a table.
    CREATE TABLE default.upsert_tab ON CLUSTER default_cluster
    (    
    `id`     Int32,    
    `pdate`  Date,    
    `name`   String 
    )ENGINE = ReplicatedMergeTree('/clickhouse/tables/default/{shard}/upsert_tab', '{replica}')
    PARTITION BY toYYYYMM(pdate)
    PRIMARY KEY id 
    ORDER BY id 
    SETTINGS index_granularity = 8192;
  • Upsert data.
    Upsert into upsert_tab(id, pdate, name) values (1, rand() % 365, 'abc'), (2, rand() % 365, 'bcd'), (1, rand() % 365, 'def');
  • Query data in the test_upsert table.
    select * from upsert_tab;
    ┌─id─┬───pdate─┬─name─┐
    │  2   │   1970-06-09│  bcd   │
    │  1   │   1970-11-30│  def   │
    └───┴── ────┴────┘
  • Upsert for transactions

    Similar to other SQL syntax, Upsert also supports explicit and implicit transactions. Before using transactions, you need to enable the transaction function.

Precautions

  • When creating MergeTree and ReplicatedMergeTree tables, specify the primary key or order by field as the unique key for deduplication. If no primary key is specified and only the order by attribute is specified during table creation, the order by field is used for deduplication.
  • The key for deduplication must be sharded in advance to ensure that same key fields are in the same shard to ensure deduplication accuracy.