UPSERT: Writing Data

This section describes the basic SQL syntax and usage of the upsert function when ClickHouse data is written.

Basic Syntax

INSERT VALUES
UPSERT INTO [database_name.]table [(c1, c2, c3)] VALUES (v11, v12, v13), (v21, v22, v23), ...
INSERT SELECT
UPSERT INTO [database_name.]table [(c1, c2, c3)] SELECT ...

Example

Create a table.

CREATE TABLE default.upsert_tab ON CLUSTER default_cluster
(    
`id`     Int32,    
`pdate`  Date,    
`name`   String 
)ENGINE = ReplicatedMergeTree('/clickhouse/tables/default/{shard}/upsert_tab', '{replica}')
PARTITION BY toYYYYMM(pdate)
PRIMARY KEY id 
ORDER BY id 
SETTINGS index_granularity = 8192;

Upsert data.

Upsert into upsert_tab(id, pdate, name) values (1, rand() % 365, 'abc'), (2, rand() % 365, 'bcd'), (1, rand() % 365, 'def');

Query data in the test_upsert table.

select * from upsert_tab;
┌─id─┬───pdate─┬─name─┐
│  2   │   1970-06-09│  bcd   │
│  1   │   1970-11-30│  def   │
└───┴── ────┴────┘

Upsert for transactions
Similar to other SQL syntax, Upsert also supports explicit and implicit transactions. Before using transactions, you need to enable the transaction function.

Precautions

When creating MergeTree and ReplicatedMergeTree tables, specify the primary key or order by field as the unique key for deduplication. If no primary key is specified and only the order by attribute is specified during table creation, the order by field is used for deduplication.
The key for deduplication must be sharded in advance to ensure that same key fields are in the same shard to ensure deduplication accuracy.

Parent topic: Common ClickHouse SQL Syntax

Previous topic: SHOW: Displaying Information About Databases and Tables

Next topic: Migrating ClickHouse Data