ClickHouse Overview

ClickHouse Table Engine

Table engines play a key role in ClickHouse to determine:

Where to write and read data
Supported query modes
Whether concurrent data access is supported
Whether indexes can be used
Whether multi-thread requests can be executed
Parameters used for data replication

This section describes MergeTree and Distributed engines, which are the most important and frequently used ClickHouse table engines.

For details about other table engines, visit https://clickhouse.tech/docs/en/engines/table-engines.

MergeTree Family
Engines of the MergeTree family are the most universal and functional table engines for high-load tasks. They have the following key features:
- Data is stored by partition and block based on partitioning keys.
- Data index is sorted based on primary keys and the ORDER BY sorting keys.
- Data replication is supported by table engines prefixed with Replicated.
- Data sampling is supported.
When data is written, a table with this type of engine divides data into different folders based on the partitioning key. Each column of data in the folder is an independent file. A file that records serialized index sorting is created. This structure reduces the volume of data to be retrieved during data reading, greatly improving query efficiency.
- MergeTree
  Syntax for creating a table:
```
CREATE TABLE [IF NOT EXISTS] [db.]table_name [ON CLUSTER cluster]
(
    name1 [type1] [DEFAULT|MATERIALIZED|ALIAS expr1] [TTL expr1],
    name2 [type2] [DEFAULT|MATERIALIZED|ALIAS expr2] [TTL expr2],
    ...
    INDEX index_name1 expr1 TYPE type1(...) GRANULARITY value1,
    INDEX index_name2 expr2 TYPE type2(...) GRANULARITY value2
) ENGINE = MergeTree()
ORDER BY expr
[PARTITION BY expr]
[PRIMARY KEY expr]
[SAMPLE BY expr]
[TTL expr [DELETE|TO DISK 'xxx'|TO VOLUME 'xxx'], ...]
[SETTINGS name=value, ...]
```
  Example:
```
CREATE TABLE default.test (
  name1 DateTime,
  name2 String,
  name3 String,
  name4 String,
  name5 Date,
  ...
) ENGINE = MergeTree() 
PARTITION BY toYYYYMM(name5)
ORDER BY (name1, name2) 
SETTINGS index_granularity = 8192
```
  Parameters in the example are described as follows:
  - ENGINE = MergeTree(): specifies the MergeTree engine.
  - PARTITION BY toYYYYMM(name4): specifies the partition. The sample data is partitioned by month, and a folder is created for each month.
  - ORDER BY: specifies the sorting field. A multi-field index can be sorted. If the first field is the same, the second field is used for sorting, and so on.
  - index_granularity = 8192: specifies the index granularity. One index value is recorded for every 8,192 data records.
  If the data to be queried exists in a partition or sorting field, the data query time can be greatly reduced.
- ReplacingMergeTree
  Different from MergeTree, ReplacingMergeTree deletes duplicate entries with the same sorting key. ReplacingMergeTree is suitable for clearing duplicate data to save space, but it does not guarantee the absence of duplicate data. Generally, it is not recommended.
  Syntax for creating a table:
```
CREATE TABLE [IF NOT EXISTS] [db.]table_name [ON CLUSTER cluster]
(
    name1 [type1] [DEFAULT|MATERIALIZED|ALIAS expr1],
    name2 [type2] [DEFAULT|MATERIALIZED|ALIAS expr2],
    ...
) ENGINE = ReplacingMergeTree([ver])
[PARTITION BY expr]
[ORDER BY expr]
[SAMPLE BY expr]
[SETTINGS name=value, ...]
```
- SummingMergeTree
  When merging data parts in SummingMergeTree tables, ClickHouse merges all rows with the same primary key into one row that contains summed values for the columns with the numeric data type. If the primary key is composed in a way that a single key value corresponds to a large number of rows, the storage volume can be significantly reduced and data query can be accelerated.
  
  Syntax for creating a table:
```
CREATE TABLE [IF NOT EXISTS] [db.]table_name [ON CLUSTER cluster]
(
    name1 [type1] [DEFAULT|MATERIALIZED|ALIAS expr1],
    name2 [type2] [DEFAULT|MATERIALIZED|ALIAS expr2],
    ...
) ENGINE = SummingMergeTree([columns])
[PARTITION BY expr]
[ORDER BY expr]
[SAMPLE BY expr]
[SETTINGS name=value, ...]
```
  Example:
  
  Create a SummingMergeTree table named testTable.
```
CREATE TABLE testTable
(
    id UInt32,
    value UInt32
)
ENGINE = SummingMergeTree()
ORDER BY id
```
  Insert data into the table.
```
INSERT INTO testTable Values(5,9),(5,3),(4,6),(1,2),(2,5),(1,4),(3,8);
INSERT INTO testTable Values(88,5),(5,5),(3,7),(3,5),(1,6),(2,6),(4,7),(4,6),(43,5),(5,9),(3,6);
```
  Query all data in unmerged parts.
```
SELECT * FROM testTable
┌─id─┬─value─┐
│  1   │     6   │
│  2   │     5   │
│  3   │     8   │
│  4   │     6   │
│  5   │    12   │
└───┴──── ┘
┌─id─┬─value─┐
│  1   │       6 │
│  2   │       6 │
│  3   │      18 │
│  4   │      13 │
│  5   │      14 │
│ 43   │       5 │
│ 88   │       5 │
└───┴──── ┘
```
  If ClickHouse has not summed up all rows and you need to aggregate data by ID, use the sum function and GROUP BY statement.
```
SELECT id, sum(value) FROM testTable GROUP BY id
┌─id─┬─sum(value)─┐
│  4   │           19 │
│  3   │           26 │
│ 88   │            5 │
│  2   │           11 │
│  5   │           26 │
│  1   │           12 │
│ 43   │            5 │
└───┴───────┘
```
  Merge rows manually.
```
OPTIMIZE TABLE testTable
```
  Query data in the testTable table again.
```
SELECT * FROM testTable
┌─id─┬─value─┐
│  1   │    12   │
│  2   │    11   │
│  3   │    26   │
│  4   │    19   │
│  5   │    26   │
│ 43   │     5   │
│ 88   │     5   │
└───┴──── ┘
```
  SummingMergeTree uses the ORDER BY sorting keys as the condition keys to aggregate data. That is, if sorting keys are the same, data records are merged into one and the specified merged fields are aggregated.
  
  Data is pre-aggregated only when merging is executed in the background, and the merging execution time cannot be predicted. Therefore, it is possible that some data has been pre-aggregated and some data has not been aggregated. Therefore, the GROUP BY statement must be used during aggregation.
- AggregatingMergeTree
  AggregatingMergeTree is a pre-aggregation engine used to improve aggregation performance. When merging partitions, the AggregatingMergeTree engine aggregates data based on predefined conditions, calculates data based on predefined aggregate functions, and saves the data in binary format to tables.
  
  Syntax for creating a table:
```
CREATE TABLE [IF NOT EXISTS] [db.]table_name [ON CLUSTER cluster]
(
    name1 [type1] [DEFAULT|MATERIALIZED|ALIAS expr1],
    name2 [type2] [DEFAULT|MATERIALIZED|ALIAS expr2],
    ...
) ENGINE = AggregatingMergeTree()
[PARTITION BY expr]
[ORDER BY expr]
[SAMPLE BY expr]
[TTL expr]
[SETTINGS name=value, ...]
```
  Example:
  
  You do not need to set the AggregatingMergeTree parameter separately. When partitions are merged, data in each partition is aggregated based on the ORDER BY sorting key. You can set the aggregate functions to be used and column fields to be calculated by defining the AggregateFunction type, as shown in the following example:
```
create table test_table (
name1 String,
name2 String,
name3 AggregateFunction(uniq,String),
name4 AggregateFunction(sum,Int),
name5 DateTime
) ENGINE = AggregatingMergeTree()
PARTITION BY toYYYYMM(name5)
ORDER BY (name1,name2)
PRIMARY KEY name1;
```
  When data of the AggregateFunction type is written or queried, the *state and *merge functions need to be called. The asterisk (*) indicates the aggregate functions used for defining the field type. For example, the uniq and sum functions are specified for the name3 and name4 fields defined in the test_table, respectively. Therefore, you need to call the uniqState and sumState functions and run the INSERT and SELECT statements when writing data into the table.
```
insert into test_table select '8','test1',uniqState('name1'),sumState(toInt32(100)),'2021-04-30 17:18:00';
insert into test_table select '8','test1',uniqState('name1'),sumState(toInt32(200)),'2021-04-30 17:18:00';
```
  When querying data, you need to call the corresponding functions uniqMerge and sumMerge.
```
select name1,name2,uniqMerge(name3),sumMerge(name4) from test_table group by name1,name2;
┌─name1─┬─name2─┬─uniqMerge(name3)─┬─sumMerge(name4)─┐
│ 8       │   test1 │                  1 │               300 │
└──── ┴──── ┴──────────┴───────── ┘
```
  AggregatingMergeTree is more commonly used with materialized views, which are query views of other data tables at the upper layer. For details, visit https://clickhouse.tech/docs/en/engines/table-engines/mergetree-family/aggregatingmergetree/.
- CollapsingMergeTree
  CollapsingMergeTree defines a Sign field to record status of data rows. If Sign is 1, the data in this row is valid. If Sign is -1, the data in this row needs to be deleted.
  Syntax for creating a table:
```
CREATE TABLE [IF NOT EXISTS] [db.]table_name [ON CLUSTER cluster]
(
    name1 [type1] [DEFAULT|MATERIALIZED|ALIAS expr1],
    name2 [type2] [DEFAULT|MATERIALIZED|ALIAS expr2],
    ...
) ENGINE = CollapsingMergeTree(sign)
[PARTITION BY expr]
[ORDER BY expr]
[SAMPLE BY expr]
[SETTINGS name=value, ...]
```
  Example:
  
  For details about the example, visit https://clickhouse.tech/docs/en/engines/table-engines/mergetree-family/collapsingmergetree/.
- VersionedCollapsingMergeTree
  The VersionedCollapsingMergeTree engine adds Version to the table creation statement to record the mapping between a state row and a cancel row in case that rows are out of order. The rows with the same primary key, same Version, and opposite Sign will be deleted during compaction.
  
  Syntax for creating a table:
```
CREATE TABLE [IF NOT EXISTS] [db.]table_name [ON CLUSTER cluster]
(
    name1 [type1] [DEFAULT|MATERIALIZED|ALIAS expr1],
    name2 [type2] [DEFAULT|MATERIALIZED|ALIAS expr2],
    ...
) ENGINE = VersionedCollapsingMergeTree(sign, version)
[PARTITION BY expr]
[ORDER BY expr]
[SAMPLE BY expr]
[SETTINGS name=value, ...]
```
  Example:
  
  For details about the example, visit https://clickhouse.tech/docs/en/engines/table-engines/mergetree-family/versionedcollapsingmergetree/.
- GraphiteMergeTree
  The GraphiteMergeTree engine is used to store data in the time series database Graphite.
  
  Syntax for creating a table:
```
CREATE TABLE [IF NOT EXISTS] [db.]table_name [ON CLUSTER cluster]
(
    Path String,
    Time DateTime,
    Value <Numeric_type>,
    Version <Numeric_type>
    ...
) ENGINE = GraphiteMergeTree(config_section)
[PARTITION BY expr]
[ORDER BY expr]
[SAMPLE BY expr]
[SETTINGS name=value, ...]
```
  Example:
  
  For details about the example, visit https://clickhouse.tech/docs/en/engines/table-engines/mergetree-family/graphitemergetree/.
Replicated*MergeTree Engines
All engines of the MergeTree family in ClickHouse prefixed with Replicated become MergeTree engines that support replicas.

Replicated series engines use ZooKeeper to synchronize data. When a replicated table is created, all replicas of the same shard are synchronized based on the information registered with ZooKeeper.
Template for creating a Replicated engine:
```
ENGINE = Replicated*MergeTree('Storage path in ZooKeeper','Replica name', ...)
```
Two parameters need to be specified for a Replicated engine:
- Storage path in ZooKeeper: specifies the path for storing table data in ZooKeeper. The path format is /clickhouse/tables/{shard}/Database name/Table name.
- Replica name: Generally, {replica} is used.
For details about the example, see Creating a ClickHouse Table.
Distributed Table Engines
The Distributed engine does not store any data. It serves as a transparent proxy for data shards and can automatically transmit data to each node in the cluster. Distributed tables need to work with other local data tables. Distributed tables distribute received read and write tasks to each local table where data is stored.
Figure 1 Distributed

Template for creating a Distributed engine:
```
ENGINE = Distributed(cluster_name, database_name, table_name, [sharding_key])
```
Parameters of a distributed table are described as follows:
- cluster_name: specifies the cluster name. When a distributed table is read or written, the cluster configuration information is used to search for the corresponding ClickHouse instance node.
- database_name: specifies the database name.
- table_name: specifies the name of a local table in the database. It is used to map a distributed table to a local table.
- sharding_key (optional): specifies the sharding key, based on which a distributed table distributes data to each local table.
Example:
```
-- Create a ReplicatedMergeTree local table named test.
CREATE TABLE default.test ON CLUSTER default_cluster_1
(
    `EventDate` DateTime, 
    `id` UInt64
)
ENGINE = ReplicatedMergeTree('/clickhouse/tables/{shard}/default/test', '{replica}')
PARTITION BY toYYYYMM(EventDate)
ORDER BY id

-- Create a distributed table named test_all based on the local table test.
CREATE TABLE default.test_all ON CLUSTER default_cluster_1
(
    `EventDate` DateTime, 
    `id` UInt64
)
ENGINE = Distributed(default_cluster_1, default, test, rand())
```
Rules for creating a distributed table:
- When creating a distributed table, add ON CLUSTER cluster_name to the table creation statement so that the statement can be executed once on a ClickHouse instance and then distributed to all instances in the cluster for execution.
- Generally, a distributed table is named in the following format: Local table name_all. It forms a one-to-many mapping with local tables. Then, multiple local tables can be operated using the distributed table proxy.
- Ensure that the structure of a distributed table is the same as that of local tables. If they are inconsistent, no error is reported during table creation, but an exception may be reported during data query or insertion.

ClickHouse Data Types

Table 1 describes the data types used in the ClickHouse component of MRS.

For details about the ClickHouse data types, see ClickHouse Data Types.

**Table 1** ClickHouse data types
Category	Keyword	Type	Description
Type	Int8	Int8	Value range: [–128, 127]
	Int16	Int16	Value range: [–32768, 32767]
	Int32	Int32	Value range: [–2147483648, 2147483647]
	Int64	Int64	Value range: [–9223372036854775808, 9223372036854775807]
Floating point	Float32	Float	Similar to the Float type in C, a single-precision floating point number occupies 4 bytes in storage of the machine and is described in 32-bit binary mode.
Floating point	Float64	Double	Similar to the Double type in the C language, a double-precision floating point number occupies eight bytes in the machine and is described in 64-bit binary mode.
Decimal	Decimal	Decimal	A signed fixed-point number that can maintain precision during addition, subtraction, and multiplication operations. The following formats are supported: Decimal(P, S) Decimal32(S) Decimal64(S) Decimal128(S) NOTE: P: precision. Valid range: [1:38]. It determines the number of decimal digits (including fractions) that can be contained. S: scale. The value ranges from 0 to P, which determines the number of decimal places in the decimal part of a number.
String	String	String	The character string can be of any length. It can contain any set of bytes, including empty bytes. Therefore, the string type can replace the VARCHAR, BLOB, and CLOB types in other DBMSs.
String	FixedString	Fixed-length string	When the length of the data happens to be N bytes, the FixedString type is efficient. In other cases, this may reduce efficiency. Examples of values that can be effectively stored in columns of the FixedString type: IP address represented in binary Language code (ru_RU, en_US...) Currency code (RUB...) Hash value represented in binary mode (FixedString (16) for MD5 and FixedString (32) for SHA256)
Date and time	Date	Date	A Date value takes up two bytes, indicating the date value from 1970-01-01 (unsigned) to the current time. Date values are stored without the time zone.
	DateTime	Viewing the timestamp	A Unix timestamp value takes up four bytes (unsigned). Value range of this type is the same as the Date type. The minimum value is 1970-01-01 00:00:00. Timestamp values are accurate to seconds. Leap seconds are not supported. The system time zone will be used when the client or server is started.
	DateTime64	DateTime64	This type allows you to store both the date and time of a specific point in time.
Boolean	Boolean	Boolean	ClickHouse does not support the Boolean type. You can use the UInt8 type for Boolean values. Valid values are 0 and 1.
Array	Array	Array	An Array value is a collection of elements of the same data type. The elements can be of a random data type, even the Array type itself. However, multi-dimensional arrays are not recommended, because ClickHouse supports multi-dimensional arrays only to a limited extent. For example, you cannot store multi-dimensional arrays in MergeTree tables.
Tuple	Tuple	Tuple	A Tuple value is a collection of elements of different data types. Tuple values cannot be stored in tables, except for memory tables. You can use Tuple values to group temporary columns. In queries, you can use IN expressions and lambda functions with specific parameters to group temporary columns.
Domains data type	Domains	Domains	The implementation of the Domains type varies based on different values: If the values are IPv4 addresses, the Domains type is binary compatible with the UInt32 type. Compared with the UInt32 type, the Domains type saves the binary storage space and supports more readable input and output formats.
Enumeration	Enum8	Enum8	Value range: [–128, 127] An Enum value stores the mapping of 'string'= integer, for example, Enum8('hello' = 1, 'world' = 2).
Enumeration	Enum16	Enum16	Value range: [–32768, 32767]
Nullable	Nullable	Nullable	Unless otherwise specified in the ClickHouse server configuration, NULL is the default value for any Nullable type. Nullable values cannot be included in table indexes. Nullable values can be stored together with the normal values of TypeName. For example, columns of the Nullable(Int8) type can store values of the Int8 type, while rows without values store NULL.
Nested type	nested	nested	A nested data structure is similar to a table inside a cell. You can specify the parameters of a nested data structure, such as field name and data type, the same way that you specify parameters in a CREATE TABLE statement. Each row in a CREATE TABLE statement can correspond to a random number of rows in a nested data structure. Example: Nested(Name1 Type1, Name2 Type2, ...)