
ClickHouse Cluster Configuration

Background

ClickHouse uses a multi-shard, multi-replica deployment architecture to implement cluster high availability. Each cluster defines multiple shards, and each shard has two or more replicas. If a node is faulty, replicas on other nodes in the same shard take over services from the faulty node, ensuring service continuity and improving cluster stability.

This section applies only to MRS 3.1.0.

Cluster Configuration

  1. Log in to Manager and choose Cluster > Services > ClickHouse > Configurations > All Configurations.
  2. Add the custom configuration items in Table 1 to the clickhouse-metrika-customize parameter.

    Table 1 Custom parameters

    Parameter                                                                 Value
    ------------------------------------------------------------------------  ----------------------------------------------
    clickhouse_remote_servers.example_cluster.shard[1].replica[1].host        host1.9bf17e66-e7ed-4f21-9dfc-34575f955ae6.com
    clickhouse_remote_servers.example_cluster.shard[1].replica[1].port        9000
    clickhouse_remote_servers.example_cluster.shard[1].replica[2].host        host2.9bf17e66-e7ed-4f21-9dfc-34575f955ae6.com
    clickhouse_remote_servers.example_cluster.shard[1].replica[2].port        9000
    clickhouse_remote_servers.example_cluster.shard[1].internal_replication   true
    clickhouse_remote_servers.example_cluster.shard[2].replica[1].host        host3.9bf17e66-e7ed-4f21-9dfc-34575f955ae6.com
    clickhouse_remote_servers.example_cluster.shard[2].replica[1].port        9000
    clickhouse_remote_servers.example_cluster.shard[2].replica[2].host        host4.9bf17e66-e7ed-4f21-9dfc-34575f955ae6.com
    clickhouse_remote_servers.example_cluster.shard[2].replica[2].port        9000
    clickhouse_remote_servers.example_cluster.shard[2].internal_replication   true

  3. Click Save.
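For reference, the customized items above correspond to the clickhouse_remote_servers section of the ClickHouse metrika.xml file. The following is a minimal sketch of the resulting fragment, assuming the dotted parameter names map one-to-one onto nested XML elements:

<clickhouse_remote_servers>
  <example_cluster>
    <!-- Shard 1: replicas on host1 and host2 -->
    <shard>
      <internal_replication>true</internal_replication>
      <replica>
        <host>host1.9bf17e66-e7ed-4f21-9dfc-34575f955ae6.com</host>
        <port>9000</port>
      </replica>
      <replica>
        <host>host2.9bf17e66-e7ed-4f21-9dfc-34575f955ae6.com</host>
        <port>9000</port>
      </replica>
    </shard>
    <!-- Shard 2: replicas on host3 and host4 -->
    <shard>
      <internal_replication>true</internal_replication>
      <replica>
        <host>host3.9bf17e66-e7ed-4f21-9dfc-34575f955ae6.com</host>
        <port>9000</port>
      </replica>
      <replica>
        <host>host4.9bf17e66-e7ed-4f21-9dfc-34575f955ae6.com</host>
        <port>9000</port>
      </replica>
    </shard>
  </example_cluster>
</clickhouse_remote_servers>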

(Figure: cluster architecture with two shards, each containing two replicas.)

The following describes the parameters:

  • example_cluster
    • example_cluster is the name of the current cluster, as defined in Table 1.
    • The current cluster has two shards. Each shard has two replicas, and each replica corresponds to a ClickHouse instance node.
    • internal_replication indicates whether internal replication is performed between replicas. It takes effect when data is inserted into a shard through the cluster (for example, through a Distributed table).

      The default value is true, indicating that data is written to only one replica. (Data is then synchronized between replicas through replicated tables to ensure consistency.)

      If this parameter is set to false (not recommended), the same data is written to every replica of the shard. (In this case, data across replicas is not strongly consistent, and full synchronization cannot be ensured.)

  • macros

    macros indicates the IDs of the shard and replica where the current instance node resides. These IDs can be used to distinguish the different replicas.

    For example, for the host3 instance, the shard ID is 2 and the replica ID is 1 (see the sketch below).
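    A minimal sketch of what the macros block for the host3 instance would look like, using the standard ClickHouse macros format (the exact element names in the MRS configuration may differ):

    <macros>
      <!-- host3 belongs to shard 2 and is replica 1 within that shard -->
      <shard>2</shard>
      <replica>1</replica>
    </macros>

    Replicated tables can then reference these values through the {shard} and {replica} macro substitutions in their ZooKeeper paths.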

This section describes how to configure sharding and replication. For details about how to synchronize data between replicas in the ClickHouse cluster, see Replication.

Replication

ClickHouse implements replication using ZooKeeper and the ReplicatedMergeTree engine (one of the Replicated* family of table engines). Replication works in a multi-master scheme: an INSERT statement can be sent to any replica, and the data is then replicated asynchronously to the other replicas in the shard.

(Figure: asynchronous replication between Node 1 and Node 2, which correspond to host1 and host2 in Cluster Configuration.)

After the ClickHouse cluster is created successfully, three ZooKeeper nodes are deployed by default. During replication, ZooKeeper stores the metadata of the replicated ClickHouse tables.

For details about the ZooKeeper node information, see the config.xml file in the ${BIGDATA_HOME}/FusionInsight_ClickHouse_<version number>/x_x_<ClickHouse instance name>/etc directory.

<yandex>
  ...
  <zookeeper>
    <node index="1">
      <host>node-master1lrgj.9bf17e66-e7ed-4f21-9dfc-34575f955ae6.com</host>
      <port>2181</port>
    </node>
    <node index="2">
      <port>2181</port>
      <host>node-master2vocd.9bf17e66-e7ed-4f21-9dfc-34575f955ae6.com</host>
    </node>
    <node index="3">
      <host>node-master3xwmu.9bf17e66-e7ed-4f21-9dfc-34575f955ae6.com</host>
      <port>2181</port>
    </node>
  </zookeeper>
  ...
</yandex>

For details about how to use the cluster after configuration, see Creating a ClickHouse Table.