ClickHouse Client Practices_Using ClickHouse_Component Operation Guide (LTS)

Scenarios

ClickHouse is a column-based database oriented to online analysis and processing. It supports SQL query and provides good query performance. The aggregation analysis and query performance based on large and wide tables is excellent, which is one order of magnitude faster than other analytical databases.

ClickHouse uses the ReplicatedMergeTree engine and ZooKeeper to replicate tables. When creating a table, you can specify the engine to enable table replication for high availability. Shards and replicas of each table are independent from each other.

ClickHouse supports distributed tables by using the Distributed engine. Views are created on all shards (local tables) to ease distributed queries. ClickHouse supports data sharding, which is one of the features of distributed storage. This allows for parallel read and write for faster queries.

This section describes how to use the cluster client to quickly connect to the ClickHouse service in an MRS cluster.

Prerequisites

The cluster client has been installed, for example, in the /opt/client directory.
For clusters with Kerberos authentication enabled, you have created a user with ClickHouse operation permissions on Manager, for example, clickhouseuser.

Video Tutorial

This video demonstrates how to create a table and insert data into the table using a ClickHouse client after an MRS cluster is created.

The UI may vary depending on the version. This tutorial is for reference only.

Using a ClickHouse Client

Install the cluster client. For details, see Installing a Client.
Log in to the node where the client is installed as the client installation user.
Run the following command to go to the client installation directory:
```
cd /opt/client
```
Run the following command to configure environment variables:
```
source bigdata_env
```

Run the clickhouse client command to connect to the ClickHouse server.

If Kerberos authentication is disabled for the current cluster, run the following command to connect to the ClickHouse server:
```
clickhouse client --host IP address of the ClickHouse instance --port 9000 --user Username --password
```
If Kerberos authentication is enabled for the current cluster, run the following command to connect to the ClickHouse server:
```
kinit Component service user
```
For example:
```
kinit clickhouseuser
```
Connect to the ClickHouse server.
```
clickhouse client --host IP address of the ClickHouse instance --port 9440 --secure
```
The component service user must have the permission to create ClickHouse tables. You can create a user on FusionInsight Manager and associate the user with the required role. For details, see Creating a User with ClickHouse Permissions.

**Table 1** Parameters of the **clickhouse client** command
Parameter	Description
--host	Name of the server node. You can use the host name or IP address of the node where the ClickHouse instance is deployed. To obtain the IP address of the ClickHouseServer instance, log in to FusionInsight Manager of the cluster and choose Cluster > Services > ClickHouse > Instances.
--port	Connection port. Clusters with Kerberos authentication enabled use SSL connections by default. The default port is 9440, and the --secure parameter must be specified. You can also obtain the port number from the tcp_port_secure parameter of the ClickHouseServer instance. If a non-SSL connection is required in this scenario, log in to FusionInsight Manager, choose Cluster > Services > ClickHouse > Configurations, search for SSL_NONESSL_BOTH_ENABLE, change the value to true, and restart all ClickHouse instances. Clusters with Kerberos authentication disabled use non-SSL connections by default. The default port is 9000, and the --secure parameter is not required. You can also obtain the port number from the tcp_port parameter of the ClickHouseServer instance. If an SSL connection is required in this scenario, log in to FusionInsight Manager, choose Cluster > Services > ClickHouse > Configurations, search for SSL_NONESSL_BOTH_ENABLE, change the value to true, and restart all ClickHouse instances. NOTE: The preceding ports are open-source ports. If a custom port policy is used during cluster creation, replace the ports by referring to Common Troubleshooting. You can run the clickhouse -h command to view the command help of the ClickHouse component.
--user	Connection username. If Kerberos authentication has been enabled for the cluster (the cluster is in security mode) and you have passed the kinit authentication, the --user and --password parameters are optional for client login. You must create a user with this name on Manager because there is no default user for Kerberos authentication. If Kerberos authentication is disabled for the current cluster (the cluster is in normal mode) and you need to enter the username and password to log in to the client, you can use the user default (the password of user default is empty) or a user with ClickHouse permissions. MRS 3.3.0-LTS and later: If Kerberos authentication is disabled for the cluster (in normal mode), you can create a user by referring to Creating a User with ClickHouse Permissions. MRS 3.2.0-LTS and earlier: If Kerberos authentication is disabled for the cluster (in normal mode), do not use the ClickHouse user created on FusionInsight Manager. You need to execute the create user SQL statement on the client to create a ClickHouse user. MRS 3.2.0-LTS: If Kerberos authentication is disabled for the cluster (in normal mode), you are not advised to use user default.
--password	User password for connection. This parameter is used together with the --user parameter.
--query	Query to process when using non-interactive mode.
--database	The current database is used by default. This parameter uses the default configuration on the server.
--multiline	If this parameter is specified, multiline queries are allowed. (Enter only indicates line feed and does not indicate that the query statement is complete.)
--multiquery	If this parameter is specified, multiple queries separated with semicolons (;) can be processed. This parameter is valid only in non-interactive mode.
--format	Specified default format used to output the result.
--vertical	If this parameter is specified, the result is output in vertical format by default. In this format, each value is printed on a separate line, which helps to display a wide table.
--time	If this parameter is specified, the query execution time is printed to stderr in non-interactive mode.
--stacktrace	If this parameter is specified, stack trace information will be printed when an exception occurs.
--config-file	Name of the configuration file.
--secure	If this parameter is specified, the server will be connected in SSL mode.
--history_file	Path of files that record command history.
--param_<name>	Query with parameters. Pass values from the client to the server. For details, see https://clickhouse.tech/docs/en/interfaces/cli/#cli-queries-with-parameters.

Run the quit; command to exit the ClickHouse server connection.

Viewing Environment Parameters (cluster) of ClickHouse

Use the ClickHouse client to connect to the ClickHouse server.

Query the cluster identifier and other information about the environment parameters.

select cluster,shard_num,replica_num,host_name from system.clusters;

The result is as follows:

SELECT 
    cluster, 
    shard_num, 
    replica_num, 
    host_name
FROM system.clusters

┌─cluster───────────┬─shard_num─┬─replica_num─┬─host_name──────── ┐
│ default_cluster_1             │         1   │           1   │ node-master1dOnG           │
│ default_cluster_1             │         1   │           2   │ node-group-1tXED0001       │
│ default_cluster_1             │         2   │           1   │ node-master2OXQS           │
│ default_cluster_1             │         2   │           2   │ node-group-1tXED0002       │
│ default_cluster_1             │         3   │           1   │ node-master3QsRI           │
│ default_cluster_1             │         3   │           2   │ node-group-1tXED0003       │
└─────────────── ┴────── ┴─────── ┴──────────────┘

6 rows in set. Elapsed: 0.001 sec.

Query the shard and replica identifiers.

select * from system.macros;

The result is as follows:

SELECT *
FROM system.macros

┌─macro───┬─substitution─────┐
│ id          │ 76                     │
│ replica     │ 2                      │
│ shard       │ 3                      │
└────── ┴────────────┘

3 rows in set. Elapsed: 0.001 sec.

Creating a Local Replicated Table and a Distributed Table

Create a replicated table using the ReplicatedMergeTree engine.

For details about the syntax, see https://clickhouse.tech/docs/en/engines/table-engines/mergetree-family/replication/#creating-replicated-tables.

For example, to create the ReplicatedMergeTree table test in the default database on all default_cluster_1 nodes, run the following command:

CREATE TABLE default.test ON CLUSTER default_cluster_1
(
`EventDate` DateTime,
`id` UInt64
)
ENGINE = ReplicatedMergeTree('/clickhouse/tables/{shard}/default/test', '{replica}')
PARTITION BY toYYYYMM(EventDate)
ORDER BY id;

**Table 2** Parameter description
Parameter	Description
ON CLUSTER	Distributed DDL, which means that the same local table can be created on all instances in the cluster after the statement is executed once.
default_cluster_1	Cluster identifier obtained from the environment parameters of the ClickHouse service.

ReplicatedMergeTree engine receives the following two parameters:

Storage path of the table data in ZooKeeper
The path must be in the /clickhouse directory. Otherwise, data insertion may fail due to insufficient ZooKeeper quota.

To avoid data conflict between different tables in ZooKeeper, the directory must be in the following format:

/clickhouse/tables/{shard}/default/test, in which /clickhouse/tables/{shard} is fixed, default indicates the database name, and text indicates the name of the created table.
Replica name: Generally, {replica} is used.

CREATE TABLE default.test ON CLUSTER default_cluster_1
(
    `EventDate` DateTime, 
    `id` UInt64
)
ENGINE = ReplicatedMergeTree('/clickhouse/tables/{shard}/default/test', '{replica}')
PARTITION BY toYYYYMM(EventDate)
ORDER BY id

┌─host─────────────────┬─port─┬─status─┬─error─┬─num_hosts_remaining─┬─num_hosts_active─┐
│ node-group-1tXED0002                   │  9000  │      0   │         │                   5   │                3   │
│ node-group-1tXED0003                   │  9000  │      0   │         │                   4   │                3   │
│ node-master1dOnG                       │  9000  │      0   │         │                   3   │                3   │
└────────────────────┴────┴─────┴──── ┴─────────── ┴──────────┘
┌─host─────────────────┬─port─┬─status─┬─error─┬─num_hosts_remaining─┬─num_hosts_active─┐
│ node-master3QsRI                       │  9000  │      0   │         │                   2   │                0   │
│ node-group-1tXED0001                   │  9000  │      0   │         │                   1   │                0   │
│ node-master2OXQS                       │  9000  │      0   │         │                   0   │                0   │
└────────────────────┴────┴─────┴──── ┴─────────── ┴──────────┘

6 rows in set. Elapsed: 0.189 sec.

Create a distributed table using the Distributed engine.

For example, run the following statement to create the test_all table in the default database on all default_cluster_1 nodes:

CREATE TABLE default.test_all ON CLUSTER default_cluster_1
(
`EventDate` DateTime,
`id` UInt64
)
ENGINE = Distributed(default_cluster_1, default, test, rand());

**Table 3** Distributed engine parameters
Parameter	Description
default_cluster_1	Cluster identifier obtained from the environment parameters of the ClickHouse service
default	Name of the database where the local table is located
test	Name of the local table
rand	(Optional) Sharding key This key and the weight configured in the config.xml file determine the route for writing data to the distributed table, that is, the physical table to which the data is written. It can be the original data (for example, site_id) of a column in the table or the result of the function call, for example, rand() is used in the preceding SQL statement. Note that data must be evenly distributed in this key. Another common operation is to use the hash value of a column with a large difference, for example, intHash64(user_id).

The result is as follows:

┌─host─────────────────┬─port─┬─status─┬─error─┬─num_hosts_remaining─┬─num_hosts_active─┐
│ node-group-1tXED0002                   │  9000  │      0   │         │                   5   │                0   │
│ node-master3QsRI                       │  9000  │      0   │         │                   4   │                0   │
│ node-group-1tXED0003                   │  9000  │      0   │         │                   3   │                0   │
│ node-group-1tXED0001                   │  9000  │      0   │         │                   2   │                0   │
│ node-master1dOnG                       │  9000  │      0   │         │                   1   │                0   │
│ node-master2OXQS                       │  9000  │      0   │         │                   0   │                0   │
└────────────────────┴────┴─────┴──── ┴─────────── ┴──────────┘

6 rows in set. Elapsed: 0.115 sec.

ClickHouse Table Data Operations

Insert data to the local table.

For example, insert data to the test table.
```
insert into test values(toDateTime(now()), rand());
```

Query local table information.

For example, query data in the test table.

select * from test;

The result is as follows:

SELECT *
FROM test

┌───────────EventDate─┬─────────id─┐
│ 2020-11-05 21:10:42             │ 1596238076           │
└──────────────── ┴───────────┘

1 rows in set. Elapsed: 0.002 sec.

Query the distributed table.

The following example queries the distributed table test_all, which is created based on test, and gets the same result as querying the test table.

select * from test_all;

The result is as follows:

SELECT *
FROM test_all

┌───────────EventDate─┬─────────id─┐
│ 2020-11-05 21:10:42             │ 1596238076           │
└──────────────── ┴───────────┘

1 rows in set. Elapsed: 0.004 sec.

Switch to the shard node with the same shard_num and query the information about the current table. The same table data can be queried.

For example, run the exit; command to exit the original node.

Run the following command to switch to the node-group-1tXED0003 node:
```
clickhouse client --host node-group-1tXED0003 --multiline --port 9440  --secure;
```
Query information about the current table.
```
show tables;
```
The result is as follows:
```
SHOW TABLES

┌─name─────┐
│ test           │
│ test_all       │
└────────┘
 
```

Query local table data. For example, query data in the test table on the node-group-1tXED0003 node.

select * from test;

The result is as follows:

SELECT *
FROM test

┌───────────EventDate─┬─────────id─┐
│ 2020-11-05 21:10:42             │ 1596238076           │
└──────────────── ┴───────────┘

1 rows in set. Elapsed: 0.005 sec.

Switch to the shard node with different shard_num value and query the data of the created table.

For example, exit the node-group-1tXED0003 node.

exit;

Switch to the node-group-1tXED0001 node.

clickhouse client --host node-group-1tXED0001 --multiline --port 9440  --secure;

Query the local table test. The test table data cannot be queried on the different shard nodes.

select * from test;

The result is as follows:

SELECT *
FROM test

Ok.

Query data in the distributed table test_all. The data can be queried.

select * from test_all;

The result is as follows:

SELECT *
FROM test

┌───────────EventDate─┬─────────id─┐
│ 2020-11-05 21:12:19             │ 3686805070           │
└──────────────── ┴───────────┘

1 rows in set. Elapsed: 0.002 sec.

Common Troubleshooting

After the command for connecting to the ClickHouse client is executed, error message "Connection refused" is displayed.

Check whether the cluster uses custom ports (Component Port is set to Custom during cluster creation). If custom ports are used, replace the ports used in the command for connecting to the ClickHouse component client with the default custom ports in the following table.

Parameter	Default Open Source Port	Default Custom Port	Port Description
interserver_http_port	9009	9009	HTTP port for the communication between ClickHouse servers.
interserver_https_port	9010	9010	HTTPS port for the communication between ClickHouse servers.
http_port	8123	8123	Port for connecting to the ClickHouse server through HTTP.
https_port	8443	8443	Port for connecting to the ClickHouse server through HTTPS.
tcp_port	9000	9000	Port for connecting the client to the ClickHouse server through TCP.
tcp_port_secure	9440	9440	Port for connecting the client to the ClickHouse server through TCP SSL.
lb_tcp_port	21424	21424	TCP port listened by ClickHouseBalancer
lb_http_port	21425	21425	HTTP port listened by ClickHouseBalancer
lb_https_port	21426	21426	HTTPS port listened by ClickHouseBalancer
lb_tcp_secure_port	21428	21428	TCP SSL port listened by ClickHouseBalancer

ClickHouse Client Practices

Scenarios

Prerequisites

Video Tutorial

Using a ClickHouse Client

Viewing Environment Parameters (cluster) of ClickHouse

Creating a Local Replicated Table and a Distributed Table

ClickHouse Table Data Operations

Common Troubleshooting

Feedback

Was this page helpful?