ClickHouse Client Practices
ClickHouse is a column-based database oriented to online analysis and processing. It supports SQL query and provides good query performance. The aggregation analysis and query performance based on large and wide tables is excellent, which is one order of magnitude faster than other analytical databases.
ClickHouse uses the ReplicatedMergeTree engine and ZooKeeper to replicate tables. When creating a table, you can specify the engine to enable table replication for high availability. Shards and replicas of each table are independent from each other.
ClickHouse supports distributed tables by using the Distributed engine. Views are created on all shards (local tables) to ease distributed queries. ClickHouse supports data sharding, which is one of the features of distributed storage. This allows for parallel read and write for faster queries.
This topic uses the cluster client to quickly connect to the ClickHouse service in a cluster.
Prerequisites
- The cluster client has been installed, for example, in the /opt/client directory.
- For clusters with Kerberos authentication enabled, you have created a user with ClickHouse operation permissions on Manager, for example, clickhouseuser.
Using a ClickHouse Client
- Install the cluster client. For details, see Installing a Client.
- Log in to the node where the client is installed as the client installation user.
- Run the following command to go to the client installation directory:
cd /opt/client
- Run the following command to configure environment variables:
source bigdata_env
- If Kerberos authentication has been enabled for the cluster, authenticate the user first. If not, skip this step.
The service user must have the permission to create ClickHouse tables. For details, see Creating a ClickHouse Role. Create a user on Manager and associate the user to the required role.
kinit Component service user
The following provides an example:
kinit clickhouseuser
- Run the clickhouse client command to connect to the ClickHouse server.
- Using a non-SSL method for login when Kerberos authentication is disabled for the ClickHouse cluster
clickhouse client --host IP address of the ClickHouse instance --port 9000 --user Username --password
- Using SSL for login when Kerberos authentication is enabled for the current cluster:
clickhouse client --host IP address of the ClickHouse instance --port 9440 --user Username --password
Table 1 Parameters of the clickhouse client command Parameter
Description
--host
Name of the server node. You can use the host name or IP address of the node where the ClickHouse instance is deployed.
To obtain the IP address of the ClickHouseServer instance, log in to FusionInsight Manager of the cluster and choose Cluster > Services > ClickHouse > Instances.
--port
Connection port.
- Clusters with Kerberos authentication enabled use SSL connections by default. The default port is 9440, and the --secure parameter must be specified.
You can also obtain the port number from the tcp_port_secure parameter of the ClickHouseServer instance.
If a non-SSL connection is required in this scenario, log in to FusionInsight Manager, choose Cluster > Services > ClickHouse > Configurations, search for SSL_NONESSL_BOTH_ENABLE, change the value to true, and restart all ClickHouse instances.
- Clusters with Kerberos authentication disabled use non-SSL connections by default. The default port is 9000, and the --secure parameter is not required.
You can also obtain the port number from the tcp_port parameter of the ClickHouseServer instance.
If an SSL connection is required in this scenario, log in to FusionInsight Manager, choose Cluster > Services > ClickHouse > Configurations, search for SSL_NONESSL_BOTH_ENABLE, change the value to true, and restart all ClickHouse instances.
NOTE:The preceding ports are open-source ports. If a custom port policy is used during cluster creation, replace the ports by referring to Common Troubleshooting.
You can run the clickhouse -h command to view the command help of the ClickHouse component.
--user
Connection username.
- If Kerberos authentication has been enabled for the cluster (the cluster is in security mode) and you have passed the kinit authentication, the --user and --password parameters are optional for client login. You must create a user with this name on Manager because there is no default user for Kerberos authentication.
- If Kerberos authentication is disabled for the cluster (the cluster is in normal mode), do not use the ClickHouse user created on FusionInsight Manager to log in to the client. You need to execute the create user SQL statement on the client to create a ClickHouse user.
Alternatively, you can use the default user for login.
--password
User password for connection. This parameter is used together with the --user parameter.
--query
Query to process when using non-interactive mode.
--database
The current database is used by default.
This parameter uses the default configuration on the server.
--multiline
If this parameter is specified, multiline queries are allowed. (Enter only indicates line feed and does not indicate that the query statement is complete.)
--multiquery
If this parameter is specified, multiple queries separated with semicolons (;) can be processed. This parameter is valid only in non-interactive mode.
--format
Specified default format used to output the result.
--vertical
If this parameter is specified, the result is output in vertical format by default. In this format, each value is printed on a separate line, which helps to display a wide table.
--time
If this parameter is specified, the query execution time is printed to stderr in non-interactive mode.
--stacktrace
If this parameter is specified, stack trace information will be printed when an exception occurs.
--config-file
Name of the configuration file.
--secure
If this parameter is specified, the server will be connected in SSL mode.
--history_file
Path of files that record command history.
--param_<name>
Query with parameters. Pass values from the client to the server.
For details, see https://clickhouse.tech/docs/en/interfaces/cli/#cli-queries-with-parameters.
- Using a non-SSL method for login when Kerberos authentication is disabled for the ClickHouse cluster
- Run the quit; command to exit the ClickHouse server connection.
Viewing Environment Parameters (cluster) of ClickHouse
- Use the ClickHouse client to connect to the ClickHouse server.
- Query the cluster identifier and other information about the environment parameters.
select cluster,shard_num,replica_num,host_name from system.clusters;
SELECT cluster, shard_num, replica_num, host_name FROM system.clusters ┌─cluster───────────┬─shard_num─┬─replica_num─┬─host_name──────── ┐ │ default_cluster_1 │ 1 │ 1 │ node-master1dOnG │ │ default_cluster_1 │ 1 │ 2 │ node-group-1tXED0001 │ │ default_cluster_1 │ 2 │ 1 │ node-master2OXQS │ │ default_cluster_1 │ 2 │ 2 │ node-group-1tXED0002 │ │ default_cluster_1 │ 3 │ 1 │ node-master3QsRI │ │ default_cluster_1 │ 3 │ 2 │ node-group-1tXED0003 │ └─────────────── ┴────── ┴─────── ┴──────────────┘ 6 rows in set. Elapsed: 0.001 sec.
- Query the shard and replica identifiers.
select * from system.macros;
SELECT * FROM system.macros ┌─macro───┬─substitution─────┐ │ id │ 76 │ │ replica │ 2 │ │ shard │ 3 │ └────── ┴────────────┘ 3 rows in set. Elapsed: 0.001 sec.
Creating a Local Replicated Table and a Distributed Table
- Create a replicated table using the ReplicatedMergeTree engine.
For details about the syntax, see https://clickhouse.tech/docs/en/engines/table-engines/mergetree-family/replication/#creating-replicated-tables.
For example, to create the ReplicatedMergeTree table test in the default database on all default_cluster_1 nodes, run the following command:
CREATE TABLE default.test ON CLUSTER default_cluster_1
(
`EventDate` DateTime,
`id` UInt64
)
ENGINE = ReplicatedMergeTree('/clickhouse/tables/{shard}/default/test', '{replica}')
PARTITION BY toYYYYMM(EventDate)
ORDER BY id;
- ON CLUSTER is a distributed DDL clause that creates the same local table on all instances in the cluster in one execution.
- default_cluster_1 is the cluster identifier used for viewing environment parameters of the ClickHouse service.
ReplicatedMergeTree engine receives the following two parameters:
- Storage path of the table data in ZooKeeper
The path must be in the /clickhouse directory. Otherwise, data insertion may fail due to insufficient ZooKeeper quota.
To avoid data conflict between different tables in ZooKeeper, the directory must be in the following format:
/clickhouse/tables/{shard}/default/test, in which /clickhouse/tables/{shard} is fixed, default indicates the database name, and text indicates the name of the created table.
- Replica name: Generally, {replica} is used.
- Storage path of the table data in ZooKeeper
CREATE TABLE default.test ON CLUSTER default_cluster_1 ( `EventDate` DateTime, `id` UInt64 ) ENGINE = ReplicatedMergeTree('/clickhouse/tables/{shard}/default/test', '{replica}') PARTITION BY toYYYYMM(EventDate) ORDER BY id ┌─host─────────────────┬─port─┬─status─┬─error─┬─num_hosts_remaining─┬─num_hosts_active─┐ │ node-group-1tXED0002 │ 9000 │ 0 │ │ 5 │ 3 │ │ node-group-1tXED0003 │ 9000 │ 0 │ │ 4 │ 3 │ │ node-master1dOnG │ 9000 │ 0 │ │ 3 │ 3 │ └────────────────────┴────┴─────┴──── ┴─────────── ┴──────────┘ ┌─host─────────────────┬─port─┬─status─┬─error─┬─num_hosts_remaining─┬─num_hosts_active─┐ │ node-master3QsRI │ 9000 │ 0 │ │ 2 │ 0 │ │ node-group-1tXED0001 │ 9000 │ 0 │ │ 1 │ 0 │ │ node-master2OXQS │ 9000 │ 0 │ │ 0 │ 0 │ └────────────────────┴────┴─────┴──── ┴─────────── ┴──────────┘ 6 rows in set. Elapsed: 0.189 sec.
- Create a distributed table using the Distributed engine.
For example, run the following statement to create the test_all table in the default database on all default_cluster_1 nodes:
CREATE TABLE default.test_all ON CLUSTER default_cluster_1
(
`EventDate` DateTime,
`id` UInt64
)
ENGINE = Distributed(default_cluster_1, default, test, rand());
CREATE TABLE default.test_all ON CLUSTER default_cluster_1 ( `EventDate` DateTime, `id` UInt64 ) ENGINE = Distributed(default_cluster_1, default, test, rand()) ┌─host─────────────────┬─port─┬─status─┬─error─┬─num_hosts_remaining─┬─num_hosts_active─┐ │ node-group-1tXED0002 │ 9000 │ 0 │ │ 5 │ 0 │ │ node-master3QsRI │ 9000 │ 0 │ │ 4 │ 0 │ │ node-group-1tXED0003 │ 9000 │ 0 │ │ 3 │ 0 │ │ node-group-1tXED0001 │ 9000 │ 0 │ │ 2 │ 0 │ │ node-master1dOnG │ 9000 │ 0 │ │ 1 │ 0 │ │ node-master2OXQS │ 9000 │ 0 │ │ 0 │ 0 │ └────────────────────┴────┴─────┴──── ┴─────────── ┴──────────┘ 6 rows in set. Elapsed: 0.115 sec.
Distributed requires the following parameters:
- default_cluster_1 is the cluster identifier used for viewing environment parameters of the ClickHouse service.
- default indicates the name of the database where the local table is located.
- test is the name of the local table.
- (Optional) Sharding key
This key and the weight configured in the config.xml file determine the route for writing data to the distributed table, that is, the physical table to which the data is written. It can be the original data (for example, site_id) of a column in the table or the result of the function call, for example, rand() is used in the preceding SQL statement. Note that data must be evenly distributed in this key. Another common operation is to use the hash value of a column with a large difference, for example, intHash64(user_id).
ClickHouse Table Data Operations
- Insert data to the local table.
For example, insert data to the test table.
insert into test values(toDateTime(now()), rand());
- Query local table information.
For example, query data in the test table.
select * from test;
SELECT * FROM test ┌───────────EventDate─┬─────────id─┐ │ 2020-11-05 21:10:42 │ 1596238076 │ └──────────────── ┴───────────┘ 1 rows in set. Elapsed: 0.002 sec.
- Query the distributed table.
The following example queries the distributed tabletest_all, which is created based on test, and gets the same result as querying the test table.
select * from test_all;
SELECT * FROM test_all ┌───────────EventDate─┬─────────id─┐ │ 2020-11-05 21:10:42 │ 1596238076 │ └──────────────── ┴───────────┘ 1 rows in set. Elapsed: 0.004 sec.
- Switch to the shard node with the same shard_num and query the information about the current table. The same table data can be queried.
For example, run the exit; command to exit the original node.
Run the following command to switch to the node-group-1tXED0003 node:
clickhouse client --host node-group-1tXED0003 --multiline --port 9440 --secure;
show tables;
SHOW TABLES ┌─name─────┐ │ test │ │ test_all │ └────────┘
- Query local table data. For example, query data in the test table on the node-group-1tXED0003 node.
select * from test;
SELECT * FROM test ┌───────────EventDate─┬─────────id─┐ │ 2020-11-05 21:10:42 │ 1596238076 │ └──────────────── ┴───────────┘ 1 rows in set. Elapsed: 0.005 sec.
- Switch to the shard node with different shard_num value and query the data of the created table.
For example, exit the node-group-1tXED0003 node.
exit;
Switch to the node-group-1tXED0001 node.
clickhouse client --host node-group-1tXED0001 --multiline --port 9440 --secure;
Query the local table test. The test table data cannot be queried on the different shard nodes.
select * from test;
SELECT * FROM test Ok.
Query data in the distributed table test_all. The data can be queried.
select * from test_all;
SELECT * FROM test ┌───────────EventDate─┬─────────id─┐ │ 2020-11-05 21:12:19 │ 3686805070 │ └──────────────── ┴───────────┘ 1 rows in set. Elapsed: 0.002 sec.
Common Troubleshooting
After the command for connecting to the ClickHouse client is executed, error message "Connection refused" is displayed.
Check whether the cluster uses custom ports (Component Port is set to Custom during cluster creation). If custom ports are used, replace the ports used in the command for connecting to the ClickHouse component client with the default custom ports in the following table.
Parameter |
Default Open Source Port |
Default Custom Port |
Port Description |
---|---|---|---|
interserver_http_port |
9009 |
9009 |
HTTP port for the communication between ClickHouse servers. |
interserver_https_port |
9010 |
9010 |
HTTPS port for the communication between ClickHouse servers. |
http_port |
8123 |
8123 |
Port for connecting to the ClickHouse server through HTTP. |
https_port |
8443 |
8443 |
Port for connecting to the ClickHouse server through HTTPS. |
tcp_port |
9000 |
9000 |
Port for connecting the client to the ClickHouse server through TCP. |
tcp_port_secure |
9440 |
9440 |
Port for connecting the client to the ClickHouse server through TCP SSL. |
lb_tcp_port |
21424 |
21424 |
TCP port listened by ClickHouseBalancer |
lb_http_port |
21425 |
21425 |
HTTP port listened by ClickHouseBalancer |
lb_https_port |
21426 |
21426 |
HTTPS port listened by ClickHouseBalancer |
lb_tcp_secure_port |
21428 |
21428 |
TCP SSL port listened by ClickHouseBalancer |
Feedback
Was this page helpful?
Provide feedbackThank you very much for your feedback. We will continue working to improve the documentation.See the reply and handling status in My Cloud VOC.
For any further questions, feel free to contact us through the chatbot.
Chatbot