Batch Importing Local Data Files into a ClickHouse Cluster
Scenarios
If you need to import a large number of data files into a ClickHouse cluster, you can use the multi-threaded import tool to speed up the process.
The multi-threaded import tool handles multiple tasks simultaneously, greatly improving data import speed and overall efficiency.
- Improved processing speed: Multi-threading maximizes the computing power of multi-core CPUs by processing multiple data files in parallel, significantly reducing overall import time.
- Efficient resource utilization: Parallel processing makes better use of system resources, improving overall performance.
- Improved import efficiency: The multi-threaded import tool effectively manages concurrent tasks, enabling a smoother and more efficient data import process.
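The core idea behind the tool can be sketched with `xargs`. This is illustrative only: the temporary files, the worker count, and the use of `wc -l` as a stand-in for the real per-file import command are all assumptions for this demo; the actual tool pipes each file to the ClickHouse client instead.

```shell
# Create a few sample data files in a temporary directory.
tmp=$(mktemp -d)
for i in 1 2 3; do printf 'a,b\n1,2\n' > "$tmp/part$i.csv"; done

# Fan the files out across up to 4 parallel worker processes.
# In the real tool, each worker would stream its file into ClickHouse.
find "$tmp" -type f -name '*.csv' -print0 |
  xargs -0 -P 4 -I{} sh -c 'wc -l < "$1"' _ {} > "$tmp/out.txt"

wc -l < "$tmp/out.txt"   # one result line per input file
```

Because the workers run concurrently, total wall-clock time approaches that of the single slowest file rather than the sum of all files.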
Notes and Constraints
This section applies only to MRS 3.3.0-LTS or later.
Prerequisites
- The ClickHouse client has been installed in a directory, for example, /opt/client.
- For a cluster in security mode, a user with ClickHouse permissions has been created, for example, clickhouseuser. For details, see Creating a User with ClickHouse Permissions.
- The data file to be imported has been uploaded to a client node directory, for example, /opt/data.
For details about all data formats supported by ClickHouse, visit https://clickhouse.com/docs/en/interfaces/formats.
Procedure
- Log in to the node where the client is installed as the client installation user.
- Go to the directory where the multi-threaded import tool clickhouse_insert_tool is deployed.
cd /opt/client/ClickHouse/clickhouse_insert_tool
- Use a text editor to open clickhouse_insert_tool.sh and enter the required information based on the comments in the script.
- datapath: Directory containing the data files to be imported. Example: /opt/data
- balancer_ip_list: IP addresses of the ClickHouse Balancer instances. Enclose the list in parentheses, enclose each IP address in double quotation marks, and separate addresses with spaces. Example: ("192.168.1.1" "192.168.1.2")
- balancer_tcp_port: TCP port of the Balancer instance of the ClickHouse service. Example: 21428
- local_table_name: Names of the local database and table into which data is imported. Example: testdb1.testtb1
- thread_num: Number of concurrent threads for importing data. Example: 10
- data_format: Format of the data to be imported. Example: CSV
- is_security_cluster: Whether the cluster is in security mode. true indicates security mode; false indicates normal mode. Example: true
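A hypothetical sketch of what the edited settings in clickhouse_insert_tool.sh might look like, using the example values above. The variable names follow the parameter names in this document, but the real script's layout may differ.

```shell
# Assumed configuration fragment for clickhouse_insert_tool.sh.
datapath=/opt/data
balancer_ip_list=("192.168.1.1" "192.168.1.2")   # bash array: quoted IPs separated by spaces
balancer_tcp_port=21428
local_table_name=testdb1.testtb1
thread_num=10
data_format=CSV
is_security_cluster=true
```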
- Save the modified clickhouse_insert_tool.sh file and run the following commands:
Go to the directory where the client is installed.
cd /opt/client
Configure environment variables.
source bigdata_env
In security mode (Kerberos authentication is enabled), run the kinit command to authenticate. In normal mode (Kerberos authentication is disabled), skip this command:
kinit clickhouseuser
- Run the script to import data.
./ClickHouse/clickhouse_insert_tool/clickhouse_insert_tool.sh
- Log in to the ClickHouse client node and connect to the server. For details, see ClickHouse Client Practices.
- Run the following command to query the distributed table corresponding to the local table where data is inserted and check the result:
select count(1) from testdb1.testtb1_all;
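As a hypothetical sanity check, the total number of data rows across the source files (assuming CSV files with no header rows) should equal the result of the count query on the distributed table. A temporary stand-in directory is used here instead of /opt/data so the snippet is self-contained.

```shell
# Count data rows across sample CSV files; compare this number against
# the output of: select count(1) from testdb1.testtb1_all;
src=$(mktemp -d)            # stand-in for the data directory
printf '1,a\n2,b\n' > "$src/f1.csv"
printf '3,c\n'      > "$src/f2.csv"

total=$(cat "$src"/*.csv | wc -l)
echo "source rows: $total"
```

If the two numbers differ, check whether any files failed to import (for example, due to a format mismatch with data_format) before retrying.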