Interconnecting ClickHouse with HDFS (MRS 3.2.0-LTS)

This topic applies only to MRS 3.2.0-LTS.
Scenario
Connect ClickHouse to HDFS to read and write files.
Prerequisites
- The ClickHouse client has been installed in a directory, for example, /opt/client.
- A user who has permissions on ClickHouse tables and permission to access HDFS, for example, clickhouseuser, has been created on FusionInsight Manager.
- A corresponding directory exists in HDFS. The ClickHouse HDFS engine only reads and writes files; it does not create or delete directories.
- Only ClickHouse clusters deployed on x86 nodes can connect to HDFS; clusters deployed on Arm nodes cannot.
Procedure
- Log in to the node where the client is installed as the client installation user.
- Run the following command to go to the client installation directory:
cd /opt/client
- Run the following command to configure environment variables:
source bigdata_env
- Run the following command to authenticate the current user. (Change the password upon the first authentication. Skip this step for clusters with Kerberos authentication disabled.)
kinit clickhouseuser
- Run the following command to log in to the ClickHouse client:
clickhouse client --host Service IP address of the ClickHouseServer instance --secure --port 9440
- Run the following command to create a table that connects ClickHouse to HDFS:
CREATE TABLE default.hdfs_engine_table (`name` String, `value` UInt32) ENGINE = HDFS('hdfs://{namenode_ip}:{dfs.namenode.rpc.port}/tmp/secure_ck.txt', 'TSV')
- To obtain the service IP address of the ClickHouseServer instance, perform the following steps:
Log in to FusionInsight Manager and choose Cluster > Services > ClickHouse. On the page that is displayed, click the Instances tab. In this tab, obtain the service IP address of the ClickHouseServer instance.
- To obtain the value of namenode_ip, perform the following steps:
Log in to FusionInsight Manager and choose Cluster > Services > HDFS. On the page that is displayed, click the Instances tab. In this tab, obtain the service IP address of the active NameNode.
- To obtain the value of dfs.namenode.rpc.port, perform the following steps:
Log in to FusionInsight Manager and choose Cluster > Services > HDFS. On the page that is displayed, click the Configurations tab and then the All Configurations sub-tab. In this sub-tab, search for dfs.namenode.rpc.port to obtain its value.
- HDFS file path to be accessed:
To access multiple files, append an asterisk (*) to the folder path, for example, hdfs://{namenode_ip}:{dfs.namenode.rpc.port}/tmp/*.
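After the table is created, it can be queried and written like any other ClickHouse table. The following is a minimal sketch using the table definition above; the sample rows are illustrative only, and the file named in the table definition must not already exist when you insert (by default the HDFS engine cannot overwrite an existing file; settings such as hdfs_truncate_on_insert change this behavior):

```sql
-- Write sample rows; the HDFS engine creates the TSV file named in the table definition.
INSERT INTO default.hdfs_engine_table VALUES ('one', 1), ('two', 2);

-- Read the rows back; ClickHouse parses the HDFS file on each query.
SELECT name, value FROM default.hdfs_engine_table;
```

A table defined with a wildcard path (for example, /tmp/*) is read-only: SELECT reads all matching files, but INSERT into such a table is not supported.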