Updated on 2024-05-29 GMT+08:00

Configuring a ClickHouse Data Source

Scenario

In the ClickHouse data source, tables whose names differ only in letter case, for example, cktable (lowercase), CKTABLE (uppercase), and CKtable (mixed case), cannot coexist in the same schema or database. If they do, HetuEngine cannot use the tables in that schema or database.
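To check a schema for this situation before adding the data source, you could group its table names by their lowercase form. The following is a minimal Python sketch, not a HetuEngine tool: `find_case_collisions` is a hypothetical helper, and it assumes you have already collected the table names of one schema or database yourself (for example, from the output of SHOW TABLES run directly in ClickHouse).

```python
from collections import defaultdict


def find_case_collisions(table_names):
    """Return groups of table names that differ only in letter case.

    The result maps each lowercase name to the sorted list of original
    names that share it; names without a case-only duplicate are omitted.
    """
    groups = defaultdict(list)
    for name in table_names:
        groups[name.lower()].append(name)
    return {key: sorted(names) for key, names in groups.items() if len(names) > 1}


if __name__ == "__main__":
    names = ["cktable", "CKTABLE", "CKtable", "orders"]
    print(find_case_collisions(names))
    # {'cktable': ['CKTABLE', 'CKtable', 'cktable']}
```

An empty result means no case-only duplicates were found in the names you passed in.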

Prerequisites

You have created a HetuEngine administrator by referring to Creating a HetuEngine User.

Procedure

  1. Log in to FusionInsight Manager as a HetuEngine administrator and choose Cluster > Services > HetuEngine. The HetuEngine service page is displayed.
  2. In the Basic Information area on the Dashboard tab page, click the link next to HSConsole WebUI. The HSConsole page is displayed.
  3. Choose Data Source and click Add Data Source. On the Add Data Source page that is displayed, configure the parameters.

    1. In the Basic Configuration area, configure Name and choose JDBC > ClickHouse for Data Source Type.
    2. Configure parameters in the ClickHouse Configuration area. For details, see Table 1.
      Table 1 ClickHouse Configuration

      Driver
        Description: The default value is clickhouse.
        Example value: clickhouse

      JDBC URL
        Description: JDBC URL of the ClickHouse data source.
        • If the ClickHouse data source uses IPv4, the format is jdbc:clickhouse://<host>:<port>.
        • If the ClickHouse data source uses IPv6, the format is jdbc:clickhouse://[<host>]:<port>.
        • To obtain the value of <host>, log in to Manager of the cluster where the ClickHouse data source is located, choose Cluster > Services > ClickHouse > Instance, view the service IP addresses of the ClickHouseBalancer instances, and select any one of them. Currently, only one IP address can be entered.
        • To obtain the value of <port> in MRS 3.2.0 or later, log in to Manager of the cluster where the ClickHouse data source is located, choose Cluster > Services > ClickHouse, click Configurations, and click All Configurations. If the ClickHouse data source is in security mode, use the HTTPS port of the ClickHouseBalancer instance, that is, the value of lb_https_port. If the ClickHouse data source is in normal mode, use the HTTP port of the ClickHouseBalancer instance, that is, the value of lb_http_port.
        • To obtain the value of <port> in MRS 3.2.0 or later, log in to FusionInsight Manager, choose Cluster > Services > ClickHouse, and click Logic Cluster. On the displayed page, view the HTTP Balancer port of the logical cluster.
        Example value: jdbc:clickhouse://10.162.156.243:21426 or jdbc:clickhouse://10.162.156.243:21425

      Username
        Description: Username used to connect to the ClickHouse data source.
        Example value: Set this to the username used to connect to the data source.

      Password
        Description: Password of the user used to connect to the ClickHouse data source.
        Example value: Set this to the password of the user used to connect to the data source.

      Case-sensitive Table/Schema Name
        Description: Whether to support case-sensitive schema and table names of the data source. HetuEngine supports case-sensitive schema and table names of the data source.
        • No: If the same schema of the data source contains tables whose names differ only in case, for example, cktable (lowercase), CKTABLE (uppercase), and CKtable (mixed case), only cktable (lowercase) can be used by HetuEngine.
        • Yes: Only one of these names, for example, cktable (lowercase), CKTABLE (uppercase), or CKtable (mixed case), can exist in the same schema of the data source. Otherwise, none of the tables in the schema can be used by HetuEngine.
        Example value: -

    3. (Optional) Customize the configuration.
      Click Add to add custom parameters for the ClickHouse data source. For details, see Table 2.
      Table 2 Custom parameters of the ClickHouse data source

      use-connection-pool
        Description: Whether to use the JDBC connection pool.
        Example value: true

      jdbc.connection.pool.maxTotal
        Description: Maximum number of connections in the JDBC connection pool.
        Example value: 8

      jdbc.connection.pool.maxIdle
        Description: Maximum number of idle connections in the JDBC connection pool.
        Example value: 8

      jdbc.connection.pool.minIdle
        Description: Minimum number of idle connections in the JDBC connection pool.
        Example value: 0

      jdbc.connection.pool.testOnBorrow
        Description: Whether to check the validity of a connection when it is obtained from the JDBC connection pool.
        Example value: false

      jdbc.pushdown-enabled
        Description: Whether to enable the pushdown function. Default value: true
        Example value: true

      jdbc.pushdown-module
        Description: Pushdown type.
        • DEFAULT: No operator is pushed down.
        • BASE_PUSHDOWN: Only operators such as Filter, Aggregation, Limit, TopN, and Projection are pushed down.
        • FULL_PUSHDOWN: All supported operators are pushed down.
        Example value: -

      clickhouse.map-string-as-varchar
        Description: Whether to map ClickHouse String and FixedString columns to the VARCHAR type. Default value: true
        Example value: true

      clickhouse.socket-timeout
        Description: Timeout interval for connections to the ClickHouse data source, in milliseconds. Default value: 120000
        Example value: 120000

      case-insensitive-name-matching.cache-ttl
        Description: Time to live of the cache that stores the case-sensitive schema and table names of the data source, in minutes. Default value: 1
        Example value: 1

      You can click Delete to delete custom configuration parameters.

    4. Click OK.

  4. Log in to the node where the cluster client is located and run the following commands to switch to the client installation directory and authenticate the user:

    cd /opt/client

    source bigdata_env

    kinit User performing HetuEngine operations

    (Skip the kinit command if the cluster is in normal mode.)

  5. Run the following command to log in to the catalog of the data source:

    hetu-cli --catalog Data source name --schema Database name

    For example, run the following command:

    hetu-cli --catalog clickhouse_1 --schema default

  6. Run the following command. If table information is displayed or no error is reported, the connection is successful.

    show tables;
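If you configure several ClickHouse data sources, the JDBC URL formats described in Table 1 can be assembled programmatically rather than by hand. The following is a minimal Python sketch, assuming you already have one ClickHouseBalancer IP address and the port obtained as described in Table 1; `build_clickhouse_jdbc_url` is a hypothetical helper and is not part of HetuEngine or FusionInsight Manager.

```python
import ipaddress


def build_clickhouse_jdbc_url(host, port):
    """Build a ClickHouse JDBC URL from one ClickHouseBalancer IP address and port.

    IPv6 addresses are enclosed in square brackets; IPv4 addresses are not.
    Raises ValueError for an invalid address (including a comma-separated
    list of several addresses) or an out-of-range port.
    """
    if not isinstance(port, int) or not 0 < port < 65536:
        raise ValueError(f"invalid port: {port!r}")
    address = ipaddress.ip_address(host.strip())
    if address.version == 6:
        return f"jdbc:clickhouse://[{address}]:{port}"
    return f"jdbc:clickhouse://{address}:{port}"


if __name__ == "__main__":
    print(build_clickhouse_jdbc_url("10.162.156.243", 21426))
    # jdbc:clickhouse://10.162.156.243:21426
```

Parsing the host with the standard ipaddress module is what decides whether brackets are needed, and it also rejects input containing more than one address, matching the one-IP-address limit.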

Data Type Mapping

Mapping from ClickHouse data types to HetuEngine data types

ClickHouse Data Type | HetuEngine Data Type
BOOLEAN | BOOLEAN
UInt8 | SMALLINT
UInt16 | INTEGER
UInt32 | BIGINT
UInt64 | DECIMAL(20, 0)
Int8 | TINYINT
Int16 | SMALLINT
Int32 | INTEGER
Int64 | BIGINT
Float32 | REAL
Float64 | DOUBLE
Decimal(P, S) | DECIMAL(P, S)
Decimal32(S) | DECIMAL(P, S)
Decimal64(S) | DECIMAL(P, S)
Decimal128(S) | DECIMAL(P, S)
IPv4 | VARCHAR
IPv6 | VARCHAR
UUID | VARCHAR
Enum8 | VARCHAR
Enum16 | VARCHAR
String | VARCHAR / VARBINARY
FixedString(N) | VARCHAR / VARBINARY
Date | DATE
DateTime | TIMESTAMP

String and FixedString(N) are mapped to VARCHAR when clickhouse.map-string-as-varchar is true (the default) and to VARBINARY when it is false.
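The mapping table can also be expressed as a lookup function, which may help when reviewing ClickHouse column types before querying them through HetuEngine. The following is a minimal Python sketch; `map_clickhouse_type` is hypothetical and handles only the type spellings listed in the table (not, for example, Nullable(...) or DateTime with a time zone). Two points are assumptions rather than statements from this page: that DecimalN(S) keeps the precision ClickHouse defines for it (9, 18, or 38 digits), and that String and FixedString(N) become VARBINARY when clickhouse.map-string-as-varchar is false.

```python
import re

# Parameterless types from the mapping table.
FIXED_MAPPINGS = {
    "BOOLEAN": "BOOLEAN",
    "UInt8": "SMALLINT",
    "UInt16": "INTEGER",
    "UInt32": "BIGINT",
    "UInt64": "DECIMAL(20, 0)",
    "Int8": "TINYINT",
    "Int16": "SMALLINT",
    "Int32": "INTEGER",
    "Int64": "BIGINT",
    "Float32": "REAL",
    "Float64": "DOUBLE",
    "IPv4": "VARCHAR",
    "IPv6": "VARCHAR",
    "UUID": "VARCHAR",
    "Date": "DATE",
    "DateTime": "TIMESTAMP",
}

# Assumption: DecimalN(S) keeps the precision ClickHouse defines for it.
DECIMAL_PRECISION = {"Decimal32": 9, "Decimal64": 18, "Decimal128": 38}


def map_clickhouse_type(ch_type, map_string_as_varchar=True):
    """Return the HetuEngine type for a ClickHouse type name (sketch only)."""
    ch_type = ch_type.strip()
    if ch_type in FIXED_MAPPINGS:
        return FIXED_MAPPINGS[ch_type]
    # Enum8('a' = 1, ...) and Enum16(...) carry their values as parameters.
    if ch_type.split("(", 1)[0] in ("Enum8", "Enum16"):
        return "VARCHAR"
    # Assumption: VARBINARY is used when map-string-as-varchar is false.
    if ch_type == "String" or re.fullmatch(r"FixedString\(\d+\)", ch_type):
        return "VARCHAR" if map_string_as_varchar else "VARBINARY"
    match = re.fullmatch(r"Decimal\((\d+),\s*(\d+)\)", ch_type)
    if match:
        return f"DECIMAL({match.group(1)}, {match.group(2)})"
    match = re.fullmatch(r"(Decimal32|Decimal64|Decimal128)\((\d+)\)", ch_type)
    if match:
        return f"DECIMAL({DECIMAL_PRECISION[match.group(1)]}, {match.group(2)})"
    raise ValueError(f"no mapping listed for ClickHouse type: {ch_type}")
```

Raising an error for unlisted types keeps the sketch from silently guessing a mapping that the table does not document.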

Performance Optimization

  • Subquery pushdown

    Subquery pushdown is supported to improve query performance.

    This function is disabled by default. To enable it, configure the pushdown-related custom parameters (jdbc.pushdown-enabled and jdbc.pushdown-module) in the optional custom configuration substep of step 3, and select a pushdown type.

  • Scalar UDF pushdown

    Scalar UDF pushdown is enabled by default. To use it, first create the required mapping functions in HetuEngine.
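Because subquery pushdown is controlled through the custom parameters in Table 2, it can help to check a planned parameter set before entering it in HSConsole. The following is a minimal Python sketch; `resolve_custom_params` is hypothetical, it applies only the defaults that Table 2 states, and its validation rules (lowercase true/false, non-negative integers, the three documented pushdown types) are assumptions drawn from the table's example values, not HetuEngine's own checks.

```python
# Defaults stated in Table 2, written as the strings entered in HSConsole.
DOCUMENTED_DEFAULTS = {
    "jdbc.pushdown-enabled": "true",
    "clickhouse.map-string-as-varchar": "true",
    "clickhouse.socket-timeout": "120000",  # milliseconds
    "case-insensitive-name-matching.cache-ttl": "1",  # minutes
}

BOOLEAN_KEYS = {
    "use-connection-pool",
    "jdbc.connection.pool.testOnBorrow",
    "jdbc.pushdown-enabled",
    "clickhouse.map-string-as-varchar",
}

INTEGER_KEYS = {
    "jdbc.connection.pool.maxTotal",
    "jdbc.connection.pool.maxIdle",
    "jdbc.connection.pool.minIdle",
    "clickhouse.socket-timeout",
    "case-insensitive-name-matching.cache-ttl",
}

PUSHDOWN_MODULES = {"DEFAULT", "BASE_PUSHDOWN", "FULL_PUSHDOWN"}


def resolve_custom_params(custom):
    """Merge custom parameters over the documented defaults and check their values.

    Keys not listed in Table 2 are passed through unchanged.
    """
    resolved = dict(DOCUMENTED_DEFAULTS)
    resolved.update({key: str(value).strip() for key, value in custom.items()})
    for key, value in resolved.items():
        if key in BOOLEAN_KEYS and value not in ("true", "false"):
            raise ValueError(f"{key} must be true or false, got {value!r}")
        if key in INTEGER_KEYS and not value.isdigit():
            raise ValueError(f"{key} must be a non-negative integer, got {value!r}")
        if key == "jdbc.pushdown-module" and value not in PUSHDOWN_MODULES:
            raise ValueError(f"unknown pushdown type: {value!r}")
    return resolved


if __name__ == "__main__":
    # Push down basic operators such as Filter and Limit.
    params = resolve_custom_params({
        "jdbc.pushdown-enabled": "true",
        "jdbc.pushdown-module": "BASE_PUSHDOWN",
    })
    print(params["jdbc.pushdown-module"], params["clickhouse.socket-timeout"])
    # BASE_PUSHDOWN 120000
```

The sketch does not assume a default for jdbc.pushdown-module, because Table 2 does not state one.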

Constraints

  • When interconnected with ClickHouse, HetuEngine supports the following SQL statements: SHOW CATALOGS, SHOW SCHEMAS, SHOW TABLES, SHOW COLUMNS, DESCRIBE, USE, and SELECT on tables and views.
  • The following tables and views support interconnection between HetuEngine and ClickHouse:

    Tables: local tables (MergeTree), replicated tables (ReplicatedReplacingMergeTree), and distributed tables
    Views: normal views and materialized views
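These constraints can be turned into a quick pre-check for scripts that send SQL through HetuEngine. The following is a minimal Python sketch; `is_supported_statement` and `is_supported_engine` are hypothetical helpers. The statement check only looks at the leading keywords and does not parse SQL (so, for example, a query starting with WITH is rejected), and the engine names assume you read them from the engine column of ClickHouse system.tables, where views appear as View and materialized views as MaterializedView.

```python
import re

# Statement prefixes listed under Constraints.
SUPPORTED_STATEMENTS = [
    r"SHOW\s+CATALOGS\b",
    r"SHOW\s+SCHEMAS\b",
    r"SHOW\s+TABLES\b",
    r"SHOW\s+COLUMNS\b",
    r"DESCRIBE\b",
    r"USE\b",
    r"SELECT\b",
]

# Assumption: engine names as reported in ClickHouse system.tables.engine.
SUPPORTED_ENGINES = {
    "MergeTree",
    "ReplicatedReplacingMergeTree",
    "Distributed",
    "View",
    "MaterializedView",
}


def is_supported_statement(sql):
    """Return True if the statement starts with one of the supported keywords."""
    text = sql.strip()
    return any(re.match(pattern, text, re.IGNORECASE) for pattern in SUPPORTED_STATEMENTS)


def is_supported_engine(engine):
    """Return True if a ClickHouse table or view engine is in the supported list."""
    return engine in SUPPORTED_ENGINES
```

A keyword-prefix check is deliberately conservative: it may reject some valid read-only queries, but it does not accept write statements such as INSERT.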