Updated on 2024-10-25 GMT+08:00

Adding a ClickHouse Data Source

In a ClickHouse data source, a schema or database must not contain tables whose names differ only in letter case, for example, cktable (lowercase), CKTABLE (uppercase), and CKtable (mixed case). Otherwise, HetuEngine cannot use the tables in that schema or database.
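Before you add the data source, you can look for such names by comparing a schema's table names case-insensitively. The following POSIX shell sketch assumes you have already exported the table names of one schema, one name per line (for example, the name column of ClickHouse's system.tables filtered by database). find_case_collisions is a hypothetical helper and is not a HetuEngine or ClickHouse tool.

```shell
#!/bin/sh
# Hypothetical helper (not part of HetuEngine or ClickHouse).
# Reads table names, one per line, on stdin and prints each name (in
# lowercase) that occurs more than once when letter case is ignored.
find_case_collisions() {
  tr '[:upper:]' '[:lower:]' | sort | uniq -d
}

# Three names that differ only in case, plus one unique name.
# Prints: cktable
printf '%s\n' cktable CKTABLE CKtable orders | find_case_collisions
```

An empty result means that no two table names in the schema differ only in case.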

Procedure for Adding a ClickHouse Data Source

  1. Log in to FusionInsight Manager as a HetuEngine administrator and choose Cluster > Services > HetuEngine. The HetuEngine service page is displayed.
  2. In the Basic Information area on the Dashboard tab page, click the link next to HSConsole WebUI. The HSConsole page is displayed.
  3. Choose Data Source and click Add Data Source. On the Add Data Source page that is displayed, configure parameters.

    1. In the Basic Configuration area, configure Name and choose JDBC > ClickHouse for Data Source Type.
    2. Configure parameters in the ClickHouse Configuration area. For details, see Table 1.
      Table 1 ClickHouse Configuration

      Driver
        Description: The default value is clickhouse.
        Example value: clickhouse

      JDBC URL
        Description: JDBC URL of the ClickHouse data source.
        • If the ClickHouse data source uses IPv4, the format is jdbc:clickhouse://<host>:<port>.
        • If the ClickHouse data source uses IPv6, the format is jdbc:clickhouse://[<host>]:<port>.
        • To obtain <host>, log in to Manager of the cluster where the ClickHouse data source is located, choose Cluster > Services > ClickHouse > Instance, and view the service IP addresses of the ClickHouseBalancer instances. Select any one of them. Currently, only one IP address can be entered.
        • To obtain <port>, log in to Manager of the cluster where the ClickHouse data source is located, choose Cluster > Services > ClickHouse, click Configurations, and then click All Configurations. If the ClickHouse data source is in security mode, use the HTTPS port of the ClickHouseBalancer instance, that is, the value of lb_https_port. If it is in normal mode, use the HTTP port of the ClickHouseBalancer instance, that is, the value of lb_http_port.
        Example value: jdbc:clickhouse://10.162.156.243:21426 or jdbc:clickhouse://10.162.156.243:21425

      Username
        Description: Username used to connect to the ClickHouse data source.
        Example value: Depends on your environment. Enter the username used to connect to the data source.

      Password
        Description: Password of the user used to connect to the ClickHouse data source.
        Example value: Depends on your environment. Enter the password of the user that connects to the data source.

      Case-sensitive Table/Schema Name
        Description: Whether HetuEngine supports case-sensitive schema and table names in the data source.
        • No: If a schema of the data source contains multiple tables whose names differ only in case, for example, cktable (lowercase), CKTABLE (uppercase), and CKtable (mixed case), HetuEngine can use only cktable (lowercase).
        • Yes: A schema of the data source may contain only one of the names that differ only in case, for example, only one of cktable, CKTABLE, or CKtable. Otherwise, HetuEngine cannot use any of the tables in the schema.
        Example value: -

    3. (Optional) Customize the configuration.
      Click Add to add custom configuration parameters for the ClickHouse data source. For details, see Table 2.
      Table 2 Custom parameters of the ClickHouse data source

      use-connection-pool
        Description: Whether to use the JDBC connection pool.
        Example value: true

      jdbc.connection.pool.maxTotal
        Description: Maximum number of connections in the JDBC connection pool.
        Example value: 8

      jdbc.connection.pool.maxIdle
        Description: Maximum number of idle connections in the JDBC connection pool.
        Example value: 8

      jdbc.connection.pool.minIdle
        Description: Minimum number of idle connections in the JDBC connection pool.
        Example value: 0

      jdbc.connection.pool.testOnBorrow
        Description: Whether to check the validity of a connection when it is obtained from the JDBC connection pool.
        Example value: false

      clickhouse.map-string-as-varchar
        Description: Whether to map ClickHouse String and FixedString columns to the VARCHAR type. Default value: true
        Example value: true

      clickhouse.socket-timeout
        Description: Timeout interval for connecting to the ClickHouse data source. Unit: millisecond. Default value: 120000
        Example value: 120000

      case-insensitive-name-matching.cache-ttl
        Description: How long the case-sensitive schema and table names of the data source are cached. Unit: minute. Default value: 1
        Example value: 1

      You can click Delete to delete custom configuration parameters.

    4. Click OK.

  4. Log in to the node where the cluster client is located and run the following commands to switch to the client installation directory and authenticate the user. If the cluster is in normal mode, skip the kinit command.

    cd /opt/client

    source bigdata_env

    kinit <user performing HetuEngine operations>

  5. Run the following command to log in to the catalog of the data source:

    hetu-cli --catalog <data source name> --schema <database name>

    For example, run the following command:

    hetu-cli --catalog clickhouse_1 --schema default

  6. Run the following command. If the table information is displayed or no error is reported, the connection is successful.

    show tables;
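The JDBC URL formats described for the JDBC URL parameter in Table 1 can be sketched as a small shell function. This is an illustration only: build_clickhouse_jdbc_url is a hypothetical helper, it assumes you have already looked up a ClickHouseBalancer IP address and the lb_https_port or lb_http_port value in Manager, and fd00::10 is a made-up IPv6 address.

```shell
#!/bin/sh
# Hypothetical helper (not part of HetuEngine) that applies the JDBC URL
# formats from Table 1.
#   $1: ClickHouseBalancer IP address, without brackets
#   $2: lb_https_port (security mode) or lb_http_port (normal mode)
build_clickhouse_jdbc_url() {
  case $1 in
    *:*)
      # Of the two address families, only IPv6 uses colons, so wrap it in brackets.
      printf 'jdbc:clickhouse://[%s]:%s\n' "$1" "$2" ;;
    *)
      # IPv4 address: used as is.
      printf 'jdbc:clickhouse://%s:%s\n' "$1" "$2" ;;
  esac
}

build_clickhouse_jdbc_url 10.162.156.243 21426  # IPv4 example value from Table 1
build_clickhouse_jdbc_url fd00::10 21426        # hypothetical IPv6 address
```

Pass the port that matches the security mode of the cluster where the ClickHouse data source is located.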

ClickHouse Data Type Mapping

Mapping from ClickHouse data types to HetuEngine data types

ClickHouse Data Type    HetuEngine Data Type
BOOLEAN                 BOOLEAN
UInt8                   SMALLINT
UInt16                  INTEGER
UInt32                  BIGINT
UInt64                  DECIMAL(20, 0)
Int8                    TINYINT
Int16                   SMALLINT
Int32                   INTEGER
Int64                   BIGINT
Float32                 REAL
Float64                 DOUBLE
Decimal(P, S)           DECIMAL(P, S)
Decimal32(S)            DECIMAL(P, S)
Decimal64(S)            DECIMAL(P, S)
Decimal128(S)           DECIMAL(P, S)
IPv4                    VARCHAR
IPv6                    VARCHAR
UUID                    VARCHAR
Enum8                   VARCHAR
Enum16                  VARCHAR
String                  VARCHAR / VARBINARY
FixedString(N)          VARCHAR / VARBINARY
Date                    DATE
DateTime                TIMESTAMP
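The mapping table can also be expressed as a lookup function, which may help when reviewing how existing ClickHouse columns will appear in HetuEngine. This POSIX shell sketch is illustrative only. map_clickhouse_type is hypothetical, and two of its rules are assumptions that the table does not state: Decimal32(S), Decimal64(S), and Decimal128(S) are given ClickHouse's fixed precisions of 9, 18, and 38, and String and FixedString(N) map to VARCHAR or VARBINARY depending on a flag that stands in for clickhouse.map-string-as-varchar.

```shell
#!/bin/sh
# Hypothetical helper (not part of HetuEngine) that mirrors the mapping table.
# Assumptions not stated in the table:
#   - Decimal32(S), Decimal64(S), and Decimal128(S) use ClickHouse's fixed
#     precisions 9, 18, and 38, respectively.
#   - String and FixedString(N) map to VARCHAR when $2 (standing in for
#     clickhouse.map-string-as-varchar) is true, the default, else VARBINARY.
map_clickhouse_type() {
  ck_type=$1
  string_as_varchar=${2:-true}
  case $ck_type in
    BOOLEAN) echo "BOOLEAN" ;;
    Int8) echo "TINYINT" ;;
    UInt8 | Int16) echo "SMALLINT" ;;
    UInt16 | Int32) echo "INTEGER" ;;
    UInt32 | Int64) echo "BIGINT" ;;
    UInt64) echo "DECIMAL(20, 0)" ;;
    Float32) echo "REAL" ;;
    Float64) echo "DOUBLE" ;;
    Decimal\(*\)) echo "DECIMAL${ck_type#Decimal}" ;;
    Decimal32\(*\) | Decimal64\(*\) | Decimal128\(*\))
      # Split, for example, "Decimal64(4)" into the width (64) and the scale (4).
      width=${ck_type#Decimal}
      width=${width%%\(*}
      scale=${ck_type#*\(}
      scale=${scale%\)}
      case $width in
        32) precision=9 ;;
        64) precision=18 ;;
        *) precision=38 ;;
      esac
      echo "DECIMAL($precision, $scale)" ;;
    IPv4 | IPv6 | UUID | Enum8* | Enum16*) echo "VARCHAR" ;;
    String | FixedString\(*\))
      if [ "$string_as_varchar" = true ]; then echo "VARCHAR"; else echo "VARBINARY"; fi ;;
    Date) echo "DATE" ;;
    DateTime) echo "TIMESTAMP" ;;
    *) echo "unsupported type: $ck_type" >&2; return 1 ;;
  esac
}

map_clickhouse_type UInt64                   # prints DECIMAL(20, 0)
map_clickhouse_type 'Decimal64(4)'           # prints DECIMAL(18, 4)
map_clickhouse_type 'FixedString(16)' false  # prints VARBINARY
```

Types that are not in the table fall through to the last branch and return a non-zero status.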

Performance Optimization

  • Subquery pushdown

    Subqueries can be pushed down to the ClickHouse data source to improve query performance.

  • Scalar UDF pushdown

    Scalar UDF pushdown is enabled by default. Before using it, create the corresponding mapping function in HetuEngine as needed.

Constraints on ClickHouse Data Source

  • HetuEngine supports the following SQL statements on ClickHouse data sources: SHOW CATALOGS, SHOW SCHEMAS, SHOW TABLES, SHOW COLUMNS, DESCRIBE, USE, and SELECT (on tables and views).
  • Tables and views that support interconnection between HetuEngine and ClickHouse:

    Tables: local tables (MergeTree), replicated tables (ReplicatedReplacingMergeTree), and distributed tables
    Views: normal views and materialized views