Updated on 2024-08-30 GMT+08:00

Development Suggestions

Configure Multiple IP Addresses for the ClickHouseBalancer Instance

Configuring multiple IP addresses prevents a single point of failure (SPOF) in ClickHouseBalancer. Set the url property in the WITH clause of the table definition as follows:

'url' = 'jdbc:clickhouse://IP address 1 of the ClickHouseBalancer instance:ClickHouseBalancer port,IP address 2 of the ClickHouseBalancer instance:ClickHouseBalancer port/default',
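For context, the url property could sit in a sink table definition such as the sketch below. The connector name, table name, column names, IP addresses, and port are illustrative assumptions, not values from this guide. Replace them with the values of your cluster.

```sql
-- Hypothetical Flink SQL sink table; all names and addresses are placeholders.
CREATE TABLE ck_sink (
  id BIGINT,
  name STRING
) WITH (
  'connector' = 'jdbc',  -- assumed connector name
  -- Two ClickHouseBalancer instances, separated by a comma
  'url' = 'jdbc:clickhouse://192.0.2.10:21425,192.0.2.11:21425/default',
  'table-name' = 'ck_target_table'  -- assumed ClickHouse table
  -- Authentication properties are omitted here; add them as your cluster requires.
);
```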

Configure Proper Batch Parameters for Sink Tables

Parameters for batch write:

Flink stores data in the memory and flushes the data to the database table when the trigger condition is met.

Configurations:

  • sink.buffer-flush.max-rows: maximum number of rows buffered before they are written to ClickHouse. The default value is 100.
  • sink.buffer-flush.interval: maximum interval between two batch writes. The default value is 1s.

If either of the two conditions is met, a sink operation is triggered. That is, data will be flushed to the database table.

  • Example 1: sink every 60 seconds
    'sink.buffer-flush.max-rows' = '0',
    'sink.buffer-flush.interval' = '60s'
  • Example 2: sink every 100 records
    'sink.buffer-flush.max-rows' = '100',
    'sink.buffer-flush.interval' = '0s'
  • Example 3: no sink
    'sink.buffer-flush.max-rows' = '0',
    'sink.buffer-flush.interval' = '0s'
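The two triggers can also be combined. The sketch below flushes when 1,000 rows are buffered or 5 seconds have passed, whichever comes first. The thresholds, connector name, table definition, and address are assumptions chosen for illustration. Tune the values to your data volume and latency requirements.

```sql
-- Hypothetical sink table that combines both flush triggers.
CREATE TABLE ck_batch_sink (
  id BIGINT,
  name STRING
) WITH (
  'connector' = 'jdbc',  -- assumed connector name
  'url' = 'jdbc:clickhouse://192.0.2.10:21425/default',  -- placeholder address
  'table-name' = 'ck_target_table',  -- assumed ClickHouse table
  'sink.buffer-flush.max-rows' = '1000',  -- flush once 1000 rows are buffered
  'sink.buffer-flush.interval' = '5s'  -- or after 5 seconds, whichever comes first
);
```

Larger batches generally mean fewer write requests to ClickHouse, at the cost of data staying longer in Flink memory before it becomes visible in the table.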

Create a ReplacingMergeTree Table in ClickHouse for Data Deduplication

When Flink writes data through ClickHouseBalancer, rows with the same key are not guaranteed to be written to the same ClickHouseServer instance. Merging rows that share a key depends on the ReplacingMergeTree engine of ClickHouse.
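A minimal sketch of such a table is shown below. The table name, columns, and the choice of updated_at as the version column are hypothetical. In a multi-node cluster, the replicated engine variant and cluster-wide DDL would likely be needed instead of this single-node form.

```sql
-- Hypothetical ClickHouse table; rows with the same ORDER BY key are
-- collapsed during background merges, keeping the row with the largest updated_at.
CREATE TABLE default.ck_target_table
(
    id UInt64,
    name String,
    updated_at DateTime
)
ENGINE = ReplacingMergeTree(updated_at)
ORDER BY id;

-- Merges run asynchronously, so duplicates can remain visible for a while.
-- FINAL applies the deduplication at query time.
SELECT * FROM default.ck_target_table FINAL WHERE id = 1;
```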