Development Suggestions
Configure Multiple IP Addresses for the ClickHouseBalancer Instance
Configuring multiple IP addresses prevents ClickHouseBalancer from becoming a single point of failure (SPOF). Set the url property in the WITH clause of the table as follows:
'url' = 'jdbc:clickhouse://<ClickHouseBalancer instance IP address 1>:<ClickHouseBalancer port>,<ClickHouseBalancer instance IP address 2>:<ClickHouseBalancer port>/default',
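For context, the url property above might sit in a complete sink table definition like the following minimal sketch. It assumes the table is written through the Flink JDBC connector; the table name, columns, credentials, IP addresses (192.0.2.10 and 192.0.2.11), and port (21425) are hypothetical placeholders, not values from this guide.

```sql
-- Hypothetical Flink SQL sink table; every name and value below is a placeholder.
CREATE TABLE ck_sink (
  id   INT,
  name STRING
) WITH (
  'connector'  = 'jdbc',            -- assumption: the Flink JDBC connector is used
  -- Two ClickHouseBalancer instances, comma-separated, so that the
  -- connection does not depend on a single instance.
  'url'        = 'jdbc:clickhouse://192.0.2.10:21425,192.0.2.11:21425/default',
  'table-name' = 'ck_sink_target',  -- hypothetical target table in ClickHouse
  'username'   = 'example_user',    -- placeholder credentials
  'password'   = 'example_password'
);
```

Replace the addresses and port with those of your own ClickHouseBalancer instances.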
Configure Proper Batch Parameters for Sink Tables
Parameters for batch write:
Flink buffers data in memory and flushes it to the database table when a trigger condition is met.
Configurations:
- sink.buffer-flush.max-rows: maximum number of rows buffered before they are written to ClickHouse. The default value is 100.
- sink.buffer-flush.interval: interval between batch writes. The default value is 1s.
A sink operation is triggered as soon as either of the two conditions is met. That is, the buffered data is flushed to the database table.
- Example 1: sink every 60 seconds (the row-count condition is disabled)
'sink.buffer-flush.max-rows' = '0', 'sink.buffer-flush.interval' = '60s'
- Example 2: sink every 100 records (the interval condition is disabled)
'sink.buffer-flush.max-rows' = '100', 'sink.buffer-flush.interval' = '0s'
- Example 3: no sink is triggered by either condition (both conditions are disabled)
'sink.buffer-flush.max-rows' = '0', 'sink.buffer-flush.interval' = '0s'
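The two parameters can also be enabled together, in which case whichever condition is met first triggers the flush. The following sketch shows this combination; it assumes the Flink JDBC connector, and the table name, columns, URL, and the values 5000 and 5s are illustrative placeholders rather than tuning recommendations.

```sql
-- Hypothetical Flink SQL sink table that flushes on whichever condition is met first.
CREATE TABLE ck_sink_batched (
  id   INT,
  name STRING
) WITH (
  'connector'  = 'jdbc',            -- assumption: the Flink JDBC connector is used
  'url'        = 'jdbc:clickhouse://192.0.2.10:21425,192.0.2.11:21425/default',
  'table-name' = 'ck_sink_target',  -- hypothetical target table in ClickHouse
  'sink.buffer-flush.max-rows' = '5000',  -- flush once 5000 rows are buffered ...
  'sink.buffer-flush.interval' = '5s'     -- ... or after 5 seconds, whichever comes first
);
```

Larger batches generally mean fewer, bigger writes to ClickHouse at the cost of higher latency and more buffered data in memory, so suitable values depend on the workload.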
Create the ReplacingMergeTree Table in the ClickHouse for Data Deduplication
When Flink writes data through ClickHouseBalancer, it cannot ensure that data with the same key is written to the same ClickHouseServer. Merging data with the same key therefore depends on the ReplacingMergeTree engine of ClickHouse.
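As a minimal sketch of such a table, the following ClickHouse statement creates a ReplacingMergeTree table whose sorting key (ORDER BY) acts as the deduplication key. The database, table, and column names are hypothetical and would need to match the Flink sink table.

```sql
-- Hypothetical ClickHouse target table; rows with the same sorting key (id)
-- are collapsed into one row when ClickHouse merges data parts.
CREATE TABLE default.ck_sink_target
(
    id   Int32,
    name String
)
ENGINE = ReplacingMergeTree
ORDER BY id;

-- Merges run in the background at an unspecified time, so duplicates can be
-- visible for a while. FINAL applies the deduplication at query time.
SELECT id, name
FROM default.ck_sink_target FINAL;
```

ReplacingMergeTree also accepts an optional version column as an engine parameter, in which case the row with the highest version is kept instead of the most recently inserted one.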