Development Suggestions
Traffic Limiting Must Be Set When Kafka Is the Source
This rule is available for MRS 3.3.0 or later.
To prevent job exceptions caused by heavy traffic, set a traffic limit, which should be the peak value of the pressure test for service rollout.
[Example]
# The following parameter takes effect at any parallelism: 'scan.records-per-second.limit' = '1000' # The actual traffic limit is as follows: min( parallelism * scan.records-per-second.limit, partitions num * scan.records-per-second.limit)
Write Data with the Same Key to the Same Kafka Partition for Data Accuracy
Flink uses the fixed policy to write data to Kafka and performs hash calculation based on the key before writing data.
[Example]
CREATE TABLE kafka ( f_sequence INT, f_sequence1 INT, f_sequence2 INT, f_sequence3 INT ) WITH ( 'connector' = 'kafka', 'topic' = 'yxtest123', 'properties.bootstrap.servers' = '192.168.0.104:9092', 'properties.group.id' = 'testGroup1', 'scan.startup.mode' = 'latest-offset', 'format' = 'json', 'sink.partitioner'='fixed' ); insert into kafka select /*+ DISTRIBUTEBY('f_sequence','f_sequence1') */ * from datagen;
Set Kafka Source Parallelism Same as the Number of Topic Partitions for Faster Kafka Consumption
When the parallelism of Kafka Source is greater than the number of topic partitions, no more data is consumed.
Feedback
Was this page helpful?
Provide feedbackThank you very much for your feedback. We will continue working to improve the documentation.See the reply and handling status in My Cloud VOC.
For any further questions, feel free to contact us through the chatbot.
Chatbot