Kafka Incremental Extraction
Overview
Kafka incremental extraction refers to extracting data from Kafka within a specified time range to achieve periodic data synchronization. This policy is suitable for periodically synchronizing data in Kafka to other storage systems (such as a Hive data lake). By properly configuring the start time, end time, and scheduling period for extracting Kafka data, you can create a task that synchronizes incremental data periodically and efficiently.
Scenarios
Common scenarios include but are not limited to the following:
- Hourly synchronization: New data in Kafka is synchronized to a Hive data lake every hour.
- Daily synchronization: New data in Kafka is synchronized to a Hive data lake every day.
- More granular periodic synchronization: You can set a more granular interval (for example, every 15 minutes) for synchronizing data.
Procedure
- Configure job parameter variables.
- Start time: Enter startTime:#{DateUtil.format(DateUtil.addHours(Job.planTime,-1),"yyyy-MM-dd HH:mm:ss")}, which indicates one hour before the task scheduling time.
- End time: Enter endTime:#{DateUtil.format(Job.planTime,"yyyy-MM-dd HH:mm:ss")}, which indicates the task scheduling time.
Figure 1 Configuring job parameter variables
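The two expressions amount to simple window arithmetic on the scheduling time. The following Python sketch is an illustration only, not the platform's implementation: `extraction_window` is a hypothetical helper, `plan_time` stands in for `Job.planTime`, and the `strftime` pattern is assumed to be equivalent to "yyyy-MM-dd HH:mm:ss".

```python
from datetime import datetime, timedelta

# Assumed Python equivalent of the "yyyy-MM-dd HH:mm:ss" pattern.
TIME_FORMAT = "%Y-%m-%d %H:%M:%S"


def extraction_window(plan_time: datetime, lookback_hours: int = 1) -> dict:
    """Return the startTime/endTime strings for one scheduled run.

    Hypothetical helper: plan_time plays the role of Job.planTime, and
    lookback_hours=1 mirrors DateUtil.addHours(Job.planTime, -1).
    """
    start = plan_time - timedelta(hours=lookback_hours)
    return {
        "startTime": start.strftime(TIME_FORMAT),
        "endTime": plan_time.strftime(TIME_FORMAT),
    }


# A run scheduled for 10:00 covers the hour from 09:00 to 10:00.
window = extraction_window(datetime(2024, 5, 1, 10, 0, 0))
print(window["startTime"], "->", window["endTime"])
```

Because the subtraction is done on the full timestamp, a run scheduled at midnight correctly reaches back into the previous day.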
- Configure a Kafka read task.
Set Consumption Record Policy to Time Range, Start Time to ${startTime}, and End Time to ${endTime}.
Figure 2 Configuring a Kafka read task
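Conceptually, a Time Range policy selects the messages whose timestamps fall between the start time and the end time. The sketch below illustrates that selection on an in-memory list; it does not describe how the platform reads Kafka. It assumes that timestamps within a partition are non-decreasing, that the times are in UTC, and that the range is half-open (start included, end excluded). None of these details is confirmed by this page, and `select_in_range` and `to_epoch_ms` are hypothetical helpers.

```python
from bisect import bisect_left
from datetime import datetime, timezone


def to_epoch_ms(text: str) -> int:
    """Parse 'YYYY-MM-DD HH:MM:SS' as UTC (an assumption) into epoch milliseconds."""
    parsed = datetime.strptime(text, "%Y-%m-%d %H:%M:%S").replace(tzinfo=timezone.utc)
    return int(parsed.timestamp() * 1000)


def select_in_range(records, start_time: str, end_time: str):
    """Return the (offset, timestamp_ms) records with start <= timestamp < end.

    records must be sorted by timestamp; the half-open range is an assumption
    chosen so that back-to-back windows do not select the same message twice.
    """
    timestamps = [ts for _, ts in records]
    first = bisect_left(timestamps, to_epoch_ms(start_time))
    last = bisect_left(timestamps, to_epoch_ms(end_time))
    return records[first:last]


# Four messages in one partition, given as (offset, timestamp_ms).
records = [
    (0, to_epoch_ms("2024-05-01 08:59:59")),
    (1, to_epoch_ms("2024-05-01 09:00:00")),
    (2, to_epoch_ms("2024-05-01 09:30:00")),
    (3, to_epoch_ms("2024-05-01 10:00:00")),
]
selected = select_in_range(records, "2024-05-01 09:00:00", "2024-05-01 10:00:00")
print([offset for offset, _ in selected])
```

Under these assumptions, the message stamped exactly 10:00:00 is left for the next run, whose window starts at that instant.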
- Configure a scheduling policy.
Set Scheduling Frequency to Hours. The task then runs every hour, and each run extracts the Kafka messages generated during the hour before its scheduling time.
Figure 3 Configuring the scheduling period
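The scheduling period and the one-hour offset in the start time work together: when they match, consecutive runs cover adjacent windows with no gap and no overlap. The following Python sketch checks this property under the assumption that every run fires exactly at its scheduling time. `run_windows` and `is_contiguous` are hypothetical helpers, not part of the product.

```python
from datetime import datetime, timedelta


def run_windows(first_plan_time: datetime, runs: int,
                period: timedelta, lookback: timedelta):
    """List the (start, end) window of each run for a fixed scheduling period."""
    windows = []
    for i in range(runs):
        plan_time = first_plan_time + i * period
        windows.append((plan_time - lookback, plan_time))
    return windows


def is_contiguous(windows) -> bool:
    """True when each window starts exactly where the previous one ended."""
    return all(
        prev_end == next_start
        for (_, prev_end), (next_start, _) in zip(windows, windows[1:])
    )


first = datetime(2024, 5, 1, 0, 0, 0)
hour = timedelta(hours=1)

# Hourly schedule with a one-hour lookback: the windows are adjacent.
print(is_contiguous(run_windows(first, 24, period=hour, lookback=hour)))

# Daily schedule with a one-hour lookback: 23 hours are skipped between runs.
print(is_contiguous(run_windows(first, 3, period=timedelta(days=1), lookback=hour)))
```

If you change the scheduling frequency (for example, to daily), adjust the start time expression so that the window length matches the new period.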
Summary
By configuring the start time, end time, and scheduling period so that each run covers one scheduling interval, you can create a task that periodically synchronizes incremental data from Kafka to other storage systems. Because each run reads only the messages in its own time range, data that has already been synchronized does not need to be extracted again. You are advised to adjust the configuration based on your requirements and environment.