Configuring Incremental Read of OBS Data
Overview
OBS incremental extraction reads only the data that was written to OBS within a specified time range, so that data can be synchronized periodically. It is suitable for periodically synchronizing OBS data to other storage systems (such as a Hive data lake). If each job run reads an OBS directory that holds only the data for that period, the run synchronizes only the incremental data.
Scenarios
If the data in the OBS bucket is organized into directories by hour or by day, you can use scheduling variables and periodic scheduling to extract incremental data.
Procedure
This section describes how to enable daily extraction of incremental data from an OBS bucket.
- Configure variables for the job.
Configure the date variable dt with the value #{DateUtil.format(Job.planTime,"yyyy-MM-dd")}. The expression resolves to the day when the job is scheduled, in yyyy-MM-dd format.
- Configure the task for reading OBS data.
Reference the dt variable in the source directory path, so that each run reads only the directory for the scheduled day.
- Configure a scheduling policy.
Set Scheduling Frequency to Every day. The job is then scheduled once a day and extracts the OBS data generated within that day.
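The three steps above can be sketched in Python to show which directory each daily run would read. This is only an illustration of the idea, not the product's implementation: resolve_dt mimics what the expression DateUtil.format(Job.planTime,"yyyy-MM-dd") is described as producing, and the base path obs://example-bucket/logs, the per-day directory layout, and all function names are assumptions made for this example.

```python
from datetime import datetime, timedelta


def resolve_dt(plan_time: datetime) -> str:
    """Mimic the dt variable: the planned scheduling day as yyyy-MM-dd."""
    return plan_time.strftime("%Y-%m-%d")


def source_directory(base_dir: str, plan_time: datetime) -> str:
    """Build the source directory for one run (assumed layout: <base>/<dt>/)."""
    return f"{base_dir.rstrip('/')}/{resolve_dt(plan_time)}/"


def daily_runs(first_plan_time: datetime, days: int) -> list:
    """Return the planned times of consecutive daily runs."""
    return [first_plan_time + timedelta(days=i) for i in range(days)]


# Hypothetical base path; replace it with your own bucket and prefix.
BASE_DIR = "obs://example-bucket/logs"

# Three consecutive daily runs, each planned for 01:00.
runs = daily_runs(datetime(2024, 2, 28, 1, 0), 3)
directories = [source_directory(BASE_DIR, t) for t in runs]

for directory in directories:
    print(directory)
```

Each run resolves to a different directory, so a run reads only the data of its scheduled day and no directory is read twice.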
Summary
By properly configuring the date variable, the source directory, and the scheduling period for extracting OBS data, you can create a task that periodically synchronizes incremental data. This method is suitable for periodically synchronizing OBS data to other storage systems, because each run reads only the directory for its scheduled period. You are advised to adjust the configuration based on your requirements and environment.