Updated on 2026-05-20 GMT+08:00

Configuring Incremental Read of OBS Data

Overview

OBS incremental extraction refers to extracting data from OBS within a specified time range to achieve periodic data synchronization. This policy is suitable for periodically synchronizing data in OBS to other storage systems (such as a Hive data lake). By configuring an appropriate OBS directory for reading data, you can enable efficient periodic synchronization of incremental data.

Scenarios

If the data in an OBS bucket is organized into directories by hour or day, you can use scheduling variables and periodic scheduling to enable incremental data extraction.
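As a rough illustration of the directory layout this scenario assumes, the following Python sketch maps a scheduled time to a day-level or hour-level directory. The bucket name, the base path, the two-digit hour subdirectory, and the helper name partition_dir are all hypothetical. The actual layout depends on how your data is written to OBS.

```python
from datetime import datetime


def partition_dir(base: str, plan_time: datetime, granularity: str = "day") -> str:
    """Return the directory assumed to hold data for the given scheduled time.

    Assumption: directories are named by date (yyyy-MM-dd) and, for hourly
    layouts, by a two-digit hour directory below the date directory.
    """
    day = plan_time.strftime("%Y-%m-%d")
    if granularity == "day":
        return f"{base}/{day}/"
    if granularity == "hour":
        hour = plan_time.strftime("%H")
        return f"{base}/{day}/{hour}/"
    raise ValueError(f"unsupported granularity: {granularity}")


if __name__ == "__main__":
    # Hypothetical bucket and path, used only for illustration.
    base = "obs://example-bucket/logs"
    scheduled = datetime(2026, 5, 20, 3, 0)
    print(partition_dir(base, scheduled, "day"))
    print(partition_dir(base, scheduled, "hour"))
```

With a layout like this, each scheduled run needs to read only one directory, which is what makes the extraction incremental.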

Procedure

This section describes how to enable daily extraction of incremental data from an OBS bucket.

  1. Configure variables for the job.

    Add a date variable named dt with the value #{DateUtil.format(Job.planTime,"yyyy-MM-dd")}. The expression returns the planned scheduling date of the job in yyyy-MM-dd format.

  2. Configure the task for reading OBS data.

    Set the source directory to a path that contains the dt variable, so that each run reads only the directory for its scheduled day.

  3. Configure a scheduling policy.

    Set Scheduling Frequency to Every day. The job is then scheduled once a day to extract the OBS data generated within one day.
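The three steps can be sketched outside the platform as follows. This is a minimal Python sketch under stated assumptions, not the scheduler's implementation. The function format_plan_date mimics what the expression DateUtil.format(Job.planTime,"yyyy-MM-dd") is described as producing. The source directory template, the {dt} placeholder syntax, and the bucket name are hypothetical and chosen only for illustration.

```python
from datetime import datetime, timedelta
from typing import Iterator


def format_plan_date(plan_time: datetime) -> str:
    # Step 1: the dt variable, that is, the scheduled day in yyyy-MM-dd form.
    return plan_time.strftime("%Y-%m-%d")


def resolve_source_dir(template: str, plan_time: datetime) -> str:
    # Step 2: substitute dt into the source directory.
    # The "{dt}" placeholder is an assumption made for this sketch.
    return template.replace("{dt}", format_plan_date(plan_time))


def daily_runs(first_run: datetime, days: int) -> Iterator[datetime]:
    # Step 3: one scheduled run per day, starting at first_run.
    for offset in range(days):
        yield first_run + timedelta(days=offset)


if __name__ == "__main__":
    # Hypothetical source directory template.
    template = "obs://example-bucket/orders/{dt}/"
    for plan_time in daily_runs(datetime(2026, 5, 20, 1, 0), 3):
        print(plan_time.date(), "->", resolve_source_dir(template, plan_time))
```

Each run resolves to a different directory, so no data is read twice as long as the files for a given day are written only to that day's directory.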

Summary

By configuring the time range (start time and end time) and the scheduling period for extracting OBS data so that they match the directory layout, you can create a task that periodically synchronizes only incremental data. This method is suitable for periodically synchronizing OBS data to other storage systems, and it avoids rereading data that has already been synchronized. You are advised to adjust the configuration based on your requirements and environment.