Updated on 2026-05-20 GMT+08:00

Configuring Incremental Read of FTP/SFTP Data

Overview

FTP/SFTP incremental extraction copies only the files that were added or modified within a specified period from an FTP/SFTP server, which enables periodic data synchronization. The period can be defined by the modification or creation time of files, or by time identifiers in file names. This approach suits synchronizing data from an FTP/SFTP server to other storage systems, such as OBS, HDFS, and data lakes. Typical data includes log files, service reports, and data exported from third-party systems. By configuring appropriate file filtering criteria and a scheduling policy, you can synchronize incremental data efficiently, avoid repeated full synchronization, and improve data processing efficiency and resource utilization.
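The time-based selection rule described above can be sketched in Python. This is an illustration only, not how DataArts Studio implements the filter: `RemoteFile` and `select_incremental` are hypothetical names, only the modification time is checked, and the file list is hard-coded instead of being listed from a real FTP/SFTP server.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class RemoteFile:
    """Hypothetical metadata record for one file listed on the FTP/SFTP server."""
    path: str
    mtime: datetime  # last modification time reported by the server


def select_incremental(files, window_start, window_end):
    """Return the files modified in the half-open window [window_start, window_end).

    A half-open window lets consecutive runs share a boundary
    without picking up the same file twice.
    """
    return [f for f in files if window_start <= f.mtime < window_end]


if __name__ == "__main__":
    files = [
        RemoteFile("/data/order_20251117.csv", datetime(2025, 11, 17, 23, 59)),
        RemoteFile("/data/order_20251118.csv", datetime(2025, 11, 18, 8, 30)),
        RemoteFile("/data/order_20251119.csv", datetime(2025, 11, 19, 0, 0)),
    ]
    start = datetime(2025, 11, 18)
    for f in select_incremental(files, start, start + timedelta(days=1)):
        print(f.path)  # prints /data/order_20251118.csv
```

The file stamped exactly at midnight on 2025-11-19 falls into the next day's window, so each file is selected by exactly one daily run.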

Scenarios

When files on an FTP/SFTP server are organized according to either of the following rules, you can use scheduling variables and periodic scheduling to extract incremental data efficiently:

1. File names contain time identifiers, for example, order_20251118.csv (the file name contains a date).

2. Files are stored in time-based directories, for example, /data/2025/11/18/ (year/month/day directories) or /logs/hourly/14/ (hour directories).
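The two organization rules above can be expressed as small helpers. This is a minimal sketch under stated assumptions: the regular expression assumes an underscore followed by an 8-digit yyyyMMdd stamp before the extension (as in order_20251118.csv), and `file_matches_day`, `day_directory`, and `hour_directory` are hypothetical helper names, not part of any product API.

```python
import re
from datetime import date

# Assumed naming convention: "<prefix>_<yyyyMMdd>.<ext>", e.g. order_20251118.csv.
DATE_IN_NAME = re.compile(r"_(\d{8})\.\w+$")


def file_matches_day(file_name: str, day: date) -> bool:
    """Rule 1: the file name carries the date the file belongs to."""
    m = DATE_IN_NAME.search(file_name)
    return m is not None and m.group(1) == day.strftime("%Y%m%d")


def day_directory(root: str, day: date) -> str:
    """Rule 2: files for one day live under root/yyyy/MM/dd/."""
    return f"{root.rstrip('/')}/{day:%Y/%m/%d}/"


def hour_directory(root: str, hour: int) -> str:
    """Rule 2, hourly variant: files for one hour live under root/HH/."""
    return f"{root.rstrip('/')}/{hour:02d}/"


if __name__ == "__main__":
    day = date(2025, 11, 18)
    print(file_matches_day("order_20251118.csv", day))  # True
    print(file_matches_day("order_20251117.csv", day))  # False
    print(day_directory("/data", day))                  # /data/2025/11/18/
    print(hour_directory("/logs/hourly", 14))           # /logs/hourly/14/
```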

Procedure

This section describes how to configure a daily job that extracts incremental data from an FTP/SFTP server. In this example, data files are stored in directories organized by year/month/day.

  1. Configure variables for the job.

    Configure the date variable dt with the value #{DateUtil.format(Job.planTime,"yyyy/MM/dd")}. It resolves to the planned scheduling date of the job in yyyy/MM/dd format, for example, 2025/11/18.

    Figure 1 Configuring job parameters

  2. Configure the task for reading FTP/SFTP data.

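    Set the source directory to a path that references the dt variable instead of a fixed date, for example, /data/${dt}/, so that each run reads the directory for its scheduling date.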

    Figure 2 Configuring the source directory

  3. Configure a scheduling policy.

    Set Scheduling Frequency to Every day. The job is then scheduled once a day and extracts the FTP/SFTP files generated within that day.

    Figure 3 Configuring the scheduling period
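The three steps above can be simulated locally to check which directory each daily run would read. This is a hedged sketch, not the DataArts Studio runtime: it assumes that the Java-style pattern yyyy/MM/dd maps to Python's %Y/%m/%d, that the source directory is the hypothetical path /data/${dt}/, and it uses Python's `string.Template` only because it happens to accept the same `${dt}` placeholder syntax.

```python
from datetime import datetime, timedelta
from string import Template


def resolve_dt(plan_time: datetime) -> str:
    """Local stand-in for DateUtil.format(Job.planTime, "yyyy/MM/dd")."""
    return plan_time.strftime("%Y/%m/%d")


# Hypothetical source directory of the FTP/SFTP read task (step 2);
# ${dt} stands for the job variable defined in step 1.
SOURCE_DIR_TEMPLATE = Template("/data/${dt}/")


def source_dir_for(plan_time: datetime) -> str:
    """Directory one scheduled run would read."""
    return SOURCE_DIR_TEMPLATE.substitute(dt=resolve_dt(plan_time))


def daily_plan_times(first: datetime, days: int):
    """Plan times of a job scheduled once a day (step 3)."""
    return [first + timedelta(days=i) for i in range(days)]


if __name__ == "__main__":
    for plan_time in daily_plan_times(datetime(2025, 11, 18, 2, 0), 3):
        print(plan_time.isoformat(), "->", source_dir_for(plan_time))
    # 2025-11-18T02:00:00 -> /data/2025/11/18/
    # 2025-11-19T02:00:00 -> /data/2025/11/19/
    # 2025-11-20T02:00:00 -> /data/2025/11/20/
```

Because each run derives its path only from its own plan time, consecutive daily runs read distinct directories, including across month and year boundaries.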

Summary

By configuring an FTP/SFTP file path variable that matches the time identifiers or directory structure of your files, together with a periodic scheduling policy, you can create a task that synchronizes incremental data efficiently. This method is suitable when files are organized according to time-based rules. It significantly reduces repeated transmission, lowers network and storage costs, and helps ensure data timeliness and consistency.