Incremental File Migration
CDM supports incremental migration of file systems. After a full migration is complete, you can export all new files, or only specified directories or files.
Currently, CDM supports the following incremental migration modes:
- Exporting all new files
  - Application scenarios: Both the migration source and destination are file systems (OBS/OSS/HDFS/FTP/SFTP/NAS).
  - Key configurations: Skipping Duplicate Files and Schedule Execution
  - Prerequisites: None
- Exporting the files in a specified directory
  - Application scenarios: The migration source is a file system (OBS/OSS/HDFS/FTP/SFTP/NAS), and the migration destination can be of any type. In incremental migration, only the specified files are written to the migration destination; existing records are not updated or deleted.
  - Key configurations: File/Path Filter and Schedule Execution
  - Prerequisites: The source directory or file names contain a time field.
- Exporting the files modified after a specified time point
  - Application scenarios: The migration source is a file system (OBS/OSS/HDFS/FTP/SFTP/NAS), and the migration destination can be of any type. The time point is compared with each file's modification time, and CDM migrates only the files modified after it.
  - Key configurations: Time Filter and Schedule Execution
  - Prerequisites: None
Skipping Duplicate Files
- Parameter position: When creating a table/file migration job in which both the migration source and destination are file systems, set Duplicate File Processing Method in Destination Job Configuration to Replace, Skip, or Stop job.
- Parameter principle: If a file with the same name and size exists on both the migration source and destination, CDM treats it as a duplicate file.
- Example configurations:
  - Source Directory/File: If you set this parameter to a directory, CDM imports all files in the directory to the migration destination.
  - File Format: Select Binary. CDM copies the files directly without parsing their content, which is suitable for file-to-file migration.
  - Duplicate File Processing Method: Select Skip.
  Figure 1 Skipping duplicate files
  - Configure scheduled job execution.
With this configuration, each scheduled run imports the newly added files to the destination directory and skips the files that were already migrated, implementing incremental synchronization.
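The duplicate check described above can be modeled in Python. This is a minimal local-filesystem sketch, not CDM's implementation: the names DuplicatePolicy, DuplicateFileError, is_duplicate, and sync_directory are hypothetical, the sketch handles a single flat directory, and it assumes that a destination file with the same name but a different size is overwritten because it does not count as a duplicate.

```python
import os
import shutil
from enum import Enum
from typing import List


class DuplicatePolicy(Enum):
    """Hypothetical model of the Duplicate File Processing Method options."""
    REPLACE = "Replace"
    SKIP = "Skip"
    STOP_JOB = "Stop job"


class DuplicateFileError(Exception):
    """Raised under STOP_JOB when a duplicate file is found."""


def is_duplicate(src_path: str, dst_path: str) -> bool:
    # Duplicate = a destination file with the same name and the same size.
    return os.path.isfile(dst_path) and os.path.getsize(src_path) == os.path.getsize(dst_path)


def sync_directory(src_dir: str, dst_dir: str, policy: DuplicatePolicy) -> List[str]:
    """Copy files from src_dir to dst_dir and return the names actually copied."""
    copied = []
    for name in sorted(os.listdir(src_dir)):
        src = os.path.join(src_dir, name)
        if not os.path.isfile(src):
            continue  # this sketch handles a flat directory only
        dst = os.path.join(dst_dir, name)
        if is_duplicate(src, dst):
            if policy is DuplicatePolicy.SKIP:
                continue
            if policy is DuplicatePolicy.STOP_JOB:
                raise DuplicateFileError(name)
            # REPLACE falls through and overwrites the destination file.
        # Assumption: a same-name file with a different size is not a
        # duplicate, so it is overwritten under every policy.
        shutil.copyfile(src, dst)  # byte-for-byte copy, similar to Binary format
        copied.append(name)
    return copied
```

Calling sync_directory repeatedly with DuplicatePolicy.SKIP, as a scheduled job would, copies only the files added since the previous run; with REPLACE, every file is copied again on each run.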
File/Path Filter
- Parameter position: When creating a table/file migration job in which the migration source is a file system, set Filter Type in the advanced attributes of Source Job Configuration to Wildcard or Regular expression.
- Parameter principle: If you set Filter Type to Wildcard, CDM filters files or paths based on the configured wildcard pattern and migrates only the files or paths that match it.
- Example configurations:
  Suppose that the source file names contain a date and time field. For example, the file /opt/data/file_20171015202526.data is generated at 2017-10-15 20:25:26. Set the parameters as follows:
  - Filter Type: Select Wildcard.
  - File Filter: Enter "*${dateformat(yyyyMMdd,-1,DAY)}*". This pattern uses a CDM date and time macro variable that resolves to the previous day's date in yyyyMMdd format. For details, see Incremental Synchronization Using the Macro Variables of Date and Time.
  Figure 2 Filtering files
  - Schedule Execution: Set Cycle (days) to 1.
With this configuration, the job imports the files generated on the previous day to the destination directory every day, implementing incremental synchronization.
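To show how the wildcard filter and the date macro combine, the following Python sketch expands ${dateformat(format,offset,unit)} relative to a given run time and then matches file names with shell-style wildcards. It is an assumption-based model, not CDM's macro engine: only the yyyy, MM, dd, HH, mm, and ss tokens and the DAY, HOUR, and MINUTE units are handled, and the helpers resolve_macros and select_files are hypothetical. CDM may support more tokens and determines the reference time itself.

```python
import fnmatch
import re
from datetime import datetime, timedelta
from typing import Iterable, List

# Hypothetical subset of Java-style date tokens and their strftime codes.
_TOKENS = {"yyyy": "%Y", "MM": "%m", "dd": "%d", "HH": "%H", "mm": "%M", "ss": "%S"}
_UNITS = {"DAY": "days", "HOUR": "hours", "MINUTE": "minutes"}
_MACRO = re.compile(r"\$\{dateformat\(([^,]+),\s*(-?\d+),\s*(DAY|HOUR|MINUTE)\)\}")


def _to_strftime(java_fmt: str) -> str:
    # One regex pass, so converted codes such as %m are never rescanned.
    return re.sub(r"yyyy|MM|dd|HH|mm|ss", lambda m: _TOKENS[m.group()], java_fmt)


def resolve_macros(pattern: str, now: datetime) -> str:
    """Expand each ${dateformat(fmt,offset,unit)} in pattern relative to now."""
    def expand(m) -> str:
        fmt, offset, unit = m.group(1), int(m.group(2)), m.group(3)
        moment = now + timedelta(**{_UNITS[unit]: offset})
        return moment.strftime(_to_strftime(fmt))
    return _MACRO.sub(expand, pattern)


def select_files(names: Iterable[str], file_filter: str, now: datetime) -> List[str]:
    """Return the file names that match the wildcard filter after macro expansion."""
    pattern = resolve_macros(file_filter, now)
    return [n for n in names if fnmatch.fnmatchcase(n, pattern)]
```

For a run at 2017-10-16 01:00:00, "*${dateformat(yyyyMMdd,-1,DAY)}*" expands to "*20171015*", so file_20171015202526.data is selected while files from other days are not.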
In incremental file migration, Path Filter is used in the same way as File Filter, except that the time field must be in the path name. All files in the matching paths are then synchronized periodically.
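Path filtering can be modeled the same way. This Python sketch assumes, as a hypothetical simplification, that Filter Type is set to Regular expression and that the pattern is matched in full against the name of each file's parent directory; CDM may match against a different part of the path. The function select_by_path is illustrative only.

```python
import re
from pathlib import PurePosixPath
from typing import Iterable, List


def select_by_path(file_paths: Iterable[str], path_regex: str) -> List[str]:
    """Keep files whose parent directory name fully matches path_regex.

    Hypothetical model of Path Filter with Filter Type set to Regular expression.
    """
    compiled = re.compile(path_regex)
    return [p for p in file_paths if compiled.fullmatch(PurePosixPath(p).parent.name)]


paths = [
    "/opt/data/20171015/a.data",
    "/opt/data/20171015/b.data",
    "/opt/data/20171016/c.data",
]
# Every file under the 20171015 directory is selected; 20171016 is not.
print(select_by_path(paths, r"20171015"))
# prints ['/opt/data/20171015/a.data', '/opt/data/20171015/b.data']
```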
Time Filter
- Parameter position: When creating a table/file migration job in which the migration source is a file system, select Yes for Time Filter.
- Parameter principle: If you specify Modification Time, CDM migrates only the files whose modification time is later than the specified time.
- Example configurations:
  Suppose that you want CDM to synchronize only the files modified after January 1, 2018 to the migration destination. Configure the following parameters:
  - Time Filter: Select Yes.
  - Modification Time: Enter a value in the yyyy-MM-dd HH:mm:ss format, for example, 2018-01-01 00:00:00.
  Figure 3 Time Filter
  - Duplicate File Processing Method: Select Skip.
  - Configure scheduled job execution.
With this configuration, the CDM job migrates only the files modified after January 1, 2018, and each subsequent scheduled run performs incremental synchronization.
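The Time Filter rule amounts to comparing each file's modification time with the configured Modification Time. This Python sketch is an assumption, not CDM's implementation: files_modified_after is a hypothetical helper, it scans one local directory, and it interprets the configured time in the local time zone of the machine running it.

```python
import os
from datetime import datetime
from typing import List

# Modification Time format yyyy-MM-dd HH:mm:ss expressed as strptime codes.
MODIFICATION_TIME_FORMAT = "%Y-%m-%d %H:%M:%S"


def files_modified_after(directory: str, modification_time: str) -> List[str]:
    """Return names of regular files in directory modified strictly after the given time.

    Assumption: the configured time is interpreted in the local time zone.
    """
    # A naive datetime's timestamp() is computed in local time.
    threshold = datetime.strptime(modification_time, MODIFICATION_TIME_FORMAT).timestamp()
    selected = []
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        # getmtime returns seconds since the epoch, comparable with the threshold.
        if os.path.isfile(path) and os.path.getmtime(path) > threshold:
            selected.append(name)
    return selected
```

Because the threshold is fixed, every run of this sketch reselects all files modified after it; in the example configuration, setting Duplicate File Processing Method to Skip is what keeps files already migrated from being copied again.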