From OBS
If the source link of a job is an OBS link, configure the source job parameters as described in Table 1.
Advanced attributes are optional and not displayed by default. You can click Show Advanced Attributes to display them.
Category |
Parameter |
Description |
Example Value |
---|---|---|---|
Basic parameters |
Bucket Name |
Name of the bucket from which data will be migrated |
BUCKET_2 |
Source Directory/File |
Directory or file path from which data will be extracted. This parameter is available only when Pull List File is set to No. You can enter a maximum of 50 file paths. By default, file paths are separated by vertical bars (|), but you can specify a custom file separator. For details, see Migration of a List of Files. This parameter can be set to a macro variable of date and time, and a path name can contain multiple macro variables. When a macro variable of date and time is used with a scheduled job, incremental data can be synchronized periodically. For details, see Incremental Synchronization Using the Macro Variables of Date and Time.
NOTE:
If you have configured a macro variable of date and time and schedule a CDM job through DataArts Studio DataArts Factory, the system replaces the macro variable of date and time with (Planned start time of the data development job – Offset) rather than (Actual start time of the CDM job – Offset). |
FROM/example.csv |
|
File Format |
Format in which CDM parses data. The options are as follows:
|
CSV |
|
Pull List File |
This parameter is displayed only when File Format is set to Binary. If the pull list file function is enabled, the content of a file (such as a .txt file) in an OBS bucket can be read as the list of files to be migrated. The content in the file must be the absolute path of the file to be migrated (rather than a directory). For example, the content is as follows: /052101/DAY20211110.data /052101/DAY20211111.data |
Yes |
|
OBS Link of List File |
This parameter is available only when Pull List File is set to Yes. You can select the OBS link where the list file is located. |
OBS_test_link |
|
OBS Bucket of entries files |
This parameter is available only when Pull List File is set to Yes. It indicates the name of the OBS bucket where the list file is located. |
01 |
|
Path/Directory of entries files |
This parameter is available only when Pull List File is set to Yes. It indicates the absolute path or directory of the list file in the OBS bucket. You are advised to select the absolute path of the file. If you select a directory, files in subdirectories can also be migrated. However, if the number of files in the directory is too large, the cluster memory may become insufficient. |
/0521/Lists.txt |
|
JSON Type |
This parameter is displayed only when File Format is set to JSON. Type of a JSON object stored in a JSON file. The options are JSON object and JSON array. |
JSON object |
|
JSON Reference Node |
This parameter is used only when File Format is set to JSON and JSON Type is set to JSON object. CDM parses the data under the specified JSON node. If the node's data is a JSON array, the system extracts data from the array in the same pattern. Use periods (.) to separate multi-layer nested JSON nodes. |
data.list |
|
Advanced attributes |
Line Separator |
Line feed character in a file. By default, the system automatically identifies \n, \r, and \r\n. This parameter is displayed only when File Format is set to CSV. |
\n |
Field Delimiter |
Character used to separate fields in the file. To set the Tab key as the delimiter, set this parameter to \t. This parameter is displayed only when File Format is set to CSV. |
, |
|
Use Quote Character |
If you set this parameter to Yes, field delimiters enclosed in quote characters are treated as part of the string value. Currently, the default quote character in CDM is the double quotation mark ("). |
No |
|
Use RE to Separate Fields |
Whether to use regular expressions to separate fields. If you set this parameter to Yes, Field Delimiter becomes invalid. This parameter is displayed only when File Format is set to CSV. |
Yes |
|
Regular Expression |
Regular expression used to separate fields. For details about regular expressions, see Regular Expressions for Separating Semi-structured Text. |
^(\d.*\d) (\w*) \[(.*)\] ([\w\.]*) (\w.*).* |
|
Use First Row as Header |
This parameter is displayed only when File Format is set to CSV. When you migrate a CSV file to a table, CDM writes all data to the table by default. If you set this parameter to Yes, CDM uses the first line of the CSV file as the header row and does not write that line to the destination table. |
No |
|
Encoding Type |
Encoding type, for example, UTF-8 or GBK. You can set the encoding type for text files only. This parameter is invalid when File Format is set to Binary. |
GBK |
|
Compression Format |
This parameter is displayed only when File Format is set to CSV or JSON. The options are as follows:
|
NONE |
|
Compressed File Suffix |
This parameter is displayed when Compression Format is not NONE. It specifies the extension of the files to be decompressed. In a batch of files, only files with this extension are decompressed; other files are transferred in their original format. If you enter * or leave the parameter blank, all files are decompressed. |
* |
|
Source File Processing Method |
Operation performed on source files after the job completes.
|
No action |
|
Start Job by Marker File |
Whether to start a job by a marker file. A job is only started if there is a marker file for starting the job in the source path. If there is no marker file, the job will be suspended for a period of time specified by Suspension Period. |
No |
|
Marker File |
Name of the marker file for starting a job. If you specify a marker file, the migration job is executed only when the marker file exists in the source path. The marker file will not be migrated. |
ok.txt |
|
Suspension Period |
Waiting period for a marker file, in seconds. If you set Start Job by Marker File to Yes but there is no marker file in the source path, the job fails when the suspension period times out. If you set this parameter to 0 and there is no marker file in the source path, the job fails immediately. |
10 |
|
File Separator |
File separator. If you enter multiple file paths in Source Directory/File, CDM uses the file separator to identify files. The default value is |. |
| |
|
Filter Type |
Only paths or files that meet the filtering conditions are transferred. The options are None, Wildcard, and Regex. For details, see Incremental File Migration. |
Wildcard |
|
Directory Filter |
If you set Filter Type to Wildcard, enter a wildcard character to filter paths. The paths that meet the filtering condition are migrated. You can configure multiple paths separated by commas (,).
NOTE:
If you have configured a macro variable of date and time and schedule a CDM job through DataArts Studio DataArts Factory, the system replaces the macro variable of date and time with (Planned start time of the data development job – Offset) rather than (Actual start time of the CDM job – Offset). |
*input |
|
File Filter |
If you set Filter Type to Wildcard, you can enter a wildcard character to search for files in a specified path. The files that meet the search criteria are migrated. You can configure multiple files separated by commas (,).
NOTE:
If you have configured a macro variable of date and time and schedule a CDM job through DataArts Studio DataArts Factory, the system replaces the macro variable of date and time with (Planned start time of the data development job – Offset) rather than (Actual start time of the CDM job – Offset). |
*.csv,*.txt |
|
Time Filter |
If you select Yes, files are transferred based on their modification time. |
Yes |
|
Minimum Timestamp |
If you set Time Filter to Yes and specify a point in time for this parameter, only the files modified after the specified time are transferred. The time format must be yyyy-MM-dd HH:mm:ss. This parameter can be set to a macro variable of date and time. For example, ${timestamp(dateformat(yyyy-MM-dd HH:mm:ss,-90,DAY))} indicates that only files modified within the latest 90 days are migrated.
NOTE:
If you have configured a macro variable of date and time and schedule a CDM job through DataArts Studio DataArts Factory, the system replaces the macro variable of date and time with (Planned start time of the data development job – Offset) rather than (Actual start time of the CDM job – Offset). |
2019-06-01 00:00:00 |
|
Maximum Timestamp |
If you set Time Filter to Yes and specify a point in time for this parameter, only the files modified before the specified time are transferred. The time format must be yyyy-MM-dd HH:mm:ss. This parameter can be set to a macro variable of date and time. For example, ${timestamp(dateformat(yyyy-MM-dd HH:mm:ss))} indicates that only the files whose modification time is earlier than the current time are migrated.
NOTE:
If you have configured a macro variable of date and time and schedule a CDM job through DataArts Studio DataArts Factory, the system replaces the macro variable of date and time with (Planned start time of the data development job – Offset) rather than (Actual start time of the CDM job – Offset). |
2019-07-01 00:00:00 |
|
Encryption |
If the source data is encrypted, CDM can decrypt the data before exporting it. Select whether to decrypt the source data and select a decryption algorithm. The options are as follows:
For details, see Encryption and Decryption During File Migration. |
AES-256-GCM |
|
Disregard Non-existent Path or File |
If this parameter is set to Yes, the job can be successfully executed even if the source path does not exist. |
No |
|
DEK |
This parameter is displayed only when Encryption is set to AES-256-GCM. The key consists of 64 hexadecimal digits and must be the same as the DEK configured during encryption. If the decryption and encryption keys do not match, the system does not report an exception, but the decrypted data is incorrect. |
DD0AE00DFECD78BF051BCFDA25BD4E320DB0A7AC75A1F3FC3D3C56A457DCDC1B |
|
IV |
This parameter is displayed only when Encryption is set to AES-256-GCM. The initialization vector consists of 32 hexadecimal digits and must be the same as the IV configured during encryption. If the initialization vectors do not match, the system does not report an exception, but the decrypted data is incorrect. |
5C91687BA886EDCD12ACBC3FF19A3C3F |
|
MD5 File Extension |
This parameter is displayed only when File Format is set to Binary. This parameter is used to check whether the files extracted by CDM are consistent with source files. For details, see MD5 Verification. |
.md5 |
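
Some example values in Table 1 can be tried out locally before they are entered in a job. The following is a minimal sketch, not part of CDM: it applies the Regular Expression example value to a made-up log line, checks that a DEK or IV has the required number of hexadecimal digits, and parses a timestamp in the yyyy-MM-dd HH:mm:ss format. The function names and the sample log line are hypothetical, and Python's re module is assumed to behave like the regular expression engine used by CDM for this pattern, which may not hold for every expression.

```python
import re
from datetime import datetime

# Regular expression copied from the Example Value column of Table 1.
FIELD_PATTERN = re.compile(r"^(\d.*\d) (\w*) \[(.*)\] ([\w\.]*) (\w.*).*")

# Hypothetical log line, invented for this sketch only.
SAMPLE_LINE = "2018-01-11 08:50:59,001 INFO [org.example.Main] core.Loader Adding jars"


def split_fields(line):
    """Return the captured groups as a list of fields, or None if the line does not match."""
    match = FIELD_PATTERN.match(line)
    return list(match.groups()) if match else None


def is_hex_of_length(value, length):
    """Return True if value consists of exactly `length` hexadecimal digits."""
    return re.fullmatch(r"[0-9A-Fa-f]{%d}" % length, value) is not None


def parse_filter_time(text):
    """Parse a yyyy-MM-dd HH:mm:ss value; raises ValueError if the string is malformed."""
    return datetime.strptime(text, "%Y-%m-%d %H:%M:%S")


if __name__ == "__main__":
    print(split_fields(SAMPLE_LINE))
    # IV example value from Table 1: 32 hexadecimal digits expected.
    print(is_hex_of_length("5C91687BA886EDCD12ACBC3FF19A3C3F", 32))
    # Minimum Timestamp must precede Maximum Timestamp for the window to be non-empty.
    print(parse_filter_time("2019-06-01 00:00:00") < parse_filter_time("2019-07-01 00:00:00"))
```

Each pair of parentheses in the expression yields one field, so the sample line is split into five fields. Because CDM does not report an exception when the DEK or IV differs from the one used for encryption, a length check like this only catches malformed values, not wrong ones.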
- CDM supports incremental file migration (by skipping repeated files), but does not support resumable transfer.
For example, suppose three files are to be migrated and the second file fails to be migrated due to a network fault. When the migration task is started again, the first file is skipped. The second file, however, cannot be resumed from the point where the fault occurred and must be migrated again from the beginning.
- During file migration, a single task supports millions of files. If there are too many files in the directory to be migrated, you are advised to split the files into different directories and create multiple tasks.
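
The difference between incremental migration and resumable transfer described above can be sketched as follows. This is an illustration only, not CDM's implementation: the function name, the file names, and the idea of tracking completed files in a set are assumptions made for the example, since the page does not state how CDM detects repeated files.

```python
def plan_retry(files, completed):
    """Return the files to transfer when a task is started again.

    Files already migrated are skipped (incremental migration). A file that
    failed midway is not in `completed`, so it is transferred again in full
    rather than resumed from the point of failure.
    """
    return [name for name in files if name not in completed]


if __name__ == "__main__":
    # Hypothetical run: the first file succeeded, the second failed midway,
    # and the third is assumed not to have been migrated yet.
    files = ["file1.data", "file2.data", "file3.data"]
    completed = {"file1.data"}
    print(plan_retry(files, completed))
```

No byte offset is kept for the failed file, which is why the second file is transferred from the start on the next run.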