Updated on 2024-04-29 GMT+08:00

From FTP/SFTP

Sample JSON File

"from-config-values": {
        "configs": [
          {
            "inputs": [
              {
                "name": "fromJobConfig.inputDirectory",
                "value": "/sftpfrom/from_sftp.csv"
              },
              {
                "name": "fromJobConfig.inputFormat",
                "value": "CSV_FILE"
              },
              {
                "name": "fromJobConfig.columnList",
                "value": "1&2&3&4&5&6&7&8&9&10&11&12"
              },
              {
                "name": "fromJobConfig.fieldSeparator",
                "value": ","
              },
              {
                "name": "fromJobConfig.regexSeparator",
                "value": "false"
              },
              {
                "name": "fromJobConfig.firstRowAsHeader",
                "value": "false"
              },
              {
                "name": "fromJobConfig.encodeType",
                "value": "UTF-8"
              },
              {
                "name": "fromJobConfig.fromCompression",
                "value": "NONE"
              },
              {
                "name": "fromJobConfig.splitType",
                "value": "FILE"
              }
            ],
            "name": "fromJobConfig"
          }
        ]
      }
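
The name/value layout above can be generated from a flat mapping. The following Python sketch is illustrative only: the helper name build_from_config is hypothetical and is not part of any CDM SDK. It assumes that every value is passed as a string, as in the sample.

```python
import json


def build_from_config(params: dict) -> dict:
    """Wrap flat parameters into the name/value input list used by the sample."""
    inputs = [
        # Values are kept as strings ("false", not False) to match the sample JSON.
        {"name": f"fromJobConfig.{key}", "value": str(value)}
        for key, value in params.items()
    ]
    return {"configs": [{"inputs": inputs, "name": "fromJobConfig"}]}


from_config_values = build_from_config({
    "inputDirectory": "/sftpfrom/from_sftp.csv",
    "inputFormat": "CSV_FILE",
    "fieldSeparator": ",",
    "firstRowAsHeader": "false",
})
print(json.dumps({"from-config-values": from_config_values}, indent=2))
```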

Parameter Description

Source link job parameters of FTP and SFTP are the same. Table 1 describes the parameters.
Table 1 Source link job parameters of file systems

Parameter

Mandatory

Type

Description

fromJobConfig.inputDirectory

Yes

String

Path of the files to be extracted. You can enter a maximum of 50 file paths, separated by vertical bars (|) by default. To use a different separator, set fromJobConfig.fileSeparator. For example, FROM/example.csv|FROM/b.txt.
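
A minimal sketch of how a client might assemble this value before submitting a job. The function join_input_paths is a hypothetical helper; only the 50-path limit and the | separator come from the description above.

```python
MAX_PATHS = 50  # limit stated in the parameter description


def join_input_paths(paths, separator="|"):
    """Join file paths into a single inputDirectory value."""
    if not paths:
        raise ValueError("at least one path is required")
    if len(paths) > MAX_PATHS:
        raise ValueError(f"at most {MAX_PATHS} paths are allowed")
    for path in paths:
        # A path that contains the separator would be split incorrectly.
        if separator in path:
            raise ValueError(f"path {path!r} contains the separator {separator!r}")
    return separator.join(paths)


print(join_input_paths(["FROM/example.csv", "FROM/b.txt"]))
```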

fromJobConfig.inputFormat

Yes

Enumeration

File format required for data transmission. Currently, the following file formats are supported:
  • CSV_FILE: CSV format, used to migrate files to data tables
  • JSON_FILE: JSON format, used to migrate files to data tables
  • BINARY_FILE: Files (binary or not) are transferred as-is without being parsed. This format is suited to file copy.

If you select BINARY_FILE, the migration destination must also be a file system.

fromJobConfig.lineSeparator

No

String

Line feed character in a file. By default, the system automatically identifies \n, \r, and \r\n. You can also configure special characters. Spaces and carriage returns must be URL-encoded. If you set the value by editing the job JSON directly, URL encoding is not required.
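
The URL encoding mentioned above is standard percent-encoding. The sketch below uses Python's standard library to show what the encoded forms look like; encode_separator is a hypothetical helper name, and how CDM decodes the value is not covered here.

```python
from urllib.parse import quote, unquote


def encode_separator(separator: str) -> str:
    """Percent-encode every character that is not an unreserved URL character."""
    return quote(separator, safe="")


print(encode_separator(" "))     # space
print(encode_separator("\r\n"))  # carriage return + line feed
print(unquote("%0D%0A") == "\r\n")
```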

fromJobConfig.columnList

No

String

Numbers of columns to be extracted. Use & to separate column numbers in ascending order. For example, 1&3&5.
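
A small sketch for producing a columnList value from column numbers. The helper build_column_list is hypothetical. It assumes that column numbers start at 1, as in the sample JSON, and it sorts and deduplicates the input so that the result is in ascending order.

```python
def build_column_list(columns):
    """Return column numbers joined by '&' in ascending order."""
    ordered = sorted(set(columns))
    if any(number < 1 for number in ordered):
        raise ValueError("column numbers are assumed to start at 1")
    return "&".join(str(number) for number in ordered)


print(build_column_list([5, 1, 3]))
```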

fromJobConfig.fieldSeparator

No

String

Field delimiter. This parameter is valid only when the file format is CSV_FILE. The default value is a comma (,).

fromJobConfig.quoteChar

No

Boolean

Whether to use a quote character (encircling symbol). If this parameter is set to true, field delimiters inside a quoted value are treated as part of the string value. Currently, the default quote character in CDM is the double quotation mark (").
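
The effect of this switch can be illustrated with Python's csv module. This is an analogy only, not CDM's parser: the same line is read once with the double quotation mark honored and once with quoting disabled.

```python
import csv
import io

line = '1,"Smith, John",42\n'

# Quoting honored: the comma inside the quotes stays in the field.
with_quotes = next(csv.reader(io.StringIO(line), delimiter=",", quotechar='"'))

# Quoting disabled: every comma splits, and the quotes remain as plain characters.
without_quotes = next(csv.reader(io.StringIO(line), delimiter=",", quoting=csv.QUOTE_NONE))

print(with_quotes)
print(without_quotes)
```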

fromJobConfig.regexSeparator

No

Boolean

Whether to use a regular expression to separate fields. This parameter is valid only when the file format is CSV_FILE.

fromJobConfig.regex

No

String

Regular expression. This parameter is valid only when the regular expression is used to separate fields.

fromJobConfig.firstRowAsHeader

No

Boolean

Whether to treat the first line as the header row. This parameter is valid only when the file format is CSV_FILE. When you migrate a CSV file to a table, CDM writes all data to the table by default. If this parameter is set to true, CDM uses the first line of the CSV file as the header row and does not write it to the destination table.

fromJobConfig.fromCompression

No

Enumeration

Compression format. This parameter is valid only when the file format is CSV_FILE or JSON_FILE. The options are as follows:
  • NONE: Files in all formats are transferred.
  • GZIP: Files in gzip format are transferred.
  • ZIP: Files in Zip format are transferred.

fromJobConfig.splitType

No

Enumeration

Whether to split files by file or size.
  • FILE: Split files by file quantity. If there are 10 files and throttlingConfig.numExtractors is set to 5, each shard consists of two files.
  • SIZE: Split files by file size. Files will not be split for balance. Suppose there are 10 files, among which nine are 10 MB and one is 200 MB in size. If throttlingConfig.numExtractors is set to 2, two shards will be created, one for processing the nine 10 MB files, the other for processing the 200 MB file.
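
The two modes can be sketched as follows. CDM's actual sharding algorithm is not documented here, so both functions are assumptions chosen to reproduce the examples above: round-robin assignment for FILE, and a greedy "largest file to the least-loaded shard" rule for SIZE. Files are never split in either mode.

```python
def split_by_file(files, num_extractors):
    """FILE mode sketch: deal whole files round-robin across shards."""
    shards = [[] for _ in range(num_extractors)]
    for index, name in enumerate(files):
        shards[index % num_extractors].append(name)
    return shards


def split_by_size(sizes_mb, num_extractors):
    """SIZE mode sketch: assign whole files, largest first, to the least-loaded shard."""
    shards = [[] for _ in range(num_extractors)]
    loads = [0] * num_extractors
    for name, size in sorted(sizes_mb.items(), key=lambda item: item[1], reverse=True):
        target = loads.index(min(loads))
        shards[target].append(name)
        loads[target] += size
    return shards


files = [f"file_{i}.csv" for i in range(10)]
file_shards = split_by_file(files, 5)

sizes_mb = {f"small_{i}.csv": 10 for i in range(9)}
sizes_mb["large.csv"] = 200
size_shards = split_by_size(sizes_mb, 2)

print([len(shard) for shard in file_shards])
print([len(shard) for shard in size_shards])
```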

fromJobConfig.jsonReferenceNode

No

String

Reference node. This parameter is valid only when the file format is JSON_FILE. CDM parses the data under the specified JSON node. If that data is a JSON array, the system extracts data from each element of the array in the same way. Nested JSON nodes are separated by periods (.). For example, data.list.
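
A minimal sketch of how a dotted reference node such as data.list selects records from a JSON document. The function resolve_reference_node and the sample payload are hypothetical; the sketch assumes that node names themselves contain no periods.

```python
import json


def resolve_reference_node(document, reference_node):
    """Walk a dotted path and return the records found under that node."""
    node = document
    for key in reference_node.split("."):
        node = node[key]
    # An array yields one record per element; any other value is a single record.
    return node if isinstance(node, list) else [node]


payload = json.loads('{"data": {"list": [{"id": 1}, {"id": 2}]}}')
records = resolve_reference_node(payload, "data.list")
print(records)
```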

fromJobConfig.encodeType

No

String

Encoding type. For example, UTF_8 or GBK.

fromJobConfig.useMarkerFile

No

Boolean

Whether to start a job by a marker file. A job is started only when a marker file for starting the job exists in the source path. Otherwise, the job will be suspended for a period of time specified by fromJobConfig.waitTime.

fromJobConfig.markerFile

No

String

Name of the marker file for starting a job, for example, ok.txt. After a marker file is specified, the job is executed only when the file exists in the source path. If no marker file is specified, this function is disabled.

fromJobConfig.waitTime

No

String

Period to wait for a marker file. If fromJobConfig.useMarkerFile is set to true but no marker file exists in the source path, the job fails when the wait times out.

If you set this parameter to 0 and no marker file exists in the source path, the job will fail immediately.

Unit: second
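
The interaction between the marker file and the wait time can be sketched as a polling loop. This is an illustration under assumptions, not CDM's implementation: wait_for_marker, the polling interval, and the marker_exists callback are all hypothetical.

```python
import time


def wait_for_marker(marker_exists, wait_time_seconds, poll_interval=1.0,
                    sleep=time.sleep, clock=time.monotonic):
    """Return True if the marker appears before the wait time elapses, else False."""
    deadline = clock() + wait_time_seconds
    while True:
        if marker_exists():
            return True   # marker found: the job may start
        if clock() >= deadline:
            return False  # timed out: the job fails
        # Sleep for the poll interval, but never past the deadline.
        sleep(min(poll_interval, max(deadline - clock(), 0)))


# With a wait time of 0, a missing marker fails immediately.
print(wait_for_marker(lambda: False, 0))
```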

fromJobConfig.filterType

No

Enumeration

Filter type. Possible values are as follows:
  • WILDCARD: Enter a wildcard character to filter paths or files. CDM will migrate the paths or files that meet the filter condition.
  • TIME: Specify a time filter. CDM will migrate the files modified after the specified time point.

fromJobConfig.pathFilter

No

String

Path filter, which is configured when the filter type is WILDCARD. It is used to filter the file directories. For example, *input.

fromJobConfig.fileFilter

No

String

File filter, which is configured when the filter type is WILDCARD. It is used to filter files in the specified directory. Use commas (,) to separate multiple file patterns. For example, *.csv,*.txt.
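
A comma-separated wildcard filter can be approximated with Python's fnmatch module. The matching rules of CDM may differ from fnmatch, so treat this as a sketch; matches_file_filter is a hypothetical helper.

```python
from fnmatch import fnmatchcase


def matches_file_filter(file_name, file_filter):
    """Return True if file_name matches any comma-separated wildcard pattern."""
    patterns = [part.strip() for part in file_filter.split(",") if part.strip()]
    return any(fnmatchcase(file_name, pattern) for pattern in patterns)


for name in ["report.csv", "notes.txt", "data.json"]:
    print(name, matches_file_filter(name, "*.csv,*.txt"))
```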

fromJobConfig.startTime

No

String

Start of the time filter. If fromJobConfig.filterType is set to TIME and you specify a point in time for this parameter, only the files modified at or after the specified time are transferred. The time format must be yyyy-MM-dd HH:mm:ss.

This parameter can be set to a macro variable of date and time. For example, ${timestamp(dateformat(yyyy-MM-dd HH:mm:ss,-90,DAY))} indicates that only files modified within the last 90 days are migrated.

fromJobConfig.endTime

No

String

End of the time filter. If fromJobConfig.filterType is set to TIME and you specify a point in time for this parameter, only the files modified before the specified time are transferred. The time format must be yyyy-MM-dd HH:mm:ss.

This parameter can be set to a macro variable of date and time. For example, ${timestamp(dateformat(yyyy-MM-dd HH:mm:ss))} indicates that only the files whose modification time is earlier than the current time are migrated.
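
Taken together, fromJobConfig.startTime and fromJobConfig.endTime describe a window that includes the start time and excludes the end time. The sketch below illustrates that check; in_time_window is a hypothetical helper, and the macro variables are not evaluated here.

```python
from datetime import datetime

TIME_FORMAT = "%Y-%m-%d %H:%M:%S"  # Python equivalent of yyyy-MM-dd HH:mm:ss


def in_time_window(modified, start_time=None, end_time=None):
    """Return True if modified falls in [start_time, end_time)."""
    if start_time is not None and modified < datetime.strptime(start_time, TIME_FORMAT):
        return False  # modified before the start time
    if end_time is not None and modified >= datetime.strptime(end_time, TIME_FORMAT):
        return False  # modified at or after the end time
    return True


modified = datetime(2024, 4, 1, 12, 0, 0)
print(in_time_window(modified, "2024-04-01 12:00:00", "2024-04-29 00:00:00"))
```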

fromJobConfig.fileSeparator

No

String

File separator. If you enter multiple file paths in fromJobConfig.inputDirectory, CDM uses this separator to split them. The default value is a vertical bar (|).

fromJobConfig.md5FileSuffix

No

String

Suffix of the MD5 file. It is used to check whether the files extracted by CDM are consistent with those in the migration source.