From HDFS

Sample JSON File

      "from-config-values": {
        "configs": [
          {
            "inputs": [
              {
                "name": "fromJobConfig.inputDirectory",
                "value": "/hdfsfrom/from_hdfs_est.csv"
              },
              {
                "name": "fromJobConfig.inputFormat",
                "value": "CSV_FILE"
              },
              {
                "name": "fromJobConfig.columnList",
                "value": "1"
              },
              {
                "name": "fromJobConfig.jsonType",
                "value": "JSON_OBJECT"
              },
              {
                "name": "fromJobConfig.fieldSeparator",
                "value": ","
              },
              {
                "name": "fromJobConfig.quoteChar",
                "value": "false"
              },
              {
                "name": "fromJobConfig.regexSeparator",
                "value": "false"
              },
              {
                "name": "fromJobConfig.firstRowAsHeader",
                "value": "false"
              },
              {
                "name": "fromJobConfig.encodeType",
                "value": "UTF-8"
              },
              {
                "name": "fromJobConfig.fromCompression",
                "value": "NONE"
              },
              {
                "name": "fromJobConfig.compressedFileSuffix",
                "value": "*"
              },
              {
                "name": "fromJobConfig.splitType",
                "value": "FILE"
              },
              {
                "name": "fromJobConfig.fromFileOpType",
                "value": "DO_NOTHING"
              },
              {
                "name": "fromJobConfig.useMarkerFile",
                "value": "false"
              },
              {
                "name": "fromJobConfig.fileSeparator",
                "value": "|"
              },
              {
                "name": "fromJobConfig.filterType",
                "value": "NONE"
              }
            ],
            "name": "fromJobConfig"
          }
        ]
      }
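
The sample follows a regular shape: every setting is a name/value pair inside a single config named fromJobConfig. The following is a minimal Python sketch that builds such a fragment from a plain dictionary. It assumes that all values are sent as strings, as they are in the sample. The helper name build_from_config_values is hypothetical and is not part of CDM.

```python
import json


def build_from_config_values(settings):
    """Build the "from-config-values" fragment from a flat dict.

    Hypothetical helper, not part of CDM. Keys are given without the
    "fromJobConfig." prefix. Values are converted to strings because
    every value in the sample is a JSON string.
    """
    inputs = []
    for key, value in settings.items():
        if isinstance(value, bool):
            # Booleans appear as the strings "true" and "false" in the sample.
            value = "true" if value else "false"
        inputs.append({"name": f"fromJobConfig.{key}", "value": str(value)})
    return {"configs": [{"inputs": inputs, "name": "fromJobConfig"}]}


fragment = build_from_config_values({
    "inputDirectory": "/hdfsfrom/from_hdfs_est.csv",
    "inputFormat": "CSV_FILE",
    "fieldSeparator": ",",
    "firstRowAsHeader": False,
})
print(json.dumps({"from-config-values": fragment}, indent=2))
```

Only the parameters that a job needs have to be passed in; the sketch does not add defaults for the others.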

Parameter Description

HDFS job parameter description

Parameter	Mandatory	Type	Description
fromJobConfig.inputDirectory	Yes	String	Path for storing data to be extracted. For example, /data_dir.
fromJobConfig.inputFormat	Yes	Enumeration	File format used for data transmission. The following formats are supported: CSV_FILE (CSV format), PARQUET_FILE (Parquet format), and BINARY_FILE (binary format). If you select BINARY_FILE, the migration destination must also be a file system.
fromJobConfig.columnList	No	String	Numbers of columns to be extracted. Use & to separate column numbers in ascending order. For example, 1&3&5.
fromJobConfig.lineSeparator	No	String	Line feed character. This parameter is valid only when the file format is CSV_FILE. The default value is \r\n.
fromJobConfig.fieldSeparator	No	String	Field delimiter. This parameter is valid only when the file format is CSV_FILE. The default value is ,.
fromJobConfig.encodeType	No	String	Encoding type. For example, UTF-8 or GBK.
fromJobConfig.firstRowAsHeader	No	Boolean	Whether to treat the first line as the header row. This parameter is valid only when the file format is CSV_FILE. When you migrate a CSV file to a table, CDM writes all data to the table by default. If this parameter is set to true, CDM uses the first line of the CSV file as the header row and does not write that line to the destination table.
fromJobConfig.fromCompression	No	Enumeration	Compression format. Only source files in the specified compression format are transferred. NONE indicates that files in all formats are transferred.
fromJobConfig.splitType	No	Enumeration	Whether to split files by file quantity or by size. If HDFS files are split, each shard is regarded as a file. FILE: Split by file quantity. For example, if there are 10 files and throttlingConfig.numExtractors is set to 5, each shard consists of two files. SIZE: Split by file size. Individual files are not split to balance the shards. For example, suppose there are 10 files, nine of 10 MB and one of 200 MB. If throttlingConfig.numExtractors is set to 2, two shards are created: one processes the nine 10 MB files and the other processes the 200 MB file.
fromJobConfig.fromFileOpType	No	Enumeration	Source file processing mode. After a job is completed, operations on the source file can be performed. The source file can be renamed or deleted.
fromJobConfig.markerFile	No	String	Name of the marker file that triggers the job, for example, ok.txt. If a marker file is specified, the task is executed only when that file exists in the source path. If no marker file is specified, this function is disabled.
fromJobConfig.filterType	No	Enumeration	Filter type. Possible values are as follows: WILDCARD: Enter a wildcard pattern to filter paths or files. CDM migrates the paths or files that meet the filter condition. TIME: Specify a time filter. CDM migrates the files modified after the specified time point.
fromJobConfig.pathFilter	No	String	Path filter, which is configured when the filter type is WILDCARD. It is used to filter the file directories. For example, *input.
fromJobConfig.fileFilter	No	String	File filter, which is configured when the filter type is WILDCARD. It is used to filter files in the specified directory. Use commas (,) to separate multiple files. For example, .csv,.txt.
fromJobConfig.startTime	No	String	Start of the time filter. When fromJobConfig.filterType is set to TIME and a point in time is specified for this parameter, only the files modified after the specified time are transferred. The time format must be yyyy-MM-dd HH:mm:ss. This parameter can be set to a date and time macro variable. For example, ${timestamp(dateformat(yyyy-MM-dd HH:mm:ss,-90,DAY))} indicates that only files generated within the latest 90 days are migrated.
fromJobConfig.endTime	No	String	End of the time filter. When fromJobConfig.filterType is set to TIME and a point in time is specified for this parameter, only the files modified before the specified time are transferred. The time format must be yyyy-MM-dd HH:mm:ss. This parameter can be set to a date and time macro variable. For example, ${timestamp(dateformat(yyyy-MM-dd HH:mm:ss))} indicates that only the files whose modification time is earlier than the current time are migrated.
fromJobConfig.createSnapshot	No	Boolean	If this parameter is set to true, CDM creates a snapshot for the source directory to be migrated (the snapshot cannot be created for a single file) before it reads files from HDFS. Then CDM migrates the data in the snapshot. Only the HDFS administrator can create a snapshot. After the CDM job is completed, the snapshot is deleted.
fromJobConfig.formats	No	Data structure	Time format. This parameter is mandatory only when fromJobConfig.inputFormat is set to CSV_FILE and the time field exists in the file. For details, see Description of the fromJobConfig.formats parameter.
fromJobConfig.decryption	No	Enumeration	Whether to decrypt encrypted files before export, and the decryption method to use. This parameter is available only when fromJobConfig.inputFormat is set to BINARY_FILE. The options are as follows: NONE: Export the file directly without decryption. AES-256-GCM: Decrypt the file with the AES-256-GCM (NoPadding) algorithm and then export it.
fromJobConfig.dek	No	String	Data decryption key. The key is a string of 64 hexadecimal characters and must be the same as the data encryption key toJobConfig.dek configured during encryption. If the encryption and decryption keys do not match, the system does not report an exception, but the decrypted data is incorrect.
fromJobConfig.iv	No	String	Initialization vector required for decryption. The initialization vector is a string of 32 hexadecimal characters and must be the same as the initialization vector toJobConfig.iv configured during encryption. If the initialization vectors do not match, the system does not report an exception, but the decrypted data is incorrect.
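
Several constraints in the table can be checked on the client before a job is submitted. The following is a minimal Python sketch of such a check, not CDM's own validation. The function name validate_from_job_config is hypothetical. The sketch also assumes that the dek and iv lengths mean 64 and 32 hexadecimal characters respectively, which is an interpretation of the table rather than a confirmed rule.

```python
import re
from datetime import datetime


def validate_from_job_config(params):
    """Return a list of problems found in a dict of fromJobConfig.* values.

    Hypothetical client-side check; CDM itself may validate differently.
    """
    problems = []

    def get(name):
        return params.get(f"fromJobConfig.{name}")

    if not get("inputDirectory"):
        problems.append("inputDirectory is mandatory")
    if get("inputFormat") not in ("CSV_FILE", "PARQUET_FILE", "BINARY_FILE"):
        problems.append("inputFormat must be CSV_FILE, PARQUET_FILE, or BINARY_FILE")

    column_list = get("columnList")
    if column_list is not None:
        # Column numbers are separated by "&" in ascending order, e.g. 1&3&5.
        if not re.fullmatch(r"\d+(&\d+)*", column_list):
            problems.append("columnList must look like 1&3&5")
        else:
            numbers = [int(n) for n in column_list.split("&")]
            if numbers != sorted(numbers):
                problems.append("columnList must be in ascending order")

    for name in ("startTime", "endTime"):
        value = get(name)
        # Macro variables such as ${timestamp(...)} are not checked here.
        if value is not None and not value.startswith("${"):
            try:
                datetime.strptime(value, "%Y-%m-%d %H:%M:%S")
            except ValueError:
                problems.append(f"{name} must use yyyy-MM-dd HH:mm:ss")

    # Assumption: dek is 64 hexadecimal characters, iv is 32.
    for name, length in (("dek", 64), ("iv", 32)):
        value = get(name)
        if value is not None and not re.fullmatch(rf"[0-9a-fA-F]{{{length}}}", value):
            problems.append(f"{name} must be {length} hexadecimal characters")

    return problems


problems = validate_from_job_config({
    "fromJobConfig.inputDirectory": "/data_dir",
    "fromJobConfig.inputFormat": "CSV_FILE",
    "fromJobConfig.columnList": "5&3",
    "fromJobConfig.startTime": "2024-01-01",
})
print(problems)
```

Because a mismatched dek or iv does not raise an exception on the CDM side, a length check like this catches only malformed values, not wrong ones.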