From HDFS
When the source link of a job is the Link to HDFS (that is, when data is exported from MRS HDFS, FusionInsight HDFS, or Apache HDFS), configure the source job parameters according to Table 1.
| Category | Parameter | Description | Example Value |
|---|---|---|---|
| Basic parameters | Source Directory/File | Directory or file path from which data is extracted. The path can contain one or more date and time macro variables. When these macro variables are used with a scheduled job, incremental data can be synchronized periodically. For details, see Incremental Synchronization Using the Macro Variables of Date and Time. | /user/cdm/ |
| | File Format | File format used when transferring data. The options are as follows: | CSV |
| | Pull List File | If this parameter is set to Yes, the system pulls the files corresponding to the URLs in the text file to be uploaded and stores them on OBS. The text file records the file paths on HDFS. | Yes |
| | OBS Link of List File | Select an existing OBS link. | obs_link |
| | OBS Bucket of List File | Name of the OBS bucket that stores the text file. | obs-cdm-hwstaff |
| | OBS Directory | Custom OBS directory that stores the text file. Use slashes (/) to separate directory levels. | test1 |
| Advanced attributes | Line Separator | Line feed character in a file. By default, the system automatically identifies \n, \r, and \r\n. This parameter is displayed only when File Format is set to CSV. | \n |
| | Field Delimiter | Character used to separate fields in the file. To use the tab character as the delimiter, set this parameter to \t. This parameter is displayed only when File Format is set to CSV. | , |
| | Use First Row as Header | This parameter is displayed only when File Format is set to CSV. When you migrate a CSV file to a table, CDM writes all data to the table by default. If you set this parameter to Yes, CDM treats the first line of the CSV file as the header row and does not write it to the destination table. | No |
| | File Split Method | Whether to split files by file or size. If HDFS files are split, each shard is regarded as an individual file. | File |
| | Source File Processing Method | Operation performed on source files after the job completes. | Rename |
| | Start Job by Marker File | Whether to start a job by a marker file. A job starts only if the source path contains a marker file for starting the job. If there is no marker file, the job is suspended for the period specified by Suspension Period. | ok.txt |
| | Wildcard | If you select Yes, enter wildcard characters to filter files. All paths or files that meet the search criteria are transferred. For details, see File/Path Filter. | Yes |
| | Path Filter | If you set Filter Type to Wildcard, enter a wildcard character to filter paths. The paths that meet the filtering condition are migrated. You can configure multiple paths separated by commas (,). | *input |
| | File Filter | If you set Filter Type to Wildcard, enter a wildcard character to search for files in a specified path. The files that meet the search criteria are migrated. You can configure multiple files separated by commas (,). | *.csv |
| | Time Filter | If you select Yes, files are transferred based on their modification time. | Yes |
| | Minimum Timestamp | If you set Filter Type to Time Filter and specify a point in time for this parameter, only files modified after the specified time are transferred. The time format must be yyyy-MM-dd HH:mm:ss. This parameter can be set to a date and time macro variable. For example, ${timestamp(dateformat(yyyy-MM-dd HH:mm:ss,-90,DAY))} indicates that only files generated within the latest 90 days are migrated. | 2019-07-01 00:00:00 |
| | Maximum Timestamp | If you set Filter Type to Time Filter and specify a point in time for this parameter, only files modified before the specified time are transferred. The time format must be yyyy-MM-dd HH:mm:ss. This parameter can be set to a date and time macro variable. For example, ${timestamp(dateformat(yyyy-MM-dd HH:mm:ss))} indicates that only files whose modification time is earlier than the current time are migrated. | 2019-07-30 00:00:00 |
| | Create Snapshot | If you set this parameter to Yes, CDM creates a snapshot of the source directory to be migrated before it reads files from HDFS, and then migrates the data in the snapshot. A snapshot cannot be created for a single file. Only the HDFS administrator can create a snapshot. After the CDM job is completed, the snapshot is deleted. | No |
| | Encryption | This parameter is displayed only when File Format is set to Binary. If the source data is encrypted, CDM can decrypt the data before exporting it. Select whether to decrypt the source data and select a decryption algorithm. The options are as follows: For details, see Encryption and Decryption During File Migration. | AES-256-GCM |
| | DEK | This parameter is displayed only when Encryption is set to AES-256-GCM. The key consists of 64 hexadecimal characters and must be the same as the DEK configured during encryption. If the decryption and encryption keys differ, the system does not report an exception, but the decrypted data is incorrect. | DD0AE00DFECD78BF051BCFDA25BD4E320DB0A7AC75A1F3FC3D3C56A457DCDC1B |
| | IV | This parameter is displayed only when Encryption is set to AES-256-GCM. The initialization vector consists of 32 hexadecimal characters and must be the same as the IV configured during encryption. If the initialization vectors differ, the system does not report an exception, but the decrypted data is incorrect. | 5C91687BA886EDCD12ACBC3FF19A3C3F |
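
Because a mismatched DEK or IV does not raise an exception, it can help to check at least the format of both values before configuring the job. The following is a minimal Python sketch of such a check, based only on the length rules stated in the table (64 hexadecimal characters for the DEK, 32 for the IV). The function names `is_valid_hex` and `check_key_material` are hypothetical helpers for illustration and are not part of CDM. The check cannot detect a well-formed value that differs from the one used during encryption.

```python
import re
from typing import List

# One or more hexadecimal characters, upper or lower case.
_HEX = re.compile(r"[0-9A-Fa-f]+")


def is_valid_hex(value: str, length: int) -> bool:
    """Return True if value consists of exactly `length` hexadecimal characters."""
    return len(value) == length and _HEX.fullmatch(value) is not None


def check_key_material(dek: str, iv: str) -> List[str]:
    """Return a list of format problems; an empty list means both values look well formed.

    Hypothetical helper: it validates format only, not whether the values
    match those configured during encryption.
    """
    problems = []
    if not is_valid_hex(dek, 64):
        problems.append("DEK must be 64 hexadecimal characters")
    if not is_valid_hex(iv, 32):
        problems.append("IV must be 32 hexadecimal characters")
    return problems


if __name__ == "__main__":
    # Example values taken from the table above.
    dek = "DD0AE00DFECD78BF051BCFDA25BD4E320DB0A7AC75A1F3FC3D3C56A457DCDC1B"
    iv = "5C91687BA886EDCD12ACBC3FF19A3C3F"
    print(check_key_material(dek, iv))
```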
HDFS supports the UTF-8 encoding only. Retain the default value UTF-8.
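
To preview which files a combination of File Filter, Minimum Timestamp, and Maximum Timestamp might select, the logic described in Table 1 can be approximated locally. The sketch below is an illustration under stated assumptions, not CDM's implementation: it uses Python's `fnmatch` wildcard rules, which may differ from CDM's, it treats both time bounds as exclusive, and the function names `matches_any` and `select_files` are hypothetical.

```python
from datetime import datetime
from fnmatch import fnmatchcase

# Python equivalent of the yyyy-MM-dd HH:mm:ss format required by the table.
TIME_FORMAT = "%Y-%m-%d %H:%M:%S"


def matches_any(name, patterns):
    """Return True if name matches at least one comma-separated wildcard pattern."""
    return any(fnmatchcase(name, p.strip()) for p in patterns.split(","))


def select_files(files, file_filter, min_ts, max_ts):
    """Return names of files that match the wildcard and fall inside the time window.

    files: iterable of (name, modification datetime) pairs.
    Assumption: files modified exactly at a bound are excluded.
    """
    lower = datetime.strptime(min_ts, TIME_FORMAT)
    upper = datetime.strptime(max_ts, TIME_FORMAT)
    return [
        name
        for name, mtime in files
        if matches_any(name, file_filter) and lower < mtime < upper
    ]


if __name__ == "__main__":
    candidates = [
        ("a.csv", datetime(2019, 7, 15)),
        ("b.txt", datetime(2019, 7, 15)),
        ("c.csv", datetime(2019, 8, 1)),
    ]
    # Uses the example values from the table: *.csv and the July 2019 window.
    print(select_files(candidates, "*.csv",
                       "2019-07-01 00:00:00", "2019-07-30 00:00:00"))
```

With the example values from the table, only `a.csv` passes both the wildcard and the time window in this sketch.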