From HDFS

Updated on 2024-01-16 GMT+08:00

Sample JSON File

"from-config-values": {
        "configs": [
          {
            "inputs": [
              {
                "name": "fromJobConfig.inputDirectory",
                "value": "/hdfsfrom/from_hdfs_est.csv"
              },
              {
                "name": "fromJobConfig.inputFormat",
                "value": "CSV_FILE"
              },
              {
                "name": "fromJobConfig.columnList",
                "value": "1"
              },
              {
                "name": "fromJobConfig.fieldSeparator",
                "value": ","
              },
              {
                "name": "fromJobConfig.quoteChar",
                "value": "false"
              },
              {
                "name": "fromJobConfig.regexSeparator",
                "value": "false"
              },
              {
                "name": "fromJobConfig.firstRowAsHeader",
                "value": "false"
              },
              {
                "name": "fromJobConfig.encodeType",
                "value": "UTF-8"
              },
              {
                "name": "fromJobConfig.fromCompression",
                "value": "NONE"
              },
              {
                "name": "fromJobConfig.compressedFileSuffix",
                "value": "*"
              },
              {
                "name": "fromJobConfig.splitType",
                "value": "FILE"
              },
              {
                "name": "fromJobConfig.useMarkerFile",
                "value": "false"
              },
              {
                "name": "fromJobConfig.fileSeparator",
                "value": "|"
              },
              {
                "name": "fromJobConfig.filterType",
                "value": "NONE"
              }
            ],
            "name": "fromJobConfig"
          }
        ]
      }
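
The fragment above can also be generated rather than typed by hand. The following is a minimal Python sketch that assumes only the layout shown in the sample. The helper name build_from_config_values and the convention of passing short parameter names without the fromJobConfig. prefix are illustrative assumptions, not part of the CDM API.

```python
import json


def build_from_config_values(params):
    """Build a "from-config-values" fragment shaped like the sample above.

    `params` maps short names (for example "inputDirectory") to values.
    Every value is emitted as a string, as in the sample, with booleans
    written as "true" or "false".
    """
    inputs = []
    for key, value in params.items():
        if isinstance(value, bool):
            value = "true" if value else "false"
        inputs.append({"name": f"fromJobConfig.{key}", "value": str(value)})
    return {
        "from-config-values": {
            "configs": [{"inputs": inputs, "name": "fromJobConfig"}]
        }
    }


fragment = build_from_config_values({
    "inputDirectory": "/hdfsfrom/from_hdfs_est.csv",
    "inputFormat": "CSV_FILE",
    # Column numbers are joined with & in ascending order.
    "columnList": "&".join(str(n) for n in sorted([5, 1, 3])),
    "firstRowAsHeader": False,
})
print(json.dumps(fragment, indent=2))
```

Parameters that are left out of the dictionary are simply omitted from the fragment, which matches how the optional parameters are described under Parameter Description.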

Parameter Description

  • HDFS job parameter description

    | Parameter | Mandatory | Type | Description |
    | --- | --- | --- | --- |
    | fromJobConfig.inputDirectory | Yes | String | Path where the data to be extracted is stored. For example, /data_dir. |
    | fromJobConfig.inputFormat | Yes | Enumeration | File format used for data transmission. The following formats are currently supported: CSV_FILE (CSV format), PARQUET_FILE (Parquet format), and BINARY_FILE (binary format). If you select BINARY_FILE, the migration destination must also be a file system. |
    | fromJobConfig.columnList | No | String | Numbers of the columns to be extracted. Use & to separate column numbers, in ascending order. For example, 1&3&5. |
    | fromJobConfig.lineSeparator | No | String | Line feed character in a file. By default, the system automatically identifies \n, \r, and \r\n. You can configure special characters. Spaces and carriage returns must be URL-encoded. You can also configure them by editing the job JSON, in which case URL encoding is not required. |
    | fromJobConfig.fieldSeparator | No | String | Field delimiter. This parameter is valid only when the file format is CSV_FILE. The default value is a comma (,). |
    | fromJobConfig.quoteChar | No | Boolean | Whether to use a quote (enclosing) character. If this parameter is set to true, field delimiters inside the quoted text are treated as part of the string value. Currently, the default quote character in CDM is the double quotation mark ("). |
    | fromJobConfig.regexSeparator | No | Boolean | Whether to use a regular expression to separate fields. This parameter is valid only when the file format is CSV_FILE. |
    | fromJobConfig.encodeType | No | String | Encoding type. For example, UTF_8 or GBK. |
    | fromJobConfig.firstRowAsHeader | No | Boolean | Whether to treat the first line as the header line. This parameter is valid only when the file format is CSV_FILE. When you migrate a CSV file to a table, CDM writes all data to the table by default. If this parameter is set to true, CDM uses the first line of the CSV file as the header line and does not write it to the destination table. |
    | fromJobConfig.fromCompression | No | Enumeration | Compression format. Only source files in the specified compression format are transferred. NONE indicates that files in all formats are transferred. |
    | fromJobConfig.compressedFileSuffix | No | String | Extension of the files to be decompressed. Only files in the batch whose names end with this extension are decompressed. Other files are transferred in their original format. If you enter * or leave the parameter blank, all files are decompressed. |
    | fromJobConfig.splitType | No | Enumeration | Whether to split files by file or by size. If HDFS files are split, each shard is regarded as a file. FILE: Split by file quantity. If there are 10 files and throttlingConfig.numExtractors is set to 5, each shard consists of two files. SIZE: Split by file size. Individual files are not split to balance the shards. Suppose there are 10 files, nine of 10 MB and one of 200 MB. If throttlingConfig.numExtractors is set to 2, two shards are created, one for the nine 10 MB files and the other for the 200 MB file. |
    | fromJobConfig.useMarkerFile | No | Boolean | Whether to start a job by a marker file. A job is started only when the marker file for starting the job exists in the source path. Otherwise, the job is suspended for the period of time specified by fromJobConfig.waitTime. |
    | fromJobConfig.markerFile | No | String | Name of the marker file for starting a job. For example, ok.txt. After a marker file is specified, the task is executed only when the file exists in the source path. If no marker file is specified, this function is disabled. |
    | fromJobConfig.fileSeparator | No | String | File separator. If you enter multiple file paths in fromJobConfig.inputDirectory, CDM uses the file separator to separate them. The default value is the vertical bar (|). |
    | fromJobConfig.filterType | No | Enumeration | Filter type. Possible values: WILDCARD: Enter a wildcard character to filter paths or files. CDM migrates the paths or files that meet the filter condition. TIME: Specify a time filter. CDM migrates the files modified after the specified time point. |
    | fromJobConfig.pathFilter | No | String | Path filter, which is configured when the filter type is WILDCARD. It is used to filter file directories. For example, *input. |
    | fromJobConfig.fileFilter | No | String | File filter, which is configured when the filter type is WILDCARD. It is used to filter files in the specified directory. Use commas (,) to separate multiple files. For example, *.csv,*.txt. |
    | fromJobConfig.startTime | No | String | If fromJobConfig.filterType is set to TIME and you specify a point in time for this parameter, only the files modified at or after the specified time are transferred. The time format must be yyyy-MM-dd HH:mm:ss. This parameter can be set to a date and time macro variable. For example, ${timestamp(dateformat(yyyy-MM-dd HH:mm:ss,-90,DAY))} indicates that only files generated within the latest 90 days are migrated. |
    | fromJobConfig.endTime | No | String | If fromJobConfig.filterType is set to TIME and you specify a point in time for this parameter, only the files modified before the specified time are transferred. The time format must be yyyy-MM-dd HH:mm:ss. This parameter can be set to a date and time macro variable. For example, ${timestamp(dateformat(yyyy-MM-dd HH:mm:ss))} indicates that only the files whose modification time is earlier than the current time are migrated. |
    | fromJobConfig.createSnapshot | No | Boolean | If this parameter is set to true, CDM creates a snapshot of the source directory to be migrated before it reads files from HDFS, and then migrates the data in the snapshot. A snapshot cannot be created for a single file. Only the HDFS administrator can create a snapshot. After the CDM job is completed, the snapshot is deleted. |
    | fromJobConfig.formats | No | Data structure | Time format. This parameter is mandatory only when fromJobConfig.inputFormat is set to CSV_FILE and a time field exists in the file. For details, see Description of the fromJobConfig.formats parameter. |
    | fromJobConfig.decryption | No | Enumeration | This parameter is available only when fromJobConfig.inputFormat is set to BINARY_FILE. It specifies whether to decrypt the encrypted file before export, and the decryption method. NONE: Do not decrypt. Export the file directly. AES-256-GCM: Use the AES-256-GCM (NoPadding) algorithm to decrypt the file and then export it. |
    | fromJobConfig.dek | No | String | Data decryption key. The key is a string of 64 hexadecimal characters and must be the same as the data encryption key toJobConfig.dek configured during encryption. If the encryption and decryption keys are inconsistent, the system does not report an exception, but the decrypted data is incorrect. |
    | fromJobConfig.iv | No | String | Initialization vector required for decryption. The initialization vector is a string of 32 hexadecimal characters and must be the same as the initialization vector toJobConfig.iv configured during encryption. If the initialization vectors used for encryption and decryption are inconsistent, the system does not report an exception, but the decrypted data is incorrect. |
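
The two fromJobConfig.splitType examples can be reproduced with a small model. The sketch below is only an illustration. It assumes that FILE deals files out evenly by count and that SIZE assigns whole files, largest first, to the shard with the smallest total so far. CDM's actual sharding algorithm is not described in this section and may differ, and the function names are hypothetical.

```python
def split_by_file(files, num_extractors):
    """FILE mode (assumed model): deal files out round-robin by count."""
    shards = [[] for _ in range(num_extractors)]
    for index, name in enumerate(files):
        shards[index % num_extractors].append(name)
    return shards


def split_by_size(sizes_mb, num_extractors):
    """SIZE mode (assumed model): whole files only, largest first,
    each placed in the shard with the smallest total so far."""
    shards = [[] for _ in range(num_extractors)]
    totals = [0] * num_extractors
    for name, size in sorted(sizes_mb.items(), key=lambda item: -item[1]):
        target = totals.index(min(totals))
        shards[target].append(name)
        totals[target] += size
    return shards


# 10 files with numExtractors = 5, as in the FILE example.
files = [f"part-{i}" for i in range(10)]
by_file = split_by_file(files, 5)
print([len(shard) for shard in by_file])

# Nine 10 MB files and one 200 MB file with numExtractors = 2,
# as in the SIZE example.
sizes = {f"small-{i}": 10 for i in range(9)}
sizes["large"] = 200
by_size = split_by_size(sizes, 2)
print([len(shard) for shard in by_size])
```

Under these assumptions the model yields five shards of two files each for FILE, and one shard holding only the 200 MB file plus one shard holding the nine 10 MB files for SIZE, which agrees with the examples in the table.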

  • Description of the fromJobConfig.formats parameter

    | Parameter | Mandatory | Type | Description |
    | --- | --- | --- | --- |
    | name | Yes | String | Column number. For example, 1. |
    | value | Yes | String | Time format. For example, yyyy-MM-dd. |
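
As a rough illustration of the name and value pairs above, the following Python sketch builds one entry per time column from a mapping of column number to time format. The helper name build_formats is hypothetical, and how the resulting list is nested under fromJobConfig.formats in the job JSON is an assumption that this section does not spell out.

```python
def build_formats(column_formats):
    """Return one {"name", "value"} entry per time column.

    `column_formats` maps a column number to a time format such as
    "yyyy-MM-dd". Entries are emitted in ascending column order, and
    both fields are strings, as stated in the table above.
    """
    entries = []
    for column, pattern in sorted(column_formats.items()):
        entries.append({"name": str(column), "value": pattern})
    return entries


formats = build_formats({3: "yyyy-MM-dd HH:mm:ss", 1: "yyyy-MM-dd"})
print(formats)
```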
