Updated on 2025-07-21 GMT+08:00

Apache HDFS Connection Parameters (Internal Test)

Table 1 Apache HDFS connection

Parameter

Mandatory

Description

Data Connection Type

Yes

The value is fixed at Apache HDFS.

Name

Yes

Name of the data connection to create. Data connection names can contain a maximum of 100 characters. They can contain only letters, digits, underscores (_), and hyphens (-).
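
The naming rule above can be checked before the form is submitted. The following is a minimal sketch, not part of the product: the helper name is hypothetical, and it assumes "letters" means ASCII letters and that an empty name is rejected because the parameter is mandatory.

```python
import re

# Hypothetical pattern mirroring the documented rule:
# 1 to 100 characters; ASCII letters, digits, underscores, and hyphens only.
_NAME_PATTERN = re.compile(r"[A-Za-z0-9_-]{1,100}")


def is_valid_connection_name(name: str) -> bool:
    """Return True if name satisfies the documented naming rule."""
    return _NAME_PATTERN.fullmatch(name) is not None


if __name__ == "__main__":
    for candidate in ("hdfs_conn-01", "hdfs conn", "a" * 101):
        print(repr(candidate[:20]), is_valid_connection_name(candidate))
```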

Description

No

Description that makes the data connection easier to identify. It can contain a maximum of 100 characters.

Tag

No

Attribute of the data connection to create. Tags make management easier.
NOTE:

The tag name can contain only letters, digits, and underscores (_). It cannot start with an underscore (_) or exceed 100 characters.

Applicable Modules

Yes

Select the modules in which this connection can be used.

NOTE:
  • When offline or real-time data migration jobs are enabled, you can select the DataArts Migration module. Then you can select this data connection when creating a data migration job in DataArts Factory.
  • You can use offline or real-time data migration jobs only after you apply for the whitelist membership. To use this feature, contact customer service or technical support.

Basic and Network Connectivity Configuration

Use Cluster Config

Yes

Select a cluster configuration that has been created.

You can use the cluster configuration to simplify parameter settings for the Hadoop connection. The function is disabled by default.

URI

Yes

This parameter is displayed when Use Cluster Config is enabled.

NameNode URI. You can enter hdfs://IP address of the NameNode instance:8020.
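
As an illustration of the URI format, the sketch below builds and parses a NameNode URI with the Python standard library. The function names are hypothetical, the address 192.0.2.10 is a documentation placeholder, and falling back to port 8020 when no port is given is an assumption based on the example above.

```python
from urllib.parse import urlparse

DEFAULT_NAMENODE_PORT = 8020  # port used in the example URI above


def namenode_uri(host: str, port: int = DEFAULT_NAMENODE_PORT) -> str:
    """Build a URI of the form hdfs://<NameNode IP>:<port>."""
    return f"hdfs://{host}:{port}"


def parse_namenode_uri(uri: str) -> tuple[str, int]:
    """Split an hdfs:// URI into (host, port), rejecting other schemes."""
    parsed = urlparse(uri)
    if parsed.scheme != "hdfs" or not parsed.hostname:
        raise ValueError(f"not a NameNode URI: {uri!r}")
    return parsed.hostname, parsed.port or DEFAULT_NAMENODE_PORT


if __name__ == "__main__":
    uri = namenode_uri("192.0.2.10")
    print(uri, parse_namenode_uri(uri))
```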

IP and Host Name Mapping

No

This parameter is displayed when Use Cluster Config is enabled.

This parameter is used only when Process Type is set to EMBEDDED or STANDALONE.

If the HDFS configuration file uses host names, configure the mapping between IP addresses and host names. Separate each IP address from its host name with a space, and separate mappings with semicolons (;), carriage returns, or line feeds.
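
To make the separator rules concrete, here is a minimal sketch of a parser for the mapping text. It is not the product's implementation: the function name is hypothetical, and it assumes each mapping lists the IP address first and the host name second, as in an /etc/hosts entry.

```python
import re


def parse_host_mappings(text: str) -> dict[str, str]:
    """Parse 'IP hostname' mappings into a {hostname: ip} dictionary.

    Mappings are separated by semicolons, carriage returns, or line feeds;
    the IP address and host name inside a mapping are separated by spaces.
    """
    mappings: dict[str, str] = {}
    for entry in re.split(r"[;\r\n]+", text):
        entry = entry.strip()
        if not entry:
            continue  # ignore empty fragments such as a trailing semicolon
        parts = entry.split()
        if len(parts) != 2:
            raise ValueError(f"expected 'IP hostname', got {entry!r}")
        ip, hostname = parts
        mappings[hostname] = ip
    return mappings


if __name__ == "__main__":
    sample = "192.0.2.10 namenode1;192.0.2.11 namenode2\n192.0.2.12 datanode1"
    print(parse_host_mappings(sample))
```

The addresses and host names in the sample are placeholders; replace them with the values used in your HDFS configuration file.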

KMS Key

Yes

KMS key used to encrypt and decrypt data source authentication information. Select a default or custom key.
NOTE:
  • When you use KMS for encryption through DataArts Studio or KPS for the first time, the default key dlf/default or kps/default is automatically generated. For more information about default keys, see What Is a Default Master Key?.
  • Only symmetric keys are supported. Asymmetric keys are not supported.

Agent

Yes

DataArts Studio cannot directly connect to non-fully managed services. An agent is required for DataArts Studio to communicate with non-fully managed services. A CDM cluster can function as an agent. If no CDM cluster is available, create one by referring to Creating a CDM Cluster.

Data Migration Configuration

Config Path

Yes

This parameter is displayed when Use Cluster Config is enabled. It specifies the OBS path for storing the cluster configuration file.

keytab File Path

Yes

This parameter is displayed when Authentication Method is set to KERBEROS.

Select the OBS path where the keytab file is stored.

Principal Name

Yes

This parameter is displayed when Authentication Method is set to KERBEROS.

Enter the Kerberos authentication username. For a Kerberos cluster, you need to upload the corresponding keytab file.

Properties

No

This parameter is available when CDM Enable is enabled. (Optional) Click Add to add the JDBC connector attributes of multiple specified data sources. For details, see the JDBC connector document of the corresponding database.

The following are some examples:
  • connectTimeout=360000 and socketTimeout=360000: When a large amount of data is migrated, or an entire table is retrieved using query statements, the migration may fail because the connection times out. In this case, you can customize the connection timeout interval (ms) and socket timeout interval (ms) to prevent failures caused by timeouts.
  • useCursorFetch=false: By default, useCursorFetch is set to true, indicating that the JDBC connector communicates with relational databases using a binary protocol. Some third-party systems may have compatibility issues with this protocol, which cause time conversion errors during migration. In this case, you can disable this function. Open-source MySQL databases support the useCursorFetch parameter, so you do not need to set it for them.

Data Source Authentication and Other Function Configuration

Authentication Method

Yes

Authentication method used for accessing the cluster:
  • SIMPLE: Select this for non-security mode.
  • KERBEROS: Select this for security mode.
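
The dependency between the authentication method and the Kerberos parameters (keytab File Path and Principal Name) can be sketched as a simple pre-check. This is an illustrative assumption, not a product API: the function name and the dictionary keys are hypothetical.

```python
def missing_auth_fields(params: dict) -> list[str]:
    """Return the Kerberos-related fields that are still missing.

    Hypothetical keys: 'authentication_method', 'keytab_file_path',
    and 'principal_name'. SIMPLE needs no extra fields; KERBEROS needs both.
    """
    method = params.get("authentication_method")
    if method not in ("SIMPLE", "KERBEROS"):
        raise ValueError(f"unknown authentication method: {method!r}")
    if method == "SIMPLE":
        return []
    required = ("keytab_file_path", "principal_name")
    return [field for field in required if not params.get(field)]


if __name__ == "__main__":
    draft = {"authentication_method": "KERBEROS", "principal_name": "user@EXAMPLE.COM"}
    print(missing_auth_fields(draft))
```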

Process Type

Yes

Run mode of the HDFS link. The options are as follows:
  • EMBEDDED: The link instance runs with CDM. This mode delivers better performance.
  • STANDALONE: The link instance runs in an independent process. If CDM needs to connect to multiple Hadoop data sources (MRS, Hadoop, or CloudTable) that use both KERBEROS and SIMPLE authentication modes, select STANDALONE or configure different agents.
    NOTE:

    STANDALONE mode resolves connector version conflicts. If the source and destination ends of the same link use different connector versions, a JAR file conflict occurs. In this case, place the source or destination end in a STANDALONE process to prevent the migration from failing because of the conflict.
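
The selection guidance above can be summarized as a small decision helper. This is only a sketch of the rule as described in this table, with a hypothetical function name and inputs; configuring different agents remains an alternative to STANDALONE when authentication modes are mixed.

```python
def suggest_process_type(auth_modes, connector_versions_differ=False) -> str:
    """Suggest EMBEDDED or STANDALONE for an HDFS link.

    auth_modes: authentication methods of the Hadoop data sources that the
        same CDM agent connects to, for example ["KERBEROS", "SIMPLE"].
    connector_versions_differ: True if the source and destination ends of
        the link use different connector versions (JAR conflict risk).
    """
    modes = {mode.upper() for mode in auth_modes}
    mixed_auth = {"KERBEROS", "SIMPLE"} <= modes
    if mixed_auth or connector_versions_differ:
        return "STANDALONE"
    return "EMBEDDED"  # runs with CDM and delivers better performance


if __name__ == "__main__":
    print(suggest_process_type(["KERBEROS", "SIMPLE"]))
    print(suggest_process_type(["SIMPLE"]))
```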