Link to HDFS

CDM supports the following HDFS data sources: MRS HDFS, FusionInsight HDFS, and Apache HDFS.

MRS HDFS

When connecting CDM to HDFS of MRS, configure the parameters as described in Table 1.

Table 1 MRS HDFS link parameters

Parameter

Description

Example Value

Name

Link name. Choose a name that reflects the data source type so that the purpose of the link is easy to recall.

mrs_hdfs_link

Manager IP

Floating IP address of MRS Manager. Click Select next to the Manager IP text box to select an MRS cluster. CDM automatically fills in the authentication information.

127.0.0.1

Username

If Authentication Method is set to KERBEROS, you must provide the username and password used for logging in to MRS Manager.

If you need to create a snapshot when exporting a directory from HDFS, the user configured here must have administrator permissions on HDFS.

cdm

Password

Password used for logging in to MRS Manager

-

Authentication Method

Authentication method used for accessing MRS
  • SIMPLE: Select this if MRS is in non-security mode.
  • KERBEROS: Select this if MRS is in security mode.

SIMPLE

Run Mode

Run mode of the HDFS link. The options are as follows:
  • EMBEDDED: The link instance runs with CDM. This mode delivers better performance.
  • STANDALONE: The link instance runs in an independent process. If CDM needs to connect to multiple Hadoop data sources (MRS, Hadoop, or CloudTable), some of which use KERBEROS authentication and some SIMPLE authentication, select STANDALONE or configure different agents.

    Note: STANDALONE mode resolves version conflicts. If the source and destination ends of the same link use different connector versions, a JAR file conflict occurs. In that case, run either the source end or the destination end in a STANDALONE process so that the conflict does not cause the migration to fail.

  • Agent: The link instance runs on an agent.

If STANDALONE is selected, CDM can migrate data between the HDFS file systems of multiple MRS clusters.

STANDALONE

Agent

Click Select and select the agent created in Connecting to an Agent. This parameter is displayed when Run Mode is set to Agent.

-
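The conditional rules in Table 1 (credentials are needed for KERBEROS, and an agent is needed when Run Mode is Agent) can be checked before a link is saved. The following is a minimal sketch only. The function name and the dictionary keys are hypothetical and simply mirror the table labels; they are not CDM API fields.

```python
# Hypothetical parameter names that mirror the labels in Table 1.
# They are illustrative only and are not CDM API fields.
AUTH_METHODS = ("SIMPLE", "KERBEROS")
RUN_MODES = ("EMBEDDED", "STANDALONE", "Agent")


def check_mrs_hdfs_link(params: dict) -> list:
    """Return a list of problems found in an MRS HDFS link parameter set."""
    problems = []

    # Authentication Method must be one of the documented options.
    auth = params.get("authentication_method")
    if auth not in AUTH_METHODS:
        problems.append("authentication_method must be SIMPLE or KERBEROS")
    elif auth == "KERBEROS":
        # KERBEROS requires the MRS Manager username and password.
        for key in ("username", "password"):
            if not params.get(key):
                problems.append(f"{key} is required for KERBEROS")

    # Run Mode must be one of the documented options.
    run_mode = params.get("run_mode")
    if run_mode not in RUN_MODES:
        problems.append("run_mode must be EMBEDDED, STANDALONE, or Agent")
    elif run_mode == "Agent" and not params.get("agent"):
        # The Agent parameter is only relevant when Run Mode is Agent.
        problems.append("agent is required when run_mode is Agent")

    return problems


if __name__ == "__main__":
    # Values taken from the Example Value column of Table 1.
    link = {
        "name": "mrs_hdfs_link",
        "manager_ip": "127.0.0.1",
        "authentication_method": "SIMPLE",
        "run_mode": "STANDALONE",
    }
    print(check_mrs_hdfs_link(link))
```

An empty list means that none of the conditional rules is violated. The sketch does not contact MRS Manager or verify that the credentials are valid.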

FusionInsight HDFS

When connecting CDM to HDFS of FusionInsight HD, configure the parameters as described in Table 2.

Table 2 FusionInsight HDFS link parameters

Parameter

Description

Example Value

Name

Link name. Choose a name that reflects the data source type so that the purpose of the link is easy to recall.

FI_hdfs_link

Manager IP

IP address of FusionInsight Manager

127.0.0.1

Manager Port

Port number of FusionInsight Manager

28443

CAS Server Port

Port number of the CAS server used to connect to FusionInsight

20009

Username

Username used for logging in to FusionInsight Manager.

If you need to create a snapshot when exporting a directory from HDFS, the user configured here must have administrator permissions on HDFS.

cdm

Password

Password used for logging in to FusionInsight Manager

-

Authentication Method

Authentication method used for accessing FusionInsight HD
  • SIMPLE: Select this if FusionInsight HD is in non-security mode.
  • KERBEROS: Select this if FusionInsight HD is in security mode.

KERBEROS

Run Mode

Run mode of the HDFS link. The options are as follows:
  • EMBEDDED: The link instance runs with CDM. This mode delivers better performance.
  • STANDALONE: The link instance runs in an independent process. If CDM needs to connect to multiple Hadoop data sources (MRS, Hadoop, or CloudTable), some of which use KERBEROS authentication and some SIMPLE authentication, select STANDALONE or configure different agents.

    Note: STANDALONE mode resolves version conflicts. If the source and destination ends of the same link use different connector versions, a JAR file conflict occurs. In that case, run either the source end or the destination end in a STANDALONE process so that the conflict does not cause the migration to fail.

  • Agent: The link instance runs on an agent.

STANDALONE

Agent

Click Select and select the agent created in Connecting to an Agent. This parameter is displayed when Run Mode is set to Agent.

-
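The Run Mode guidance in the table above can be read as a small decision rule. The sketch below is one possible reading, not CDM behavior. The function name and its arguments are hypothetical. It suggests STANDALONE when the Hadoop data sources mix KERBEROS and SIMPLE authentication or when connector versions differ, and EMBEDDED otherwise. Configuring different agents, which the table offers as an alternative to STANDALONE, is not modeled here.

```python
def suggest_run_mode(auth_methods, connector_versions_differ=False):
    """Suggest a Run Mode from the guidance in the parameter table.

    auth_methods: authentication methods ("SIMPLE" or "KERBEROS") of all
        Hadoop data sources that CDM must connect to (hypothetical input).
    connector_versions_differ: True if the source and destination ends
        use different connector versions (hypothetical input).
    """
    methods = {m.upper() for m in auth_methods}

    # Different connector versions cause a JAR file conflict, which
    # STANDALONE mode is meant to avoid.
    if connector_versions_differ:
        return "STANDALONE"

    # A mix of KERBEROS and SIMPLE data sources calls for STANDALONE
    # (or separate agents, which this sketch does not cover).
    if {"KERBEROS", "SIMPLE"} <= methods:
        return "STANDALONE"

    # Otherwise EMBEDDED is suggested because it delivers better performance.
    return "EMBEDDED"


if __name__ == "__main__":
    print(suggest_run_mode(["KERBEROS", "SIMPLE"]))
    print(suggest_run_mode(["KERBEROS"]))
```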

Apache HDFS

When connecting CDM to HDFS of Apache Hadoop, configure the parameters as described in Table 3.

Table 3 Apache HDFS link parameters

Parameter

Description

Example Value

Name

Link name. Choose a name that reflects the data source type so that the purpose of the link is easy to recall.

hadoop_hdfs_link

URI

NameNode URI

hdfs://nn1.example.com/

Authentication Method

Authentication method used for accessing Hadoop
  • SIMPLE: Select this if Hadoop is in non-security mode.
  • KERBEROS: Select this if Hadoop is in security mode. Obtain the principal account and keytab file of the client for authentication.

KERBEROS

Principal

When Authentication Method is set to KERBEROS, the Principal account is used for authentication. You can contact the Hadoop administrator to obtain the account.

USER@YOUR-REALM.COM

Keytab File

When Authentication Method is set to KERBEROS, this file is used for authentication. You can contact the Hadoop administrator to obtain the file.

/opt/user.keytab

Run Mode

Run mode of the HDFS link. The options are as follows:
  • EMBEDDED: The link instance runs with CDM. This mode delivers better performance.
  • STANDALONE: The link instance runs in an independent process. If CDM needs to connect to multiple Hadoop data sources (MRS, Hadoop, or CloudTable), some of which use KERBEROS authentication and some SIMPLE authentication, select STANDALONE or configure different agents.

    Note: STANDALONE mode resolves version conflicts. If the source and destination ends of the same link use different connector versions, a JAR file conflict occurs. In that case, run either the source end or the destination end in a STANDALONE process so that the conflict does not cause the migration to fail.

  • Agent: The link instance runs on an agent.

STANDALONE

IP and Host Name Mapping

This parameter is used only when Run Mode is set to EMBEDDED or STANDALONE.

If the HDFS configuration file uses host names, configure the mapping between each IP address and its host name. Separate the IP address and host name in a mapping with a space, and separate mappings with semicolons (;), carriage returns, or line feeds.

10.1.6.9 hostname01

10.2.7.9 hostname02

Agent

If Run Mode is set to Agent, click Select and select the agent created in Connecting to an Agent.

-
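The format of the IP and Host Name Mapping value can be illustrated with a short parser. This is a sketch for checking a value before entering it, assuming one IP address and one host name per mapping. The function name is hypothetical, and the code is not part of CDM.

```python
import ipaddress
import re


def parse_host_mappings(text: str) -> dict:
    """Parse an 'IP and Host Name Mapping' value into {host name: IP address}.

    Mappings are separated by semicolons, carriage returns, or line feeds.
    Within a mapping, the IP address and host name are separated by spaces.
    """
    mappings = {}
    for entry in re.split(r"[;\r\n]+", text):
        entry = entry.strip()
        if not entry:
            continue  # Skip empty pieces, such as after a trailing separator.

        parts = entry.split()
        if len(parts) != 2:
            raise ValueError(f"expected 'IP hostname', got {entry!r}")

        ip, host = parts
        ipaddress.ip_address(ip)  # Raises ValueError if the IP is malformed.
        mappings[host] = ip
    return mappings


if __name__ == "__main__":
    # Example values from Table 3, separated by a line feed.
    value = "10.1.6.9 hostname01\n10.2.7.9 hostname02"
    print(parse_host_mappings(value))
```

The same two mappings could also be written on one line as `10.1.6.9 hostname01;10.2.7.9 hostname02`.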