Link to Hive

CDM supports the following Hive data sources:

MRS Hive

The MRS Hive link is used for MapReduce Service (MRS) on HUAWEI CLOUD. Table 1 describes related parameters.

To connect to an MRS 2.x cluster, create a CDM cluster of version 2.x first. CDM 1.8.x clusters cannot connect to MRS 2.x clusters.

Currently, the Hive link obtains the core-site.xml configuration information from MRS HDFS. Therefore, if MRS Hive uses OBS as the underlying storage system, configure the AK/SK of OBS on MRS HDFS before creating the Hive link.

Table 1 MRS Hive link parameters

Parameter

Description

Example Value

Name

Link name, which should be defined based on the data source type, so it is easier to remember what the link is for

hivelink

Manager IP

Floating IP address of MRS Manager. Click Select next to the Manager IP text box to select an MRS cluster. CDM automatically fills in the authentication information.

127.0.0.1

Authentication Method

Authentication method used for accessing MRS
  • SIMPLE: Select this if MRS is in non-security mode.
  • KERBEROS: Select this if MRS is in security mode.

SIMPLE

HIVE Version

Set this to the Hive version on the server.

HIVE_3_X

Username

If Authentication Method is set to KERBEROS, you must provide the username and password used for logging in to MRS Manager.

cdm

Password

Password used for logging in to MRS Manager

-

OBS storage support

Specifies whether OBS storage is used. The server must support OBS storage. When this is enabled, you can store a Hive table in OBS when creating it.

No

Run Mode

This parameter is used only when the Hive version is HIVE_3_X. Possible values are:

  • EMBEDDED: The link instance runs with CDM. This mode delivers better performance.
  • STANDALONE: The link instance runs in an independent process. If CDM needs to connect to multiple Hadoop data sources (MRS, Hadoop, or CloudTable) with both Kerberos and Simple authentication modes, select STANDALONE or configure different agents.

    Note: STANDALONE mode resolves version conflicts. If the source and destination ends of the same link use different connector versions, a JAR file conflict occurs. In that case, run the source or destination end in a STANDALONE process so that the conflict does not cause the migration to fail.

  • Agent: The link instance runs on an agent.

EMBEDDED

Click Show Advanced Attributes, and then click Add to add configuration attributes of other Hive clients. Each attribute must have both a name and a value. To remove attributes that are no longer used, click Delete.
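The Run Mode guidance in Table 1 can be sketched as a small decision helper. This is only an illustration of the rule stated in the table, not part of CDM: the function name choose_run_mode and its arguments are hypothetical, and it does not cover the alternative of configuring different agents.

```python
from typing import Iterable, Optional


def choose_run_mode(hive_version: str, auth_methods: Iterable[str]) -> Optional[str]:
    """Suggest a Run Mode for a new Hive link (hypothetical helper).

    hive_version: value of the HIVE Version parameter, for example "HIVE_3_X".
    auth_methods: Authentication Method of every Hadoop-type link
        (MRS, Hadoop, or CloudTable) on the CDM cluster, including the new one.
    """
    # Run Mode is used only when the Hive version is HIVE_3_X.
    if hive_version != "HIVE_3_X":
        return None

    methods = {m.upper() for m in auth_methods}
    unknown = methods - {"SIMPLE", "KERBEROS"}
    if unknown:
        raise ValueError(f"unsupported authentication method(s): {sorted(unknown)}")

    # Links that mix Kerberos and Simple authentication should be isolated
    # in an independent process.
    if methods == {"SIMPLE", "KERBEROS"}:
        return "STANDALONE"

    # Otherwise EMBEDDED, which the table describes as the faster mode.
    return "EMBEDDED"


print(choose_run_mode("HIVE_3_X", ["SIMPLE", "KERBEROS"]))
```

A JAR version conflict between source and destination connectors is a separate reason to pick STANDALONE and is not modeled here.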

FusionInsight Hive

The FusionInsight Hive link is used to migrate data of FusionInsight HD in a local data center. You must use Direct Connect to connect to FusionInsight HD.

Table 2 describes related parameters.

Table 2 FusionInsight Hive link parameters

Parameter

Description

Example Value

Name

Link name, which should be defined based on the data source type, so it is easier to remember what the link is for

hivelink

Manager IP

Floating IP address of FusionInsight/MRS Manager. Click Select next to the Manager IP text box to select an MRS cluster. CDM automatically fills in the authentication information.

127.0.0.1

Manager Port

FusionInsight/MRS Manager port

28443

CAS Server Port

CAS protocol port of FusionInsight/MRS Manager

20009

Authentication Method

Authentication method used for accessing the cluster
  • SIMPLE: Select this if the cluster is in non-security mode.
  • KERBEROS: Select this if the cluster is in security mode.

SIMPLE

HIVE Version

Hive version

HIVE_3_X

Username

If Authentication Method is set to KERBEROS, you must provide the username and password used for logging in to FusionInsight/MRS Manager.

cdm

Password

Password used for logging in to FusionInsight/MRS Manager

-

OBS storage support

Specifies whether OBS storage is used. The server must support OBS storage. When this is enabled, you can store a Hive table in OBS when creating it.

No

Run Mode

This parameter is used only when the Hive version is HIVE_3_X. Possible values are:

  • EMBEDDED: The link instance runs with CDM. This mode delivers better performance.
  • STANDALONE: The link instance runs in an independent process. If CDM needs to connect to multiple Hadoop data sources (MRS, Hadoop, or CloudTable) with both Kerberos and Simple authentication modes, select STANDALONE or configure different agents.

    Note: STANDALONE mode resolves version conflicts. If the source and destination ends of the same link use different connector versions, a JAR file conflict occurs. In that case, run the source or destination end in a STANDALONE process so that the conflict does not cause the migration to fail.

  • Agent: The link instance runs on an agent.

EMBEDDED

Click Show Advanced Attributes, and then click Add to add configuration attributes of other Hive clients. Each attribute must have both a name and a value. To remove attributes that are no longer used, click Delete.
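Before entering the values from Table 2 in the console, it can help to check them against the constraints the table states. The following is a minimal sketch under assumptions: the dictionary keys (manager_port, cas_server_port, authentication_method, and so on) are hypothetical names chosen for this example and are not a CDM API.

```python
def validate_link(params: dict) -> list:
    """Return a list of problems found in a link definition (empty if none).

    The keys used here are illustrative only; they mirror the rows of Table 2.
    """
    problems = []

    auth = params.get("authentication_method")
    if auth not in ("SIMPLE", "KERBEROS"):
        problems.append("authentication_method must be SIMPLE or KERBEROS")

    # KERBEROS (security mode) requires the Manager login credentials.
    if auth == "KERBEROS":
        for key in ("username", "password"):
            if not params.get(key):
                problems.append(
                    f"{key} is required when authentication_method is KERBEROS"
                )

    # Manager Port and CAS Server Port must be valid TCP port numbers.
    for key in ("manager_port", "cas_server_port"):
        port = params.get(key)
        if not isinstance(port, int) or not 1 <= port <= 65535:
            problems.append(f"{key} must be an integer between 1 and 65535")

    return problems


link = {
    "name": "hivelink",
    "manager_ip": "127.0.0.1",
    "manager_port": 28443,
    "cas_server_port": 20009,
    "authentication_method": "KERBEROS",
    "username": "cdm",
}
print(validate_link(link))
```

In this example the password is missing, so the function reports a single problem for the password field.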

Apache Hive

The Apache Hive link is used to migrate data of third-party Hadoop in a local data center or on an ECS. You must use Direct Connect to connect to Hadoop in the local data center.

Table 3 describes related parameters.

Table 3 Apache Hive link parameters

Parameter

Description

Example Value

Name

Link name, which should be defined based on the data source type, so it is easier to remember what the link is for

hivelink

URI

NameNode URI

hdfs://hacluster

Hive Metastore

Hive metadata address. For details, see the hive.metastore.uris configuration item. Example: thrift://host-192-168-1-212:9083

-

Authentication Method

Authentication method used for accessing the Hadoop cluster
  • SIMPLE: Select this if the cluster is in non-security mode.
  • KERBEROS: Select this if the cluster is in security mode.

SIMPLE

HIVE Version

Hive version

HIVE_3_X

IP and Host Name Mapping

If the Hadoop configuration file uses host names, configure the mapping between IP addresses and host names. Separate each IP address from its host name with a space, and separate mappings with semicolons (;), carriage returns, or line feeds.

-

OBS storage support

Specifies whether OBS storage is used. The server must support OBS storage. When this is enabled, you can store a Hive table in OBS when creating it.

No

Run Mode

This parameter is used only when the Hive version is HIVE_3_X. Possible values are:

  • EMBEDDED: The link instance runs with CDM. This mode delivers better performance.
  • STANDALONE: The link instance runs in an independent process. If CDM needs to connect to multiple Hadoop data sources (MRS, Hadoop, or CloudTable) with both Kerberos and Simple authentication modes, select STANDALONE or configure different agents.

    Note: STANDALONE mode resolves version conflicts. If the source and destination ends of the same link use different connector versions, a JAR file conflict occurs. In that case, run the source or destination end in a STANDALONE process so that the conflict does not cause the migration to fail.

  • Agent: The link instance runs on an agent.

EMBEDDED

Click Show Advanced Attributes, and then click Add to add configuration attributes of other Hive clients. Each attribute must have both a name and a value. To remove attributes that are no longer used, click Delete.
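Two of the string parameters in Table 3, Hive Metastore and IP and Host Name Mapping, follow formats that are easy to mistype. The sketch below shows one way to check them locally before creating the link. The helper names are hypothetical, the second host in the sample input is made up for illustration, and the metastore helper assumes a single thrift://host:port address.

```python
import re
from urllib.parse import urlsplit


def parse_host_mappings(text: str) -> dict:
    """Parse 'IP hostname' pairs into {hostname: ip}.

    Mappings are separated by semicolons, carriage returns, or line feeds;
    the IP address and host name within a mapping are separated by spaces.
    """
    mappings = {}
    for entry in re.split(r"[;\r\n]+", text):
        entry = entry.strip()
        if not entry:
            continue  # tolerate trailing separators and blank lines
        parts = entry.split()
        if len(parts) != 2:
            raise ValueError(f"expected 'IP hostname', got {entry!r}")
        ip, host = parts
        mappings[host] = ip
    return mappings


def parse_metastore_uri(uri: str) -> tuple:
    """Split a single thrift://host:port address into (host, port)."""
    parts = urlsplit(uri)
    if parts.scheme != "thrift" or not parts.hostname or parts.port is None:
        raise ValueError(f"expected thrift://host:port, got {uri!r}")
    return parts.hostname, parts.port


sample = "192.168.1.212 host-192-168-1-212;192.168.1.213 host-192-168-1-213\n"
print(parse_host_mappings(sample))
print(parse_metastore_uri("thrift://host-192-168-1-212:9083"))
```

Checking that the metastore host name appears in the parsed mappings is a quick way to confirm that the two parameters agree.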