Scenario-based Migration

Scenario-based migration speeds up migration by copying snapshots from the source and then restoring table data from them at the destination.

Prerequisites

  • The CDM cluster can communicate with the data source.
  • You have obtained the URL and the account for accessing the data source. The account must have read and write permissions on the data source.

Link to Hadoop

CDM supports the following Hadoop data sources:

MRS

When connecting CDM to the Hadoop component of MRS, configure the parameters as described in Table 1.

Table 1 MRS Hadoop link parameters

  • Name: Link name. Define it based on the data source type so that the purpose of the link is easy to identify. Example: mrs_scen_link
  • Manager IP: IP address of MRS Manager. Click Select next to the Manager IP text box to select an MRS cluster; CDM automatically fills in the authentication information. Example: 127.0.0.1
  • Authentication Method: Authentication method used for accessing MRS.
      • SIMPLE: for non-security mode
      • KERBEROS: for security mode
    Example: SIMPLE
  • HBase Version: Set it to the HBase version on the server. Example: HBASE_2_X
  • Hive Version: Set it to the Hive version on the server. Example: HIVE_3_X
  • Username: If Authentication Method is set to KERBEROS, provide the username and password used to log in to MRS Manager. If you need to create a snapshot when exporting a directory from HDFS, the user configured here must have administrator permission on HDFS. Example: cdm
  • Password: Password used to log in to MRS Manager.
  • Run Mode: Run mode of the HDFS link. The options are as follows:
      • EMBEDDED: The link instance runs with CDM. This mode delivers better performance.
      • STANDALONE: The link instance runs in an independent process. If CDM needs to connect to multiple Hadoop data sources (MRS, Hadoop, or CloudTable) using both Kerberos and Simple authentication modes, select STANDALONE or configure different agents. If STANDALONE is selected, CDM can migrate data between the HDFS services of multiple MRS clusters.
      • Agent: The link instance runs on an agent.
    Example: STANDALONE

FusionInsight Hadoop

When connecting CDM to the Hadoop component of FusionInsight HD, configure the parameters as described in Table 2.

Table 2 FusionInsight Hadoop link parameters

  • Name: Link name. Define it based on the data source type so that the purpose of the link is easy to identify. Example: FI_hdfs_link
  • Manager IP: IP address of FusionInsight Manager. Example: 127.0.0.1
  • Manager Port: Port number of FusionInsight Manager. Example: 28443
  • CAS Server Port: Port number of the CAS server used to connect to FusionInsight. Example: 20009
  • Username: Username used to log in to FusionInsight Manager. If you need to create a snapshot when exporting a directory from HDFS, the user configured here must have administrator permission on HDFS. Example: cdm
  • Password: Password used to log in to FusionInsight Manager.
  • Authentication Method: Authentication method used for accessing FusionInsight HD.
      • SIMPLE: for non-security mode
      • KERBEROS: for security mode
    Example: KERBEROS
  • HBase Version: Set it to the HBase version on the server. Example: HBASE_2_X
  • Hive Version: Set it to the Hive version on the server. Example: HIVE_3_X
  • Run Mode: Run mode of the HDFS link. The options are as follows:
      • EMBEDDED: The link instance runs with CDM. This mode delivers better performance.
      • STANDALONE: The link instance runs in an independent process. If CDM needs to connect to multiple Hadoop data sources (MRS, Hadoop, or CloudTable) using both Kerberos and Simple authentication modes, select STANDALONE or configure different agents. Note: STANDALONE mode also resolves version conflicts. If the connector versions of the source and destination ends of a link differ, a JAR file conflict occurs; running the source or destination end in the STANDALONE process prevents the migration failure that the conflict would cause.
      • Agent: The link instance runs on an agent.
    Example: STANDALONE
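Before creating the link, it can help to confirm that the Manager and CAS server ports are reachable over the network from the host running the check. The sketch below is illustrative only: the host and ports are the example values from Table 2, and `port_reachable` is a hypothetical helper, not part of CDM.

```python
import socket

def port_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Example values from Table 2: FusionInsight Manager (28443) and the CAS server (20009).
for port in (28443, 20009):
    state = "reachable" if port_reachable("127.0.0.1", port) else "unreachable"
    print(f"127.0.0.1:{port} is {state}")
```

A failed check usually points to a security group, ACL, or firewall issue rather than a CDM link misconfiguration.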

Apache Hadoop

When connecting CDM to Apache Hadoop, configure the parameters as described in Table 3.

Table 3 Apache Hadoop link parameters

  • Name: Link name. Define it based on the data source type so that the purpose of the link is easy to identify. Example: hadoop_hdfs_link
  • URI: NameNode URI. Example: hdfs://nn1.example.com/
  • ZooKeeper Address: ZooKeeper address, which must be configured for HBase scenario-based migration. Example: hbase-node-1:2181
  • Hive Metastore: Hive metastore address. For details, see the hive.metastore.uris configuration item. Example: thrift://host-192-168-1-212:9083
  • Authentication Method: Authentication method used for accessing Hadoop.
      • SIMPLE: Select this if Hadoop is in non-security mode.
      • KERBEROS: Select this if Hadoop is in security mode. Obtain the principal account and keytab file of the client for authentication.
    Example: KERBEROS
  • Principal: When Authentication Method is set to KERBEROS, the principal account used for authentication. Contact the Hadoop administrator to obtain it. Example: USER@YOUR-REALM.COM
  • Keytab File: When Authentication Method is set to KERBEROS, the keytab file used for authentication. Contact the Hadoop administrator to obtain it. Example: /opt/user.keytab
  • IP and Host Name Mapping: If the HDFS configuration files use host names, configure the mapping between IP addresses and host names. Separate an IP address from its host name with a space, and separate mappings with semicolons (;), carriage returns, or line feeds. Example: 10.1.6.9 hostname01;10.2.7.9 hostname02
  • HBase Version: Set it to the HBase version on the server. Example: HBASE_2_X
  • Hive Version: Set it to the Hive version on the server. Example: HIVE_3_X
  • Run Mode: Run mode of the HDFS link. The options are as follows:
      • EMBEDDED: The link instance runs with CDM. This mode delivers better performance.
      • STANDALONE: The link instance runs in an independent process. If CDM needs to connect to multiple Hadoop data sources (MRS, Hadoop, or CloudTable) using both Kerberos and Simple authentication modes, select STANDALONE or configure different agents. Note: STANDALONE mode also resolves version conflicts. If the connector versions of the source and destination ends of a link differ, a JAR file conflict occurs; running the source or destination end in the STANDALONE process prevents the migration failure that the conflict would cause.
      • Agent: The link instance runs on an agent.
    Example: STANDALONE
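The format of the IP and Host Name Mapping value can be easy to get wrong. The hedged sketch below (the `mapping_to_hosts_lines` helper is illustrative, not part of CDM) shows how a mapping string using the documented separators expands into /etc/hosts-style lines:

```python
import re

def mapping_to_hosts_lines(mapping: str) -> list[str]:
    """Expand a CDM-style mapping string into one 'IP hostname' line per entry."""
    # Entries are separated by semicolons, carriage returns, or line feeds.
    entries = [e.strip() for e in re.split(r"[;\r\n]+", mapping) if e.strip()]
    lines = []
    for entry in entries:
        ip, hostname = entry.split(None, 1)  # IP and host name are space-separated
        lines.append(f"{ip} {hostname}")
    return lines

print(mapping_to_hosts_lines("10.1.6.9 hostname01;10.2.7.9 hostname02"))
# → ['10.1.6.9 hostname01', '10.2.7.9 hostname02']
```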

Procedure

  1. Log in to the CDM management console.
  2. In the left navigation pane, click Cluster Management. Locate the target cluster and click Job Management.
  3. Choose Job Management > Link Management > Create Link and set the connector type to Hadoop release version.
  4. Click Next. Set link parameters by referring to Link to Hadoop.
  5. Click Test to check whether the link is available. Alternatively, click Save. The system will automatically check whether the link is available.

    If the network is poor or the data source is too large, the link test may take 30 to 60 seconds.

  6. Choose Scenario Migration > Create Job. The page for configuring the job is displayed. Select a migration scenario (Hadoop migration, Hive migration, or HBase migration) and configure the job name.

    Figure 1 Configuring a scenario-based migration job

  7. Configure the source and destination job parameters, and select the link name and name of the database to be migrated.

    Figure 2 Configuring job parameters

  8. Click Next to go to the table selection page, and select the tables to be migrated based on your requirements.
  9. Click Next and set job parameters.

    Table 4 describes related parameters.
    Table 4 Task configuration parameters

    • Write Dirty Data: Whether to record dirty data. The default is No. Example: Yes
    • Write Dirty Data Link: Displayed only when Write Dirty Data is set to Yes. Only links to OBS support dirty data writes. Example: obs_link
    • OBS Bucket: Displayed only when Write Dirty Data Link is a link to OBS. Name of the OBS bucket to which dirty data will be written. Example: dirtydata
    • Dirty Data Directory: Displayed only when Write Dirty Data is set to Yes. Directory for storing dirty data on OBS; dirty data is saved only when this parameter is configured. You can check this directory for data that failed to be processed or was filtered out during job execution, and inspect the source data that did not meet conversion or cleaning rules. Example: /user/dirtydir
    • Max. Error Records in a Single Shard: Displayed only when Write Dirty Data is set to Yes. When the number of error records of a single map exceeds this limit, the job terminates automatically, and the data already imported cannot be rolled back. You are advised to use a temporary table as the destination table; after the data is imported, rename the table or merge it into the final table. Example: 0

  10. Click Save or Save and Run.

    When the job starts running, a sub-job will be generated for each table. You can click the job name to view the sub-job list.
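The dirty-data options in Table 4 interact with one another: bad records are diverted to the dirty data directory until the per-shard error limit is exceeded, at which point the job terminates without rolling back rows already written. The sketch below models only these documented semantics; it is not CDM's implementation, and all names in it are illustrative.

```python
# Illustrative model of the "Max. Error Records in a Single Shard" behavior.
def migrate_shard(records, max_errors=0, write_dirty_data=False):
    """Copy records, diverting failed ones to a dirty-data list up to max_errors."""
    migrated, dirty = [], []
    errors = 0
    for rec in records:
        if rec is None:  # stand-in for any record that fails conversion or cleaning
            errors += 1
            if write_dirty_data:
                dirty.append(rec)
            if errors > max_errors:
                # The job terminates; rows already migrated are NOT rolled back.
                raise RuntimeError("error limit exceeded; shard aborted")
            continue
        migrated.append(rec)
    return migrated, dirty

# With max_errors=1, one bad record is tolerated and captured as dirty data.
ok, dirty = migrate_shard(["a", None, "b"], max_errors=1, write_dirty_data=True)
print(ok, dirty)  # → ['a', 'b'] [None]
```

This is why the documentation recommends importing into a temporary destination table first: a shard abort leaves partially imported data behind, and a rename or merge afterwards keeps the final table consistent.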