Scenario-based Migration
Scenario-based migration speeds up migration by migrating snapshots and then restoring table data.
Prerequisites
- The CDM cluster can communicate with the data source.
- You have obtained the URL and the account for accessing the data source. The account must have read and write permissions on the data source.
MRS
When connecting CDM to Hadoop of MRS, configure the parameters as described in Table 1.
Table 1 MRS Hadoop link parameters

| Parameter | Description | Example Value |
|---|---|---|
| Name | Link name. Define it based on the data source type so the purpose of the link is easy to identify. | mrs_scen_link |
| Manager IP | IP address of MRS Manager. Click Select next to the Manager IP text box to select an MRS cluster. CDM automatically fills in the authentication information. | 127.0.0.1 |
| Authentication Method | Authentication method used for accessing MRS | SIMPLE |
| HBase Version | Set it to the HBase version on the server. | HBASE_2_X |
| HIVE Version | Set it to the Hive version on the server. | HIVE_3_X |
| Username | If Authentication Method is set to KERBEROS, you must provide the username and password used for logging in to MRS Manager. If you need to create a snapshot when exporting a directory from HDFS, the user configured here must have the administrator permission on HDFS. | cdm |
| Password | Password used for logging in to MRS Manager | - |
| Run Mode | Run mode of the HDFS link. If STANDALONE is selected, CDM can migrate data between the HDFS services of multiple MRS clusters. | STANDALONE |
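The Username row notes that creating a snapshot during an HDFS directory export requires HDFS administrator permission. The reason is that enabling snapshots on a directory is an admin-only HDFS operation. The sketch below builds the two standard Hadoop CLI invocations involved; the directory and snapshot names are illustrative, and this is not CDM's internal code.

```python
# Sketch of the standard HDFS snapshot commands involved when exporting
# a directory with a snapshot. Paths and snapshot names are examples.

def snapshot_commands(directory: str, snapshot_name: str) -> list[list[str]]:
    """Build the two HDFS CLI invocations needed to snapshot a directory.

    "hdfs dfsadmin -allowSnapshot" requires HDFS administrator permission,
    which is why the link user must be an HDFS admin when snapshots are
    created during export.
    """
    return [
        ["hdfs", "dfsadmin", "-allowSnapshot", directory],             # admin-only
        ["hdfs", "dfs", "-createSnapshot", directory, snapshot_name],  # regular op
    ]

for cmd in snapshot_commands("/user/cdm/export", "cdm_export_s0"):
    print(" ".join(cmd))
```

These commands could then be run with `subprocess.run` on a node with the Hadoop client installed.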
FusionInsight Hadoop
When connecting CDM to Hadoop of FusionInsight HD, configure the parameters as described in Table 2.
Table 2 FusionInsight HD link parameters

| Parameter | Description | Example Value |
|---|---|---|
| Name | Link name. Define it based on the data source type so the purpose of the link is easy to identify. | FI_hdfs_link |
| Manager IP | IP address of FusionInsight Manager | 127.0.0.1 |
| Manager Port | Port number of FusionInsight Manager | 28443 |
| CAS Server Port | Port number of the CAS server used to connect to FusionInsight | 20009 |
| Username | Username used for logging in to FusionInsight Manager. If you need to create a snapshot when exporting a directory from HDFS, the user configured here must have the administrator permission on HDFS. | cdm |
| Password | Password used for logging in to FusionInsight Manager | - |
| Authentication Method | Authentication method used for accessing FusionInsight HD | KERBEROS |
| HBase Version | Set it to the HBase version on the server. | HBASE_2_X |
| HIVE Version | Set it to the Hive version on the server. | HIVE_3_X |
| Run Mode | Run mode of the HDFS link | STANDALONE |
Apache Hadoop
When connecting CDM to Apache Hadoop, configure parameters as described in Table 3.
Table 3 Apache Hadoop link parameters

| Parameter | Description | Example Value |
|---|---|---|
| Name | Link name. Define it based on the data source type so the purpose of the link is easy to identify. | hadoop_hdfs_link |
| URI | NameNode URI | hdfs://nn1.example.com/ |
| ZooKeeper Address | ZooKeeper address. This parameter is required for HBase scenario-based migration. | hbase-node-1:2181 |
| Hive Metastore | Address of the Hive metastore. For details, see the hive.metastore.uris configuration item. | thrift://host-192-168-1-212:9083 |
| Authentication Method | Authentication method used for accessing Hadoop | KERBEROS |
| Principal | If Authentication Method is set to KERBEROS, this principal is used for authentication. Contact the Hadoop administrator to obtain it. | USER@YOUR-REALM.COM |
| Keytab File | If Authentication Method is set to KERBEROS, this file is used for authentication. Contact the Hadoop administrator to obtain it. | /opt/user.keytab |
| IP and Host Name Mapping | If the HDFS configuration files use host names, configure the mapping between IP addresses and host names. Separate an IP address from its host name with a space, and separate mappings with semicolons (;), carriage returns, or line feeds. | 10.1.6.9 hostname01;10.2.7.9 hostname02 |
| HBase Version | Set it to the HBase version on the server. | HBASE_2_X |
| HIVE Version | Set it to the Hive version on the server. | HIVE_3_X |
| Run Mode | Run mode of the HDFS link | STANDALONE |
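The IP and Host Name Mapping format described above (space within an entry, semicolons or line breaks between entries) can be illustrated with a small parser. This is a hypothetical helper, not part of CDM, and the function name and addresses are made up for illustration.

```python
import re

def parse_host_mapping(raw: str) -> dict[str, str]:
    """Parse an "IP and Host Name Mapping" value into {hostname: ip}.

    Entries are separated by semicolons, carriage returns, or line feeds;
    within an entry, the IP address and host name are separated by a space.
    """
    mapping = {}
    for entry in re.split(r"[;\r\n]+", raw):
        entry = entry.strip()
        if not entry:
            continue
        ip, hostname = entry.split()
        mapping[hostname] = ip
    return mapping

print(parse_host_mapping("10.1.6.9 hostname01;10.2.7.9 hostname02"))
# {'hostname01': '10.1.6.9', 'hostname02': '10.2.7.9'}
```

Configuring this mapping serves the same purpose as adding the entries to `/etc/hosts` on the CDM side: it lets CDM resolve the host names that appear in the HDFS configuration files.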
Procedure
- Log in to the CDM management console.
- In the left navigation pane, click Cluster Management. Locate the target cluster and click Job Management.
- Choose and set the connector type to Hadoop release version.
- Click Next. Set link parameters by referring to Link to Hadoop.
- Click Test to check whether the link is available. Alternatively, click Save. The system will automatically check whether the link is available.
If network conditions are poor or the data source is very large, the link test can take 30 to 60 seconds.
- Choose . The page for configuring the job is displayed. Select a migration scenario (Hadoop migration, Hive migration, or HBase migration) and configure the job name.
Figure 1 Configuring a scenario-based migration job
- Configure the source and destination job parameters, and select the link name and name of the database to be migrated.
Figure 2 Configuring job parameters
- Click Next to go to the table selection page. Select the tables to be migrated based on your requirements.
- Click Next and set job parameters.
Table 4 describes related parameters.
Table 4 Task configuration parameters

| Parameter | Description | Example Value |
|---|---|---|
| Write Dirty Data | Whether to record dirty data. This parameter is set to No by default. | Yes |
| Write Dirty Data Link | Displayed only when Write Dirty Data is set to Yes. Only links to OBS support dirty data writes. | obs_link |
| OBS Bucket | Displayed only when Write Dirty Data Link is a link to OBS. Name of the OBS bucket to which the dirty data will be written. | dirtydata |
| Dirty Data Directory | Displayed only when Write Dirty Data is set to Yes. Directory for storing dirty data on OBS; dirty data is saved only when this parameter is configured. You can check this directory for data that failed to be processed or was filtered out during job execution, and review source data that did not meet the conversion or cleaning rules. | /user/dirtydir |
| Max. Error Records in a Single Shard | Displayed only when Write Dirty Data is set to Yes. If the number of error records in a single map exceeds this limit, the job terminates automatically and the imported data cannot be rolled back. You are advised to use a temporary table as the destination table; after the data is imported, rename the table or merge it into the final data table. | 0 |
- Click Save or Save and Run.
When the job starts running, a sub-job will be generated for each table. You can click the job name to view the sub-job list.