Migrating Full Data
Migrate all data from source MaxCompute databases to Huawei Cloud DLI.
Prerequisites
- You have completed all preparations.
- A source connection has been created.
- Target connections have been created.
- You have been added to the whitelist that allows JAR programs to access DLI metadata. If you have not, contact technical support.
Procedure
- Sign in to the MgC console.
- In the navigation pane on the left, choose Migrate > Big Data Migration. In the upper left corner of the page, select the migration project created in Preparations.
- In the upper right corner of the page, click Create Migration Task.
- Select MaxCompute for Source Component, Data Lake Insight (DLI) for Target Component, Full data migration for Task Type, and click Next.
- Configure parameters required for creating a full data migration task based on Table 1.
Table 1 Parameters required for creating a full data migration task
Basic Settings
- Task Name: The default name is Full-data-migration-from-MaxCompute-to-DLI- followed by 4 random characters (letters and numbers). You can also customize a name.
- Edge Device: Select the Edge device you connected to MgC in Making Preparations.
Source Settings
- Source Connection: Select the source connection you created.
- Estimated Project Period (Day) (Optional): If this parameter is set, the system checks the lifecycle of each table during the migration. If the lifecycle of a table ends before the expected end of the project period, the table is skipped. If this parameter is not set, all tables are migrated by default.
- MaxCompute Parameters: These parameters are optional and usually left blank. If needed, you can configure them by referring to MaxCompute Documentation.
Migration Scope
- By database: Enter the names of the databases (projects) to be migrated in the Include Databases text box. If there are tables you do not want to migrate, download the template in CSV format, add information about these tables to the template, and upload the template to MgC. For details, see steps 2 to 5 under By table below.
- By table:
  1. Download the template in CSV format.
  2. Open the downloaded CSV template file with a text editor such as Notepad.
     CAUTION: Do not edit the CSV template file in Excel. A template file edited and saved in Excel cannot be recognized by MgC.
  3. Retain the first line in the CSV template file. From the second line onwards, enter the information about each table to be migrated on its own line, in the format {MaxCompute project name},{Table name}. The MaxCompute project name is the name of the MaxCompute project to be migrated, and the table name is the name of the data table to be migrated.
     NOTICE:
     - Use a comma (,) to separate the MaxCompute project name from the table name in each line. Do not use spaces or other separators.
     - After adding the information about a table, press Enter to start a new line.
  4. After all table information is added, save the changes to the CSV file.
  5. Upload the edited and saved CSV file to MgC. A finished file might look like the example below.
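The following is a minimal sketch of a finished CSV file. The project and table names are hypothetical placeholders, and the first line stands in for the header retained from the downloaded template:

    {header line retained from the downloaded template}
    my_mc_project,sales_orders
    my_mc_project,customer_profiles
    another_mc_project,web_logs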
Target Settings
- Target Connection: Select the DLI connection with a general queue created in Creating a Target Connection.
  CAUTION: Do not select a DLI connection with a SQL queue configured.
- Custom Parameters: Configure the parameters as needed. For details, see Configuration parameter description and Custom Parameters. A consolidated example follows the two parameter lists below.
  - If the migration is performed over the Internet, set the following four parameters:
    - spark.dli.metaAccess.enable: Enter true.
    - spark.dli.job.agency.name: Enter the name of the DLI agency you configured.
    - mgc.mc2dli.data.migration.dli.file.path: Enter the OBS path for storing the migration-dli-spark-1.0.0.jar package, for example, obs://mgc-test/data/migration-dli-spark-1.0.0.jar.
    - mgc.mc2dli.data.migration.dli.spark.jars: Enter the OBS paths for storing the fastjson-1.2.54.jar and datasource.jar packages. The value is transferred in array format. Package paths must be enclosed in double quotation marks and separated with commas (,), for example: ["obs://mgc-test/data/datasource.jar","obs://mgc-test/data/fastjson-1.2.54.jar"]
  - If the migration is performed over a private network, set the following eight parameters:
    - spark.dli.metaAccess.enable: Enter true.
    - spark.dli.job.agency.name: Enter the name of the DLI agency you configured.
    - mgc.mc2dli.data.migration.dli.file.path: Enter the OBS path for storing the migration-dli-spark-1.0.0.jar package, for example, obs://mgc-test/data/migration-dli-spark-1.0.0.jar.
    - mgc.mc2dli.data.migration.dli.spark.jars: Enter the OBS paths for storing the fastjson-1.2.54.jar and datasource.jar packages. The value is transferred in array format. Package paths must be enclosed in double quotation marks and separated with commas (,), for example: ["obs://mgc-test/data/datasource.jar","obs://mgc-test/data/fastjson-1.2.54.jar"]
    - spark.sql.catalog.mc_catalog.tableWriteProvider: Enter tunnel.
    - spark.sql.catalog.mc_catalog.tableReadProvider: Enter tunnel.
    - spark.hadoop.odps.end.point: Enter the VPC endpoint of the region where the source MaxCompute service is provisioned. For details about the MaxCompute VPC endpoint in each region, see Endpoints in different regions (VPC). For example, if the source MaxCompute service is located in Hong Kong, China, enter http://service.cn-hongkong.maxcompute.aliyun-inc.com/api.
    - spark.hadoop.odps.tunnel.end.point: Enter the VPC Tunnel endpoint of the region where the source MaxCompute service is located. For details about the MaxCompute VPC Tunnel endpoint in each region, see Endpoints in different regions (VPC). For example, if the source MaxCompute service is located in Hong Kong, China, enter http://dt.cn-hongkong.maxcompute.aliyun-inc.com.
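For reference, the following sketch shows a complete private-network parameter set written as name = value pairs for readability (in the MgC console, each entry is configured as a separate parameter name and value). The OBS paths and Hong Kong endpoints reuse the examples above, and the agency name is a hypothetical placeholder; an Internet migration needs only the first four entries:

    spark.dli.metaAccess.enable = true
    spark.dli.job.agency.name = my_dli_agency
    mgc.mc2dli.data.migration.dli.file.path = obs://mgc-test/data/migration-dli-spark-1.0.0.jar
    mgc.mc2dli.data.migration.dli.spark.jars = ["obs://mgc-test/data/datasource.jar","obs://mgc-test/data/fastjson-1.2.54.jar"]
    spark.sql.catalog.mc_catalog.tableWriteProvider = tunnel
    spark.sql.catalog.mc_catalog.tableReadProvider = tunnel
    spark.hadoop.odps.end.point = http://service.cn-hongkong.maxcompute.aliyun-inc.com/api
    spark.hadoop.odps.tunnel.end.point = http://dt.cn-hongkong.maxcompute.aliyun-inc.com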
Migration Settings
- Large Table Migration Rules: Specify the size threshold above which a table is split into multiple migration subtasks. You are advised to retain the default settings, but you can change them as needed.
- Small Table Migration Rules: Specify the size threshold below which a table is merged with other small tables into a single migration subtask. This can accelerate the migration. You are advised to retain the default settings, but you can change them as needed.
- Concurrency: Set the number of concurrent migration subtasks. The default value is 3. The value ranges from 1 to 10.
- Max. SQL Statements Per File: Migration commands are executed as generated SQL statements. This value limits the number of SQL statements that can be stored in a single file. The default value is 3. The value ranges from 1 to 50.
- After the configuration is complete, execute the task.
- A migration task can be executed repeatedly. Each time a migration task is executed, a task execution is generated.
- You can click the task name to modify the task configuration.
- You can select Run immediately and click Save to create the task and execute it immediately. You can view the created task on the Tasks page.
- You can also click Save to create the task without executing it. You can view the created task on the Tasks page. To execute the task, click Execute in the Operation column.
- After the migration task is executed, click View Executions in the Operation column. On the displayed Task Executions tab page, you can view the details of the running task execution and all historical executions.
  - Click Execute Again in the Status column to run the execution again.
  - Click View in the Progress column. On the displayed Progress Details page, you can view the details of the data tables processed and the migration subtasks created in the execution.
- (Optional) After the data migration is complete, verify data consistency between the source and the target databases. For details, see Verifying the Consistency of Data Migrated from MaxCompute to DLI.