Migrating Incremental Data

Updated on 2025-02-17 GMT+08:00

Synchronize data that was added, modified, or deleted in the source databases after the previous migration to Huawei Cloud DLI.

Precautions

MgC allows you to narrow the migration scope to specific partitions in MaxCompute. Enter the names of the involved tables in the CSV template in lowercase. MaxCompute converts table names to lowercase during table creation, so if the table names in the template contain uppercase letters, the migration fails because the tables cannot be recognized. For example, a table created as Sales_2024 is stored as sales_2024, so enter sales_2024 in the template.

Prerequisites

Procedure

  1. Sign in to the MgC console. In the navigation pane, under Project, select your big data migration project from the drop-down list.
  2. In the navigation pane on the left, choose Migrate > Big Data Migration.
  3. In the upper right corner of the page, click Create Migration Task.

  4. Select MaxCompute for Source Component, Data Lake Insight (DLI) for Target Component, Incremental data migration for Task Type, and click Next.

  5. Configure parameters required for creating an incremental data migration task based on Table 1.

    Table 1 Parameters required for creating an incremental data migration task

    Basic Settings

    • Task Name: The default name is Incremental-data-migration-from-MaxCompute-to-DLI- followed by 4 random characters (letters and digits). You can also customize a name.

    • MgC Agent: Select the MgC Agent you connected to MgC in Preparations.

    Source Settings

    • Source Connection: Select the source connection you created.

    • MaxCompute Parameters (Optional): These parameters are optional and usually left blank. If needed, configure them by referring to MaxCompute Documentation.

    Data Scope

    • Migration Filter: Choose how the incremental data to migrate is identified.
      • Time: Select which incremental data to migrate based on when it was last changed. If you select this option, set parameters such as Time Range, Filter Partitions, and Define Scope.
      • Custom: Migrate incremental data in specified partitions. If you select this option, perform the following steps to specify the partitions to be migrated (an example of the finished file follows the steps):
        1. Under Include Partitions, click Download Template to download the template in CSV format.
        2. Open the downloaded CSV template file with Notepad.
          CAUTION: Do not use Excel to edit the CSV template file. A template file edited and saved in Excel cannot be identified by MgC.
        3. Retain the first line in the CSV template file. From the second line onwards, enter the information about the tables to be migrated in the format {MaxCompute project name},{Table name},{Partition field},{Partition key}. The MaxCompute project name is the name of the MaxCompute project where the data to be migrated is managed. The table name identifies the data table to be migrated.
          NOTICE: Use commas (,) to separate the elements in each line. Do not use spaces or other separators. After adding the information about a table, press Enter to start a new line.
        4. After all table information is added, save the changes to the CSV file.
        5. Under Include Partitions, click Add File and upload the edited CSV file to MgC.
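
        For illustration, after the retained header line the finished template might contain data lines like the following (the project, table, partition field, and partition key values here are hypothetical):

          mc_project,sales_orders,dt,20240601
          mc_project,sales_orders,dt,20240602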

    • Time Range: Select a T-N option to limit the migration to the incremental data generated within a specific period (24 × N hours) before the task start time (T). For example, if you select T-1 and the task is executed at 14:50 on June 6, 2024, the system migrates the incremental data generated from 14:50 on June 5, 2024 to 14:50 on June 6, 2024. If you select Specified Date, only the incremental data generated on the specified date is migrated.

    • Filter Partitions: Decide whether to filter the partitions to be migrated by update time or by creation time. The default value is By update time.
      • Select By update time to migrate the data that was changed in the specified period.
      • Select By creation time to migrate the data that was created in the specified period.

    • By database: Enter the names of the databases whose incremental data needs to be migrated in the Include Databases text box. Click Add to add more entries. A maximum of 10 databases can be added. To exclude certain tables from the migration, download the template in CSV format, add the information about these tables to the template, and upload the template to MgC. For details, see steps 2 to 5 below.

    • By table: Specify the tables to be migrated as follows (a scripted alternative is sketched after this list):
      1. Download the template in CSV format.
      2. Open the downloaded CSV template file with Notepad.
        CAUTION: Do not use Excel to edit the CSV template file. A template file edited and saved in Excel cannot be identified by MgC.
      3. Retain the first line in the CSV template file. From the second line onwards, enter the information about the tables to be migrated in the format {MaxCompute project name},{Table name}. The MaxCompute project name is the name of the MaxCompute project to be migrated. The table name identifies the data table to be migrated.
        NOTICE:
        • Use commas (,) to separate the MaxCompute project name and the table name in each line. Do not use spaces or other separators.
        • After adding the information about a table, press Enter to start a new line.
      4. After all table information is added, save the changes to the CSV file.
      5. Upload the edited CSV file to MgC.
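
      If you have many tables, you can append the data lines programmatically instead of typing them in Notepad, which also avoids the Excel pitfall noted above. The following is a minimal Python sketch under these assumptions: the downloaded template is saved as table_template.csv (a hypothetical file name), and the project and table names are placeholders.

        import csv

        # Hypothetical list of (MaxCompute project, table) pairs to migrate.
        tables = [
            ("mc_project", "sales_orders"),
            ("mc_project", "user_events"),
        ]

        # Append data lines after the template's first (header) line,
        # which MgC requires you to retain. Fields are separated by
        # plain commas with no spaces.
        with open("table_template.csv", "a", newline="") as f:
            writer = csv.writer(f)
            for project, table in tables:
                # Table names must be lowercase, or the migration fails.
                writer.writerow((project, table.lower()))

      Appending in text mode with the csv module keeps the separators as plain commas, which matches the format MgC expects.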

    Target Settings

    • Target Connection: Select the DLI connection with a general queue that you created in Creating a Target Connection.

      CAUTION: Do not select a DLI connection configured with a SQL queue.

    • Custom Parameters (Optional): Set the parameters as required. For details about the supported custom parameters, see Configuration parameters and Custom parameters.

      • If the migration is performed over the Internet, set the first four of the following parameters.

      • If the migration is performed over a private network, set all eight of the following parameters.

        • spark.dli.metaAccess.enable: Enter true.
        • spark.dli.job.agency.name: Enter the name of the DLI agency you configured.
        • mgc.mc2dli.data.migration.dli.file.path: Enter the OBS path for storing the migration-dli-spark-1.0.0.jar package, for example, obs://mgc-test/data/migration-dli-spark-1.0.0.jar.
        • mgc.mc2dli.data.migration.dli.spark.jars: Enter the OBS paths for storing the fastjson-1.2.54.jar and datasource.jar packages. The value is transferred in array format: package paths must be enclosed in double quotation marks and separated with commas (,), for example, ["obs://mgc-test/data/datasource.jar","obs://mgc-test/data/fastjson-1.2.54.jar"].
        • spark.sql.catalog.mc_catalog.tableWriteProvider: Enter tunnel.
        • spark.sql.catalog.mc_catalog.tableReadProvider: Enter tunnel.
        • spark.hadoop.odps.end.point: Enter the VPC endpoint of the region where the source MaxCompute service is provisioned. For details about the MaxCompute VPC endpoint in each region, see Endpoints in different regions (VPC). For example, if the source MaxCompute service is located in Hong Kong, China, enter http://service.cn-hongkong.maxcompute.aliyun-inc.com/api.
        • spark.hadoop.odps.tunnel.end.point: Enter the VPC Tunnel endpoint of the region where the source MaxCompute service is located. For details about the MaxCompute VPC Tunnel endpoint in each region, see Endpoints in different regions (VPC). For example, if the source MaxCompute service is located in Hong Kong, China, enter http://dt.cn-hongkong.maxcompute.aliyun-inc.com.
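
        Putting these together, the full custom-parameter set for a private-network migration from a MaxCompute project in Hong Kong, China might look like the following. The OBS bucket mgc-test comes from the examples above, and the agency name dli-agency is hypothetical:

          spark.dli.metaAccess.enable = true
          spark.dli.job.agency.name = dli-agency
          mgc.mc2dli.data.migration.dli.file.path = obs://mgc-test/data/migration-dli-spark-1.0.0.jar
          mgc.mc2dli.data.migration.dli.spark.jars = ["obs://mgc-test/data/datasource.jar","obs://mgc-test/data/fastjson-1.2.54.jar"]
          spark.sql.catalog.mc_catalog.tableWriteProvider = tunnel
          spark.sql.catalog.mc_catalog.tableReadProvider = tunnel
          spark.hadoop.odps.end.point = http://service.cn-hongkong.maxcompute.aliyun-inc.com/api
          spark.hadoop.odps.tunnel.end.point = http://dt.cn-hongkong.maxcompute.aliyun-inc.com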

    Migration Settings

    • Large Table Migration Rules: Control the threshold at which a large table is split into multiple migration subtasks. You are advised to retain the default settings, but you can change them as needed.

    • Small Table Migration Rules: Control how small tables are merged into a single migration subtask, which can accelerate the migration. You are advised to retain the default settings, but you can change them as needed.

    • Concurrency: Set the number of concurrent migration subtasks. The default value is 3. The value ranges from 1 to 10.

    • Max. SQL Statements per File: SQL statements are generated for running migration commands, and this value limits how many SQL statements can be stored in a single file. The default value is 3. The value ranges from 1 to 50.

  6. After the configuration is complete, execute the task.

    NOTICE:
    • A migration task can be executed repeatedly. Each time a migration task is executed, a task execution is generated.
    • You can click the task name to modify the task configuration.
    • You can select Run immediately and click Save to create the task and execute it immediately. You can view the created task on the Tasks page.

    • You can also click Save to just create the task. You can view the created task on the Tasks page. To execute the task, click Execute in the Operation column.

  7. After the migration task is executed, click View Executions in the Operation column. On the Task Executions tab, you can view the details of the running task execution and all historical executions.

    Locate the running execution and click View in the Progress column. On the displayed Progress Details page, view and export the task execution results.

  8. (Optional) After the data migration is complete, verify data consistency between the source and the target databases. For details, see Verifying the Consistency of Data Migrated from MaxCompute to DLI.
