
Creating an Entire Database Migration Job

Scenario

CDM supports entire DB migration between homogeneous and heterogeneous data sources. The migration principles are the same as those described in Table/File Migration Jobs: each Elasticsearch type, each Redis key prefix, or each MongoDB collection can be migrated concurrently as a subtask.

Each time an entire DB migration job is executed, its subtasks are regenerated from the current configuration of the migration job. Therefore, you cannot modify a subtask and then rerun the migration job with that modification.

Supported Data Sources lists the data sources supporting entire database migration.

Constraints

Field names of the source and destination parameters cannot contain ampersands (&) or percent signs (%).

Prerequisites

  • The CDM cluster can communicate with the data source.

Procedure

  1. Log in to the management console and choose Service List > Cloud Data Migration. In the left navigation pane, choose Cluster Management. Locate the target cluster and click Job Management.
  2. Choose Entire DB Migration > Create Job. The page for configuring the job is displayed.

    Figure 1 Creating an entire DB migration job

  3. Configure the related parameters of the source database according to Table 1.

    Table 1 Parameter description

    Source database: DWS, MySQL, PostgreSQL, SQL Server, Oracle, or SAP HANA

      • Schema/Tablespace: Name of the schema or tablespace from which data will be extracted. This parameter is displayed only when Use SQL Statement is set to No. Click the icon next to the text box to open the selection page, or enter a schema or tablespace name directly.

        If the desired schema or tablespace is not displayed, check whether the login account has the permissions required to query metadata.

        Example value: schema

      • WHERE Clause: WHERE clause used to specify the data to be extracted. The clause applies to every table in the entire DB migration. If this parameter is not set, the entire table is extracted. If a table to be migrated does not contain the fields referenced in the WHERE clause, the migration will fail.

        You can set a date macro variable to extract data generated on a specific date. For details, see Incremental Migration of Relational Databases and the example following this table.

        Example value: age > 18 and age <= 60

      • Null in Partition Column: Whether the partition field can be null.

        Example value: Yes

    Source database: Hive

      • Database Name: Name of the database to be migrated. The user configured in the source link must have the permission to read the database.

        Example value: hivedb

    Source database: HBase or CloudTable

      • Start Time: Start time (inclusive) in yyyy-MM-dd hh:mm:ss format. The dateformat time macro variable function is supported.

        Examples: 2017-12-31 20:00:00, ${dateformat(yyyy-MM-dd, -1, DAY)} 02:00:00, and ${dateformat(yyyy-MM-dd HH:mm:ss, -1, DAY)}

        Example value: 2017-12-31 20:00:00

      • End Time: End time (exclusive) in yyyy-MM-dd hh:mm:ss format. The dateformat time macro variable function is supported.

        Examples: 2018-01-01 20:00:00, ${dateformat(yyyy-MM-dd, -1, DAY)} 02:00:00, and ${dateformat(yyyy-MM-dd HH:mm:ss, -1, DAY)}

        Example value: 2018-01-01 20:00:00

    Source database: Redis

      • Key Filter Character: Filter used to determine the keys to be migrated. For example, if this parameter is set to a*, all keys starting with a are migrated.

        Example value: a*

    Source database: DDS

      • Database Name: Name of the database from which data will be migrated. The user configured in the source link must have the permission to read the database.

        Example value: ddsdb

      • Query Filter: Filter used to match documents, for example, {HTTPStatusCode:{$gt:"400",$lt:"500"},HTTPMethod:"GET"}.

        Example value: -
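    The following illustrates how the date macro could be combined with the WHERE Clause parameter for incremental extraction. The column name last_update_time is a hypothetical example; use a time column that actually exists in your source tables, and see Incremental Migration of Relational Databases for the exact macro behavior. Assuming ${dateformat(yyyy-MM-dd, -1, DAY)} resolves to the day before the job runs (for example, 2017-12-31 for a job run on 2018-01-01), a WHERE clause such as the following would extract only the data generated since the previous day:

      last_update_time >= '${dateformat(yyyy-MM-dd, -1, DAY)}'

    The same macro expansion applies to the Start Time and End Time parameters for HBase or CloudTable: under the same assumption, ${dateformat(yyyy-MM-dd, -1, DAY)} 02:00:00 would resolve to 2017-12-31 02:00:00.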

  4. Configure the parameters of the destination cloud service according to Table 2.

    Table 2 Destination job parameters

    Destination database: RDS for MySQL, RDS for PostgreSQL, or RDS for SQL Server

      For details about the destination job parameters required for entire DB migration to an RDS database, see To MySQL/SQL Server/PostgreSQL.

      Example value: schema

    Destination database: DWS

      For details about the destination job parameters required for entire DB migration to DWS, see To DWS.

      Example value: -

    Destination database: MRS Hive

      For details about the destination job parameters required for entire DB migration to MRS Hive, see To Hive.

      Example value: hivedb

    Destination database: MRS HBase or CloudTable

      For details about the destination job parameters required for entire DB migration to MRS HBase or CloudTable, see To HBase/CloudTable.

      Example value: Yes

    Destination database: Redis

      • Clear Database: Whether to clear the existing data in the database before the data is imported.

        Example value: Yes

    Destination database: DDS

      • Database Name: Name of the database to which data will be migrated. The user configured in the destination link must have the permission to write to the database.

        Example value: mongodb

      • Migration Behavior: Select Add or Replace.

        Example value: -

  5. If you are migrating an entire relational database, click Next after configuring the job parameters, and then select the source and destination tables. Ensure that each destination table name is the same as the corresponding source table name. For example, if the source table is named test, the destination table must also be named test.

    Figure 2 Field mapping

  6. Click Next and set job parameters.

    Figure 3 Task parameters
    Table 3 describes related parameters.
    Table 3 Task configuration parameters

      • Concurrent Tables: Number of tables to be migrated concurrently.

        Example value: 3

      • Concurrent Extractors: Maximum number of threads the job uses to read data from the source.

        NOTE: For data sources that do not support concurrent extraction, for example, CSS and ClickHouse, the actual number of concurrent threads may be smaller than the value of this parameter.

        Example value: 1

      • Write Dirty Data: Whether to record dirty data. By default, this parameter is set to No.

        Example value: Yes

      • Write Dirty Data Link: This parameter is displayed only when Write Dirty Data is set to Yes. Only links to OBS support dirty data writes.

        Example value: obs_link

      • OBS Bucket: This parameter is displayed only when Write Dirty Data Link is set to a link to OBS. It specifies the name of the OBS bucket to which the dirty data will be written.

        Example value: dirtydata

      • Dirty Data Directory: This parameter is displayed only when Write Dirty Data is set to Yes. It specifies the OBS directory for storing dirty data. Dirty data is saved only when this parameter is configured.

        You can go to this directory to query the data that failed to be processed or was filtered out during job execution, and check which source data did not meet the conversion or cleaning rules. See the example following this table.

        Example value: /user/dirtydir

      • Max. Error Records in a Single Shard: This parameter is displayed only when Write Dirty Data is set to Yes. If the number of error records in a single map exceeds this limit, the job terminates automatically, and the data that has already been imported cannot be rolled back. You are advised to use a temporary table as the destination table, and then rename the table or merge it into the final data table after the import completes.

        Example value: 0
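    To illustrate how the parameters above combine (the exact layout of dirty data files on OBS depends on the job and is not described here): with Write Dirty Data set to Yes, Write Dirty Data Link set to obs_link, OBS Bucket set to dirtydata, and Dirty Data Directory set to /user/dirtydir, any records that fail conversion or are filtered out during the migration are written under the /user/dirtydir directory of the dirtydata bucket, where you can review them after the job finishes. Likewise, with Concurrent Tables set to 3, at most three tables of the database are migrated at the same time.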

  7. Click Save or Save and Run.

    When the job starts running, a sub-job will be generated for each table. You can click the job name to view the sub-job list.
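    For example, if the migrated database contains three tables named t1, t2, and t3 (hypothetical names), three sub-jobs are generated, one for each table, and each sub-job can be monitored individually in that list.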

When migrating an entire Oracle database to Hudi, automatic table creation is not supported if the source object you select is a view or a table without a primary key.