Updated on 2022-08-11 GMT+08:00

What Is CDM?

Product Overview

Cloud Data Migration (CDM) is an efficient, easy-to-use batch data integration service. Built on the big data cloud migration and intelligent data lake solutions, CDM migrates data and integrates multiple data sources into the data lake. This reduces the complexity of data source migration and integration and improves their efficiency.

In the DataArts Studio service, CDM serves as the DataArts Migration component, which provides the same capabilities as the independent CDM service. In later sections of this document, cloud data migration and data integration both refer to CDM.

Built on a distributed computing framework and parallel processing technology, CDM helps you migrate massive data sets stably and efficiently. You can migrate data online and quickly build the data structure you need.

Figure 1 CDM positioning

Functions

  • Table/file/entire DB migration

    Tables or files can be migrated in batches. An entire database can be migrated between homogeneous or heterogeneous databases. A single job can migrate hundreds of tables.

  • Incremental data migration

    CDM supports incremental migration of files, relational databases, and HBase/CloudTable. Incremental data can also be selected with WHERE clauses and date and time macro variables.

  • Migration in transaction mode

    If a CDM job fails, CDM rolls the data back to its state before the job started and automatically deletes data from the destination table.

  • Field conversion

    CDM supports field conversion functions, such as anonymization, character string operations, and date operations.

  • File encryption

    When files are migrated to a file system, CDM can encrypt the files written to the cloud.

  • MD5 verification

    MD5 verification checks file consistency end to end and outputs the verification result.

  • Dirty data archiving

    CDM can archive to dirty data logs any data that fails to be processed during migration, is filtered out, or does not comply with conversion or cleaning rules. You can set a dirty data ratio threshold to determine whether a task is considered successful.
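
Two of the functions above, MD5 verification and the dirty data ratio threshold, can be illustrated with a minimal Python sketch. This is not CDM code or the CDM API. The function names (md5_of_stream, files_match, job_succeeded) and the rule that a job passes when the dirty ratio is at or below the threshold are assumptions made for illustration only.

```python
import hashlib
import io


def md5_of_stream(stream, chunk_size=1 << 20):
    """Return the hex MD5 digest of a binary stream, read in chunks."""
    digest = hashlib.md5()
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


def files_match(source_stream, destination_stream):
    """End-to-end check: both sides must hash to the same MD5 value."""
    return md5_of_stream(source_stream) == md5_of_stream(destination_stream)


def job_succeeded(total_records, dirty_records, max_dirty_ratio):
    """Hypothetical rule: pass if dirty/total is at or below the threshold."""
    if total_records == 0:
        # Nothing was processed, so the job passes only if nothing was dirty.
        return dirty_records == 0
    return dirty_records / total_records <= max_dirty_ratio


# In-memory data stands in for a source file and its migrated copy.
source = io.BytesIO(b"id,name\n1,alice\n2,bob\n")
destination = io.BytesIO(b"id,name\n1,alice\n2,bob\n")
print(files_match(source, destination))

# 8 dirty records out of 1000 is within a 1% threshold; 20 is not.
print(job_succeeded(total_records=1000, dirty_records=8, max_dirty_ratio=0.01))
print(job_succeeded(total_records=1000, dirty_records=20, max_dirty_ratio=0.01))
```

Reading the stream in chunks keeps memory use flat for large files, which matters when the same check is applied to migrated files of arbitrary size.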