Product Architecture and Function Principles

The following figure shows the product architecture and function principles of DRS.

Figure 1 DRS product architecture

Architecture Description

Minimum permission design
1. Java Database Connectivity (JDBC) is used to connect to the source and destination databases, so you do not have to deploy programs on the databases.
2. Each task runs on a dedicated VM that is not shared with other tasks, so data is isolated between tenants.
3. Access is restricted by IP address. Only the IP address of the DRS instance is allowed to access the source and destination databases.
Reliability design
1. Automatic reconnection: If the connection between DRS and your database is interrupted by a network fault or a database switchover, DRS automatically retries the connection until the task is restored.
2. Resumable upload: When the connection to the source or the destination is abnormal, DRS automatically marks the current replay point. After the fault is rectified, you can resume data transfer from the replay point to ensure data consistency.
3. If the VM where the DRS replication instance is located fails, services are automatically switched to a new VM with the IP address unchanged to ensure that the migration task is not interrupted.
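
The reconnection and resume behavior described above can be illustrated with a minimal sketch. It is not the DRS implementation: the names run_with_resume, TransientConnectionError, read_batch, and apply_batch are hypothetical, the checkpoint is a plain dictionary standing in for the recorded replay point, and the retry limit and backoff are assumptions added to keep the example bounded.

```python
import time


class TransientConnectionError(Exception):
    """Hypothetical error standing in for a dropped database connection."""


def run_with_resume(read_batch, apply_batch, checkpoint, max_retries=5, base_delay=1.0):
    """Transfer batches and, after a failure, retry from the last replay point.

    read_batch(position) returns the rows starting at position (empty when done).
    checkpoint is a dict that stores the replay point under "position".
    """
    position = checkpoint.get("position", 0)
    retries = 0
    while True:
        try:
            batch = read_batch(position)
            if not batch:
                return position
            apply_batch(batch)
            position += len(batch)
            checkpoint["position"] = position  # mark the replay point
            retries = 0  # a successful batch resets the retry counter
        except TransientConnectionError:
            retries += 1
            if retries > max_retries:
                raise
            # Assumed exponential backoff between reconnection attempts.
            time.sleep(base_delay * 2 ** (retries - 1))
```

Because the replay point advances only after a batch has been applied, a retry never skips data. In this sketch a batch that fails partway through would be applied again, so apply_batch is assumed to be idempotent.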

The character set standard used by DRS is Unicode 6.2.0.

Basic Principles of Real-Time Migration

Figure 2 Real-time migration principle

Taking full+incremental migration as an example, a complete migration process includes four phases.
1. Phase 1: Structure migration. DRS queries the databases, tables, and primary keys to be migrated from the source and creates corresponding objects in the destination.
2. Phase 2: Full data migration. DRS queries all data from the source in parallel and inserts the data into the destination. Before the full migration starts, DRS begins extracting and saving incremental data to ensure data integrity and consistency in the subsequent incremental migration.
3. Phase 3: Incremental data migration. After the full migration task is complete, the incremental migration task is started. The incremental data generated after the start of the full migration is continuously parsed, converted, and replayed to the destination database until data is in sync between the source and destination databases.
4. Phase 4: Trigger and event migration. To prevent triggers and events from modifying data during the migration, they are migrated after the migration task is complete.
Principles of the underlying modules for full migration:
Sharding module: calculates how each table is split into shards using an optimized sharding algorithm.
Extraction module: queries data from the source database in parallel based on the calculated shard information.
Replay module: inserts the data queried by the extraction module into the destination database using multiple parallel tasks.
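
The interaction of the three modules can be sketched as follows. This is an illustration only: the actual DRS sharding algorithm is not described here, so the sketch assumes a simple split of an integer primary key into fixed-size ranges. The names make_shards, extract, and migrate_table are hypothetical, and in-memory lists and dictionaries stand in for the source and destination tables.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def make_shards(min_id, max_id, shard_size):
    """Sharding: split the inclusive key range [min_id, max_id] into contiguous ranges."""
    shards = []
    start = min_id
    while start <= max_id:
        end = min(start + shard_size - 1, max_id)
        shards.append((start, end))
        start = end + 1
    return shards


def extract(source_rows, shard):
    """Extraction: stand-in for a range query such as WHERE id BETWEEN lo AND hi."""
    lo, hi = shard
    return [row for row in source_rows if lo <= row["id"] <= hi]


def migrate_table(source_rows, destination, shard_size=3, workers=4):
    """Copy every shard in a worker thread and report the rows copied per shard."""
    ids = [row["id"] for row in source_rows]
    if not ids:
        return []
    shards = make_shards(min(ids), max(ids), shard_size)
    lock = threading.Lock()

    def copy_shard(shard):
        rows = extract(source_rows, shard)
        with lock:  # replay: write the extracted rows into the destination
            for row in rows:
                destination[row["id"]] = row
        return len(rows)

    with ThreadPoolExecutor(max_workers=workers) as pool:
        counts = list(pool.map(copy_shard, shards))
    return list(zip(shards, counts))
```

Because the shards do not overlap, each row is extracted and replayed exactly once, regardless of the order in which the worker threads finish.
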
Principles of the underlying modules for incremental migration:
Log reading module: reads the original incremental log data (for example, binlog for MySQL) from the source database, parses the data, converts it into a standard log format, and stores it locally.
Log replay module: processes and filters the incremental logs in the standard format produced by the log reading module, and synchronizes the incremental data to the destination database.
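
The division of work between the two incremental modules can be sketched as follows. The record layout is an assumption made for illustration and is not the DRS standard log format. The functions to_standard and replay are hypothetical, and nested dictionaries stand in for the destination tables.

```python
def to_standard(raw_event):
    """Log reading: convert a hypothetical engine-specific event into a neutral record."""
    return {
        "position": raw_event["pos"],
        "table": f'{raw_event["schema"]}.{raw_event["table"]}',
        "op": raw_event["type"].upper(),
        "key": raw_event["pk"],
        "row": raw_event.get("after"),  # None for DELETE events
    }


def replay(records, destination, selected_tables, start_position=0):
    """Log replay: filter records and apply them to the destination in log order.

    Returns the position of the last record processed, which can be stored
    as the point from which a later run resumes.
    """
    last = start_position
    for rec in sorted(records, key=lambda r: r["position"]):
        if rec["position"] <= start_position:
            continue  # already replayed in an earlier run
        last = rec["position"]
        if rec["table"] not in selected_tables:
            continue  # filtered out: this table was not selected
        table = destination.setdefault(rec["table"], {})
        if rec["op"] == "DELETE":
            table.pop(rec["key"], None)
        else:  # INSERT and UPDATE both write the row image after the change
            table[rec["key"]] = rec["row"]
    return last
```

Applying the records strictly in position order is what keeps the destination consistent with the source: an update followed by a delete of the same key must not be applied in reverse.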

Basic Principles of Backup Migration

Figure 3 Backup migration principle

DRS allows you to migrate data from a Microsoft SQL Server database to the cloud using the backup files of the database. You copy the full and incremental backup files of the source database to an OBS bucket. DRS downloads the files from the bucket and uploads them to the disk of the destination database. After the pre-check and verification are complete, DRS runs the import command to restore the data to the destination database.

Basic Principles of Real-Time Synchronization

Figure 4 Real-time synchronization principle

Real-time synchronization keeps data in sync between the source and destination databases. It mainly applies to real-time synchronization from OLTP to OLAP or from OLTP to big data components. The technical principles of full+incremental synchronization and real-time migration are basically the same. However, they differ slightly in some scenarios.

DRS supports heterogeneous synchronization, that is, synchronization between different DB engines. DRS converts the structure definition statements of the source database to match those of the destination database. In addition, DRS can map and convert database field types. You can refer to Mapping Data Types for heterogeneous databases or use Database and Application Migration UGO (UGO) to synchronize the structure of heterogeneous databases.
DRS allows you to configure data processing rules, so you can use these rules to extract, parse, and replay data to meet your service requirements.
Objects such as accounts, triggers, and events cannot be synchronized.
Real-time synchronization is often used in many-to-one scenarios. DDL operations in many-to-one and one-to-many scenarios are specially processed.

Basic Principles of Data Subscription

Figure 5 Data subscription principle

Data subscription provides an SDK so that customers' service programs can obtain incremental data from the source database in real time.

DRS extracts original incremental logs from the source database, parses the logs into the standard format, and persists the logs to the local host. In addition, DRS invokes the notification interface of the client subscription SDK in real time to push incremental data to the client service program. Then, the client can consume data changes based on service requirements.

The position of the incremental data consumed by the client program is recorded on the server in real time. After a service interruption and reconnection, the DRS server can continue to push incremental data from the last consumption position.
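
The push-and-resume behavior can be illustrated with a toy model. This is not the DRS subscription SDK: the class SubscriptionServer, its push method, and the listener callback are hypothetical names chosen for this sketch, and a Python list stands in for the persisted incremental logs.

```python
class SubscriptionServer:
    """Toy stand-in for the server side of data subscription (not the real SDK)."""

    def __init__(self, log):
        self._log = list(log)  # persisted, ordered incremental records
        self._consumed = {}    # client id -> index of the next record to push

    def push(self, client_id, listener):
        """Push records to the listener, starting at the last consumption position.

        Returns the index of the next record that would be pushed to this client.
        """
        start = self._consumed.get(client_id, 0)
        for index in range(start, len(self._log)):
            listener(self._log[index])
            # Record the position only after the listener has returned, so a
            # record whose delivery failed is pushed again after reconnection.
            self._consumed[client_id] = index + 1
        return self._consumed.get(client_id, 0)
```

If the listener raises an error partway through, for example because the client disconnects, the next call to push for the same client starts at the first record that was not consumed.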

Basic Principles of Real-Time Disaster Recovery

DRS uses real-time replication technology to implement disaster recovery between two databases. The underlying technical principles are the same as those of real-time migration. The difference is that real-time DR supports both forward synchronization and backward synchronization. In addition, disaster recovery is performed at the instance level, which means that individual databases and tables cannot be selected.

Basic Principles of Workload Replay

A workload replay task consists of SQL recording and SQL replay. A recording tool downloads from the binlog all of the SQL statements (create, delete, update, and query operations) executed on the source database in the required period. The statements are then cached and injected into the destination database, where you can trigger a replay and review performance. By specifying the number of replay threads and the replay speed, you can simulate the peak service load of the source database and analyze the stability of the destination database when workloads increase sharply.
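
The effect of the thread and speed settings can be sketched as follows. This is an assumption-based illustration, not the DRS replay engine: replay_workload is a hypothetical function, each recorded statement is represented as a pair of (seconds since recording started, SQL text), and execute stands in for running the statement on the destination database.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def replay_workload(recorded, execute, threads=4, speed=1.0, sleep=time.sleep):
    """Replay recorded statements with a thread pool at a scaled pace.

    recorded: list of (offset_seconds, sql) pairs sorted by offset.
    speed:    2.0 halves the recorded gaps between statements, 0.5 doubles them.
    sleep:    injectable so the pacing can be inspected without waiting.
    """
    futures = []
    previous = 0.0
    with ThreadPoolExecutor(max_workers=threads) as pool:
        for offset, sql in recorded:
            sleep((offset - previous) / speed)  # compress or stretch the recorded gap
            previous = offset
            futures.append(pool.submit(execute, sql))
        # Collect results in recording order, even if threads finish out of order.
        return [future.result() for future in futures]
```

Raising speed above 1.0 while keeping the same recorded statements is one way to approximate a load peak: the destination receives the same work in less time, and threads limits how many statements run concurrently.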
