Big Data Migration Wave Planning

When migrating a big data platform to the cloud, first decide whether to migrate it as a whole (everything at once) or in waves. The guidelines are as follows:

Scenarios suitable for migration in whole:
- Small scale: For a big data platform that has a small amount of data (TB-level) and a small number of computing tasks, you can deploy the platform on the cloud and then migrate all metadata, data, and tasks.
- Complex associations: Big data tasks are tightly interdependent and difficult to split into independent groups.

Scenarios suitable for migration in waves:
- Large scale with clear associations: The big data platform handles massive amounts of data (PB-level to EB-level) and numerous computing tasks, but the associations between computing tasks are clear. For example, you can sort out the tasks, split the big data by service domain, and group associated data, tasks, and applications into one wave for migration. Migrating in waves lowers risk, simplifies each migration step, and improves efficiency.
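
The guidelines above can be sketched as a small decision helper. This is only an illustration: the Platform fields, the choose_strategy function, and both thresholds (1000 TB, that is, roughly where TB-level becomes PB-level, and 100 tasks) are assumptions made for this example and should be tuned to your own environment.

```python
from dataclasses import dataclass


@dataclass
class Platform:
    data_tb: float         # total data volume, in terabytes
    task_count: int        # number of computing tasks
    tasks_separable: bool  # True if tasks can be split cleanly by service domain


def choose_strategy(p, small_data_tb=1000, small_task_count=100):
    """Return "whole" or "waves". Thresholds are illustrative assumptions."""
    # Complex associations: tasks cannot be split, so migrate as a whole.
    if not p.tasks_separable:
        return "whole"
    # Small scale: little data and few tasks, so migrate as a whole.
    if p.data_tb < small_data_tb and p.task_count < small_task_count:
        return "whole"
    # Large scale with clear associations: migrate in waves.
    return "waves"


print(choose_strategy(Platform(data_tb=50, task_count=20, tasks_separable=True)))
print(choose_strategy(Platform(data_tb=5000, task_count=2000, tasks_separable=True)))
```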

Big data migration is usually performed in waves by subject area. Subject areas are defined by business function: related data that shares similar business logic, such as sales, supply chain, or log processing, is grouped into one subject area. Each subject area has its own data processing pipeline, analysis models, and business logic to meet specific business requirements and analysis objectives. The reference principles for planning big data migration waves are as follows:

- Wave planning by subject area: Consider both data correlation and task correlation. Data correlation means that data with similar business logic, mutual dependencies, or close relationships is placed in the same wave to ensure consistency and integrity. Task correlation means that tasks and the data they depend on are placed in the same wave, so that tasks run against accurate data in the correct order. Based on these two correlations, subject areas can be divided into multiple migration waves, with related tasks and data flows concentrated in the same wave. This improves migration efficiency and reduces risk.
- Minimized number of waves: Every wave repeats data extraction, transformation, and loading, and each repetition adds complexity and risk and can affect data consistency. Keep the number of waves as small as the other principles allow.
- Independent waves: Ensure that different waves are independent and loosely coupled, with few tasks and data flows that depend on another wave. Independent waves reduce the impact on other business domains during migration.
- Intra-wave tight coupling: Ensure that each wave contains highly correlated subject areas and their interdependent tasks and data flows, including data sharing scenarios.
- Business continuity: Avoid business interruption during migration. Big data application systems that are closely related to a subject area must be migrated in the same wave as that subject area to reduce the risk of interruption.
- Migration priority sorting: Prioritize subject areas based on business priority, migration complexity, and data volume. Generally, migrate smaller or simpler subject areas first, and then move on to larger or more complex ones.
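
As a rough illustration of how these principles fit together, the following sketch groups subject areas that depend on each other into the same wave (tight coupling inside a wave, no dependencies between waves) and then orders the waves from simplest and smallest to most complex and largest. The plan_waves function, the complexity and data_tb fields, and the sample values are all hypothetical. Business priority is deliberately left out of the sort key and could be added as a further key element.

```python
from collections import defaultdict


def plan_waves(areas, dependencies):
    """Group dependent subject areas into waves and order the waves.

    areas: dict mapping subject area name -> {"complexity": int, "data_tb": float}
    dependencies: iterable of (a, b) pairs of areas that share data or tasks
    """
    # Union-find: every area starts in its own group.
    parent = {name: name for name in areas}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path compression
            x = parent[x]
        return x

    # Merge the groups of any two areas that depend on each other.
    for a, b in dependencies:
        parent[find(a)] = find(b)

    groups = defaultdict(list)
    for name in areas:
        groups[find(name)].append(name)
    waves = [sorted(members) for members in groups.values()]

    # Simpler and smaller waves go first; names break ties deterministically.
    def wave_key(members):
        complexity = max(areas[m]["complexity"] for m in members)
        volume = sum(areas[m]["data_tb"] for m in members)
        return (complexity, volume, members)

    waves.sort(key=wave_key)
    return waves


# Hypothetical sample input: sales and supply chain share data and tasks.
areas = {
    "sales": {"complexity": 2, "data_tb": 40},
    "supply_chain": {"complexity": 3, "data_tb": 120},
    "log_processing": {"complexity": 1, "data_tb": 15},
}
dependencies = [("sales", "supply_chain")]

for number, wave in enumerate(plan_waves(areas, dependencies), start=1):
    print(f"Wave {number}: {', '.join(wave)}")
# Wave 1: log_processing
# Wave 2: sales, supply_chain
```

Because dependent areas are merged transitively, this grouping yields fully independent waves, which may produce a few large waves when dependencies are dense. In that case, the weakest dependencies would need to be reviewed manually before splitting a wave further.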