Updated on 2025-05-07 GMT+08:00

Data Verification

  • Data verification criteria

    After the migration is complete, compare the data between the source and destination ends to confirm consistency. The required precision of the comparison varies by scenario. Typically, the database tables of core businesses must be 100% consistent between the source and destination ends, while for certain big data scenarios, such as user profile computing, 90% consistency of the raw data is acceptable. The following reference criteria can be adjusted based on site requirements.

    Table 1 Reference for data verification criteria

    | Category | Data Consistency Requirement | Example Business |
    | --- | --- | --- |
    | Core business | 100% | The core member data, transaction data, and payment data of an e-commerce system are the most critical assets for users and involve real financial amounts, so these services require 100% data consistency. You are advised to compare rows, objects, and sample values. |
    | Non-core business | 99.9% | The shopping cart data and customer service message data of an e-commerce system are non-core business data. A minor loss will not impact the customer's service usage or experience. If the switchover time is limited, it is suggested to compare only the number of data rows. |
    | Peripheral business | 90% | The home page recommendation data, user browsing data, and user profile data of an e-commerce system, if partially lost, will not affect the customer's service usage or experience. It is suggested to perform row-level comparisons and sample value checks. |

  • Data verification methods

    Data is categorized into database data, middleware data, and file data, each with distinct consistency verification methods and tools.

    • The methods for verifying database data consistency are described in the following table.
      Table 2 Database data consistency comparison methods

      | Item | Tool | Description |
      | --- | --- | --- |
      | Database- and table-level content comparison | DRS tool | Queries and compares each data record in a table to ensure that every field in each record matches the corresponding field in the source table. Content comparison is generally slower than row comparison. |
      | | Python script | Based on the DRS task ID, calls an API to execute comparison tasks in batches and exports the results to an XLSX file. Compared with the DRS tool, Python scripts can run in batches, improving execution efficiency. |
      | Database- and table-level object comparison | DRS tool | Compares objects such as databases, indexes, tables, views, stored procedures, functions, and table collation rules. |
      | | Python script | Based on the DRS task ID, calls an API to execute comparison tasks in batches and exports the results to an XLSX file. Compared with the DRS tool, Python scripts can run in batches, improving execution efficiency. |
      | Database- and table-level row count comparison | DRS tool | Compares the number of rows in each table to check consistency. Only the row count is queried, making the comparison faster. |
      | | Python script | The batch script creates N concurrent task threads, iterates through all tables, and outputs the comparison results to an XLSX file. Compared with the DRS tool, Python scripts can run in batches, improving execution efficiency. |

    • The methods for verifying middleware data consistency are described in the following table.
      Table 3 Middleware data consistency comparison methods

      | Item | Tool | Description |
      | --- | --- | --- |
      | Key quantity comparison | redis-cli | Run the redis-cli command info keyspace to view the keys and expires values. Subtract expires from keys in the source Redis, record the difference, and repeat this for the target Redis. If the two differences are equal, the number of keys is consistent and the migration is successful. |
      | Key-value content comparison | Redis-Full-Check (open source) | Performs a full comparison of the data content in the source and destination Redis. The tool captures data from both ends over multiple rounds, recording inconsistent entries for the next round, so that the differences gradually converge. The final discrepancies are stored in SQLite; if none exist, the data content is complete and the migration is successful. |
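The key quantity check can be automated by parsing the Keyspace section of the INFO output. The sketch below assumes the standard dbN:keys=...,expires=...,avg_ttl=... line format that redis-cli info keyspace prints; the function name effective_keys is an invented name for this example.

```python
import re

def effective_keys(info_keyspace):
    # Sum (keys - expires) across every dbN line of the INFO Keyspace output.
    total = 0
    for m in re.finditer(r"keys=(\d+),expires=(\d+)", info_keyspace):
        total += int(m.group(1)) - int(m.group(2))
    return total

# Sample output in the format returned by `redis-cli info keyspace`.
source_info = "# Keyspace\ndb0:keys=1000,expires=20,avg_ttl=0"
target_info = "# Keyspace\ndb0:keys=1000,expires=20,avg_ttl=0"
consistent = effective_keys(source_info) == effective_keys(target_info)
```

In practice the two INFO strings would be captured from the source and target instances (for example via redis-cli or a Redis client library) rather than hard-coded.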

    • The methods for verifying file data consistency are described in the following table.
      Table 4 File data consistency comparison methods

      | Type | Item | Tool | Description |
      | --- | --- | --- | --- |
      | Object storage | Object quantity | OMS | The OMS migration tool verifies file integrity through MD5 checksums and checks whether the number of objects in the source and destination buckets is consistent. |
      | File storage | File quantity | Rclone | Rclone uses MD5 hash values to verify file integrity. After synchronization, it compares the number of files at the source and destination ends. |
      | | | Rsync | Rsync uses MD5 hash values to validate file integrity; if the checksums do not match, Rsync retransmits the file to guarantee data consistency. After synchronization, it compares the number of files at the source and destination ends. |
      | | File size | Python script | After the migration is complete, compares the total file sizes at the source and destination ends to determine consistency. |
      | | File content | Python script | After the migration is complete, calculates and compares the hash values of the source and destination files to verify that they match. |