Overview

Updated on 2025-02-17 GMT+08:00

MgC allows you to verify the consistency of data migrated from various big data computing and storage engines, such as Hive, HBase, Doris, and MaxCompute. Consistency verification ensures data accuracy and reliability and enables you to migrate big data to Huawei Cloud with confidence.

Precautions

  • A pair of verification tasks for the source and the target must use the same verification method.
  • If the source and target HBase clusters use different security authentication modes, do not execute the source and target verification tasks at the same time, or they will fail. This is because the authentication information must be handled differently for each cluster: the secured cluster requires the authentication information to be loaded, whereas the non-secured cluster requires it to be cleared.
  • If the source Lindorm or HBase service is locked due to arrears, you can still create data connections and verification tasks, but data access and operations will be restricted, preventing verification tasks from being executed. Before starting data verification, ensure that the source big data service is active and your account balance is sufficient. If the service is locked, promptly pay the fee to unlock it. Once the service is unlocked, you can execute the data verification tasks again.
  • The verification results of data migrated between Hive 2.x and Hive 3.x may be inaccurate. In Hive 2.x, when you query data of the fixed-length CHAR(N) type, if the actual data is shorter than the declared length N, Hive pads the string with spaces to that length. In Hive 3.x, this padding does not occur during queries. This difference can cause mismatches between the two versions (see the sketch after this list). To avoid this issue, you are advised to use Beeline to perform the verification.
  • When you use YARN to run data verification tasks on the source and target MRS clusters, execute the verification tasks separately. Ensure that one task is completed before starting the other.
  • When you verify data consistency for clusters of MRS 3.3.0 or later, do not use cluster nodes as executors, or the verification will fail.
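
The CHAR(N) padding difference described above can be reproduced outside Hive. The following is a minimal Python sketch (illustrative only; the checksum helper is an assumption, not an MgC interface) showing how the same logical value produces different checksums when one engine pads it with trailing spaces and the other does not, and how trimming trailing spaces before comparison removes the false mismatch.

```python
import hashlib

def row_checksum(values):
    """Join column values and hash them, as a stand-in for a per-row
    verification checksum (assumed helper, not an MgC interface)."""
    return hashlib.md5("|".join(values).encode("utf-8")).hexdigest()

# The same logical CHAR(5) value as returned by Hive 3.x (no padding on
# query) and by Hive 2.x (padded with spaces to the declared length).
value_hive3 = "abc"
value_hive2 = "abc".ljust(5)  # "abc  "

# Compared as-is, the checksums differ even though the data is consistent.
print(row_checksum([value_hive3]) == row_checksum([value_hive2]))  # False

# Trimming trailing spaces before comparison removes the false mismatch.
print(row_checksum([value_hive3.rstrip()]) == row_checksum([value_hive2.rstrip()]))  # True
```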

Notes and Constraints

  • Before verifying data migrated from EMR Delta Lake to MRS Delta Lake, please note:
    • If the source EMR cluster uses Spark 3.3.1, data verification is supported regardless of whether the source cluster contains metadata storage.
    • If the source EMR cluster uses Spark 2.4.8, data verification is supported only when the source cluster contains metadata storage.
  • Verification is not available for HBase tables that only store cold data.
  • A verification task must be completed within one day. If the task extends past midnight (00:00), the verification results may be inaccurate. Plan verification tasks carefully to avoid execution across days.
  • Field verification is not supported if the source Alibaba Cloud cluster uses ClickHouse 21.8.15.7 and the target Huawei Cloud cluster uses ClickHouse 23.3.2.37. This is because the two versions process IPv4 and IPv6 data types and function calculation results differently.
  • During daily incremental verification, hourly incremental verification, and date-based verification for Hive, partitions whose Date-type partition field does not follow the standard YYYY-MM-DD format cannot be verified (a format check sketch follows this list).
  • MgC cannot verify the consistency of data migrated between secured HBase 2.x clusters. The accuracy of verification is impacted by version compatibility restrictions, differences in security authentication mechanisms, protocol and interface inconsistencies, as well as variations in feature support and configuration between different versions.
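
A quick way to spot partitions that will be skipped because of the YYYY-MM-DD constraint above is to round-trip each partition value through a date parser. The following Python sketch is a hedged illustration (the function name and sample values are assumptions, not part of MgC):

```python
from datetime import datetime

def is_standard_date_partition(value: str) -> bool:
    """Return True only if the partition value is a valid date written
    exactly in the YYYY-MM-DD format (zero-padded month and day)."""
    try:
        return datetime.strptime(value, "%Y-%m-%d").strftime("%Y-%m-%d") == value
    except ValueError:
        return False

# Example partition values and whether they would be verifiable.
for value in ["2025-02-17", "20250217", "2025/02/17", "2025-2-7"]:
    verdict = "verifiable" if is_standard_date_partition(value) else "not verifiable (non-standard format)"
    print(f"{value}: {verdict}")
```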

Verification Methods

  • Full verification: The consistency of all inventory data is verified.
  • Daily incremental verification: The consistency of incremental data is verified based on the creation or update time. You can choose to verify incremental data for one day or several consecutive days.
  • Hourly incremental verification: Data consistency is verified based on the creation time or update time multiple times within 24 hours. The verification automatically stops at 00:00 on the next day.
  • Date-based verification: This method applies only to tables partitioned by date in the year, month, and day format. You can choose to verify consistency of such tables for one day or several consecutive days. Tables that are not partitioned by date are not verified.
  • Selective verification: This method verifies the consistency of data within a specified time period. You can only select a period counted backward from the current time (a sketch contrasting these time windows follows this list).
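
To make the difference between the day-oriented methods and selective verification concrete, the following Python sketch (assumed logic for illustration, not how MgC schedules tasks) shows that date-based and multi-day incremental runs cover whole days, while a selective run covers a period counted backward from the current time.

```python
from datetime import datetime, timedelta

def date_based_partitions(start_date: str, days: int) -> list:
    """List the YYYY-MM-DD values covered when a date-based (or multi-day
    daily incremental) verification starts on start_date and runs for a
    number of consecutive days."""
    first = datetime.strptime(start_date, "%Y-%m-%d")
    return [(first + timedelta(days=i)).strftime("%Y-%m-%d") for i in range(days)]

def selective_window(hours_back: int):
    """A selective verification period can only extend backward from now."""
    end = datetime.now()
    return end - timedelta(hours=hours_back), end

print(date_based_partitions("2025-02-15", 3))  # ['2025-02-15', '2025-02-16', '2025-02-17']
start, end = selective_window(6)
print(start, "->", end)
```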

Supported Source and Target Components

Source Component

  • Hive
  • HBase
  • Doris
  • MaxCompute
  • ClickHouse
  • Delta Lake
  • Hudi

Target Component

  • Hive
  • DLI
  • MRS (Doris)
  • MRS (HBase)
  • MRS (ClickHouse)
  • CloudTable (ClickHouse)
  • CloudTable (HBase)
  • Delta
  • Hudi

Verification Methods Available for Each Component

Hive and DLI

  • Full verification
  • Daily incremental verification
  • Hourly incremental verification
  • Date-based verification

MaxCompute

  • Full verification
  • Daily incremental verification
  • Hourly incremental verification
  • Date-based verification

Doris

  • Full verification
  • Daily incremental verification
  • Hourly incremental verification

HBase

  • Full verification
  • Selective verification

ClickHouse

  • Full verification

ApsaraDB for ClickHouse

  • Full verification

CloudTable (HBase)

  • Full verification
  • Selective verification

CloudTable (ClickHouse)

  • Full verification

Delta

  • Full verification
  • Daily incremental verification
  • Hourly incremental verification
  • Date-based verification

Hudi

  • Full verification
  • Daily incremental verification
  • Hourly incremental verification
  • Date-based verification
