Updated on 2026-05-20 GMT+08:00

MRS Hudi

DataArts Migration supports the main versions of MRS Hudi, meeting your data synchronization requirements in various deployment environments.

Preparation and Constraints

  • Network requirements

    Ensure that the MRS Hudi data source can communicate with CDM so that data can be transmitted. For details, see Enabling Network Connectivity.

  • Required permissions
    • Hudi read and write permissions
      • Read permission: Grant the read-only permission of Hudi to the IAM user or user group of DataArts Migration through a system policy such as MRS ReadOnlyAccess. You can also create a custom policy to grant read permissions such as SELECT.
      • Write permission: In addition to the OBS permissions described below, grant the write permission of Hudi to the IAM user or user group of DataArts Migration through a system policy such as MRS CommonOperations or MRS FullAccess. You can also create a custom policy to grant write permissions such as INSERT INTO TABLE and CREATE TABLE.
      • OBS permissions (storage-compute decoupling): When storage-compute decoupling is enabled for MRS Hudi, DataArts Migration reads files from and writes files to OBS. In this case, you must have the read and write permissions on OBS files.
    • Enabling access ports: When configuring the MRS Hudi data source, ensure that the following ports have been enabled in the security group or network so that DataArts Migration can access MRS.
      Table 1 Service ports

      Service     | Port Type   | Port Number | Usage
      ------------|-------------|-------------|------
      Spark       | TCP         | 22550       | Spark JDBC Thrift port, used for communication between the Spark client and Spark server
      MRS Manager | TCP         | 28443       | Used to download the MRS cluster configuration
      MRS Manager | TCP         | 20009       | Used for CAS authentication
      MRS Manager | TCP         | 20029       | Used by Manager to communicate with and manage other components
      KDC         | TCP and UDP | 21730       | Used for Kerberos authentication
      KDC         | TCP and UDP | 21731       | Used for Kerberos authentication
      KDC         | TCP and UDP | 21732       | Used for Kerberos authentication
      HDFS        | TCP         | 8020        | HDFS NameNode service port
      HDFS        | TCP         | 9866        | HDFS DataNode service port
      Hudi        | TCP         | 10000       | HudiServer port used for communication between the client and HudiServer
      Hudi        | TCP         | 9083        | Hudi Metastore service port used to store and manage Hudi metadata
      ZooKeeper   | TCP         | 2181        | ZooKeeper service port used for communication between the client and the ZooKeeper cluster
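
Before configuring the data source, it can help to confirm that the ports listed in Table 1 are reachable. The following is a minimal sketch that uses only the Python standard library. The function names are illustrative and are not part of DataArts Migration or MRS. The sketch only tests TCP connectivity, so the UDP side of the KDC ports is not verified.

```python
import socket

# TCP ports taken from Table 1. The KDC ports (21730-21732) also use UDP,
# which a TCP connect cannot verify, so only their TCP side is probed.
TCP_PORTS = {
    "Spark": [22550],
    "MRS Manager": [28443, 20009, 20029],
    "KDC": [21730, 21731, 21732],
    "HDFS": [8020, 9866],
    "Hudi": [10000, 9083],
    "ZooKeeper": [2181],
}


def is_tcp_port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


def find_closed_ports(host, ports_by_service=TCP_PORTS, timeout=3.0):
    """Return the (service, port) pairs that could not be reached over TCP."""
    return [
        (service, port)
        for service, ports in ports_by_service.items()
        for port in ports
        if not is_tcp_port_open(host, port, timeout)
    ]
```

For example, `find_closed_ports("192.0.2.10")` returns the unreachable ports on a node, where `192.0.2.10` is a placeholder address. The services in an MRS cluster may run on different nodes, so a port reported as closed on one node is not necessarily blocked by the security group.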

Supported Data Types

DataArts Migration can read the following field types from Hudi, based on the read capability of Spark.

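Category       | Field Type                              | MRS Hudi Read | MRS Hudi Write
---------------|-----------------------------------------|---------------|---------------
String         | STRING                                  | √             | √
Integer        | TINYINT, INTEGER, SMALLINT, INT, BIGINT | √             | √
Floating point | FLOAT, DOUBLE, DECIMAL                  | √             | √
Date/Time      | TIMESTAMP, DATE                         | √             | √
Boolean        | BOOLEAN                                 | √             | √
Binary         | BINARY                                  | √             | √
Complex type   | ARRAY, MAP, STRUCT                      | x             | x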
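
A table schema can be checked against this list before a job is created. The following is a minimal sketch, assuming that the x marks in the table apply to the complex types ARRAY, MAP, and STRUCT and that all other listed types are supported. The function names and the `(field_name, type_name)` schema format are illustrative and do not come from DataArts Migration.

```python
import re

# Field types listed in the table above, excluding the complex types
# (ARRAY, MAP, STRUCT), which are assumed to be the ones marked "x".
SUPPORTED_TYPES = {
    "STRING", "TINYINT", "INTEGER", "SMALLINT", "INT", "BIGINT",
    "FLOAT", "DOUBLE", "DECIMAL", "TIMESTAMP", "DATE", "BOOLEAN", "BINARY",
}


def base_type(type_name):
    """Strip type parameters, e.g. 'decimal(10,2)' or 'array<int>'."""
    return re.split(r"[(<]", type_name.strip(), maxsplit=1)[0].strip().upper()


def unsupported_fields(schema):
    """Return the names of fields whose base type is not in SUPPORTED_TYPES.

    schema is a list of (field_name, type_name) pairs.
    """
    return [
        name
        for name, type_name in schema
        if base_type(type_name) not in SUPPORTED_TYPES
    ]
```

Calling `unsupported_fields` on a schema that contains a field of type `array<string>` returns that field's name, which indicates that the column needs to be excluded or converted before migration.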

Supported Migration Scenarios

DataArts Migration supports the following offline synchronization modes:

  • Single table synchronization

    DataArts Migration supports table/file synchronization for data lake ingestion and for data migration to the cloud.

  • Database and table shard synchronization

    DataArts Migration supports synchronization of data from multiple databases and tables for data lake ingestion and for data migration to the cloud.

  • Entire DB migration

    DataArts Migration supports synchronization of data from an on-premises database for data lake ingestion and for data migration to the cloud.

Database and table shard synchronization and entire DB migration are not supported in all regions. The following table lists the supported Hudi migration scenarios.

Supported Migration Scenario

Single Table Read

Single Table Write

Database/Table Shard Read

Database/Table Shard Write

Entire DB Read

Entire DB Write

Supported

x

x

x

Core Capabilities

  • Connection configuration

    Configuration Item         | Supported | Description
    ---------------------------|-----------|------------
    Kerberos authentication    | √         | Kerberos authentication is used to access MRS clusters.
    Storage-compute decoupling | √         | The storage-compute decoupling architecture is supported, and data can be read from different Hudi storage file systems, such as OBS and HDFS.

  • Read capabilities

    Configuration Item    | Supported | Description
    ----------------------|-----------|------------
    Incremental read      | √         | Incremental read can be configured using where clauses. Data is filtered by condition so that only new or modified data is read, which avoids a full read and improves efficiency.
    Shard concurrency     | √         | Underlying files are divided into multiple shards that are read in parallel, making full use of resources and improving read performance. This feature is especially suitable for large datasets.
    Custom fields         | x         | Currently not supported. Computed columns, constant columns, and masking functions cannot be added to tasks.
    Dirty data processing | √         | Abnormal data can be written to the dirty data bucket to prevent job failures caused by a small amount of abnormal data.

  • Write capabilities

    Configuration Item                    | Supported                             | Description
    --------------------------------------|---------------------------------------|------------
    Write mode                            | LOAD, TRUNCATE+LOAD, INSERT_OVERWRITE | LOAD: Appends new data to the target table without modifying or deleting any existing data. TRUNCATE+LOAD: Clears all data in the target partition before new data is written to it. This mode is suitable for replacing all data in a partition. INSERT_OVERWRITE: Updates or replaces the data in a partition based on specified conditions or rules.
    Pre- and post-import processing       | √                                     | Partitions can be cleared in TRUNCATE+LOAD mode.
    Dirty data processing                 | x                                     | Currently not supported. Abnormal data cannot be written to the dirty data bucket.
    Concurrent write                      | √                                     | Concurrent write makes full use of cluster resources to improve the data write speed.
    Creating a table in the runtime state | √                                     | A table is created dynamically during data writing. If the destination table does not exist, Hudi automatically creates a table structure based on the written data, so you do not need to create the table in advance.
    Creating a table in the editing state | √                                     | A table is created manually when you edit a job. You can define the table structure, field types, and partition policy based on the data structure and requirements.
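
The incremental read capability listed under Read capabilities relies on a where clause that selects only new or modified rows. The following is a minimal sketch of how such a condition could be generated for a daily window. The column name `update_time` is hypothetical, the helper names are illustrative, and the exact syntax accepted by the where clause setting of a job is an assumption to verify against your environment.

```python
from datetime import datetime, timedelta

TIME_FORMAT = "%Y-%m-%d %H:%M:%S"


def incremental_where(column, window_start, window_end):
    """Build a half-open time-window predicate: start <= column < end."""
    if not column.isidentifier():
        raise ValueError("column must be a plain identifier")
    if window_end <= window_start:
        raise ValueError("window_end must be later than window_start")
    return (
        f"{column} >= '{window_start.strftime(TIME_FORMAT)}' "
        f"AND {column} < '{window_end.strftime(TIME_FORMAT)}'"
    )


def previous_day_window(run_time):
    """Return the [start, end) window covering the day before run_time."""
    end = run_time.replace(hour=0, minute=0, second=0, microsecond=0)
    return end - timedelta(days=1), end
```

A half-open window (inclusive start, exclusive end) is used so that consecutive runs neither skip nor re-read rows that fall exactly on a window boundary.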

Creating a Data Source

Create a data source in Management Center. For details, see Configuring Data Connection Parameters.

Creating an Offline Data Migration Job

Create an MRS Hudi migration job in DataArts Factory. For details, see Creating an Offline Processing Migration Job.