Updated on 2026-05-20 GMT+08:00

OBS

Huawei Cloud Object Storage Service (OBS) provides secure, reliable, and cost-effective cloud storage for massive amounts of data. It supports multiple data storage and access modes and suits scenarios such as data backup and image and video storage.

DataArts Migration can migrate data from and to Huawei Cloud OBS.

Preparation and Constraints

  • Network requirements

    The OBS data source must be able to communicate with CDM so that data can be transmitted smoothly. For details, see Enabling Network Connectivity.

  • Required permissions
    • Read permission: The OBS ReadOnlyAccess system-defined policy is required for DataArts Migration to read data from OBS. This policy allows users to list buckets, obtain basic bucket information and bucket metadata, and list objects.
    • Write permission: The OBS OperateAccess or OBS Administrator system-defined policy is required for DataArts Migration to write data to OBS. Either policy allows users to perform all OBS ReadOnlyAccess operations as well as basic object operations, such as uploading, downloading, and deleting objects and obtaining object ACLs.
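Because OBS OperateAccess and OBS Administrator already include every OBS ReadOnlyAccess operation, the permission rules above reduce to a simple lookup. The following Python sketch expresses that lookup as a pre-check before creating a job. It is an illustration only: allowed_directions is a hypothetical helper, not a DataArts Migration or IAM API, and it considers only the three system-defined policies named above, not custom policies.

```python
# Minimal sketch: which migration directions do the attached OBS
# system-defined policies allow? Policy names come from the text above;
# the function itself is hypothetical, not a DataArts or IAM API.

# OBS OperateAccess and OBS Administrator include every OBS ReadOnlyAccess
# operation, so any of these three policies is enough for reading.
READ_POLICIES = {"OBS ReadOnlyAccess", "OBS OperateAccess", "OBS Administrator"}
# Writing requires OBS OperateAccess or OBS Administrator.
WRITE_POLICIES = {"OBS OperateAccess", "OBS Administrator"}


def allowed_directions(attached_policies):
    """Return the subset of {"read", "write"} the given policy names permit."""
    attached = set(attached_policies)
    directions = set()
    if attached & READ_POLICIES:
        directions.add("read")
    if attached & WRITE_POLICIES:
        directions.add("write")
    return directions


# A ReadOnlyAccess-only user can run jobs that read from OBS, not jobs that write to it.
print(sorted(allowed_directions(["OBS ReadOnlyAccess"])))  # ['read']
print(sorted(allowed_directions(["OBS OperateAccess"])))   # ['read', 'write']
```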

Supported Data Types

DataArts Migration supports OBS files in multiple formats. The following tables list the supported Parquet and ORC data types.

  • Supported Parquet data types

    Primitive Type         Logical Type       Read Support   Write Support
    INT64                  INT_64             √              √
                           UINT_64            √              √
                           DECIMAL            √              √
                           TIME_MICROS        √              √
                           TIMESTAMP_MILLIS   √              √
                           TIMESTAMP_MICROS   √              √
                           N/A                √              √
    INT32                  UINT_8             √              √
                           UINT_16            √              √
                           UINT_32            √              √
                           INT_8              √              √
                           INT_16             √              √
                           INT_32             √              √
                           DECIMAL            √              √
                           DATE               √              √
                           TIME_MILLIS        √              √
                           N/A                √              √
    BOOLEAN                N/A                √              √
    BINARY                 DECIMAL            √              √
                           UTF8               √              √
                           ENUM               √              √
                           JSON               √              √
                           BSON               √              √
                           N/A                √              √
    FLOAT                  N/A                √              √
    DOUBLE                 N/A                √              √
    INT96                  N/A                √              √
    FIXED_LEN_BYTE_ARRAY   DECIMAL            √              √
                           INTERVAL           √              √
    GroupType              LIST               x              x
                           MAP                x              x
                           STRUCT             x              x

  • Supported ORC data types

    Category       Field Type   Read Support   Write Support
    Numeric        TINYINT      √              √
                   SMALLINT     √              √
                   INT          √              √
                   BIGINT       √              √
                   FLOAT        √              √
                   DOUBLE       √              √
                   DECIMAL      √              √
    Time           TIMESTAMP    √              √
                   DATE         √              √
    Character      VARCHAR      √              √
                   STRING       √              √
                   CHAR         √              √
    Boolean        BOOLEAN      √              √
    Binary         BINARY       √              √
    Complex type   LIST         x              x
                   MAP          x              x
                   STRUCT       x              x
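Before creating a job, it can help to compare a file's schema with the Parquet and ORC tables above. The following Python sketch encodes both tables as lookups and flags columns whose types are not listed. It assumes you already have each column's type (for example, read from the file footer with a Parquet or ORC library); the helper names are hypothetical, and any Parquet primitive/logical pair missing from the table (such as FIXED_LEN_BYTE_ARRAY with no logical type) is treated as unsupported, which is a conservative assumption.

```python
# Minimal pre-flight sketch: flag columns whose types are not in the
# supported Parquet/ORC tables. Helper names are hypothetical, and
# (primitive, logical) pairs not listed in the table are treated as unsupported.

# Primitive type -> supported logical types (None stands for "N/A").
PARQUET_SUPPORTED = {
    "INT64": {"INT_64", "UINT_64", "DECIMAL", "TIME_MICROS",
              "TIMESTAMP_MILLIS", "TIMESTAMP_MICROS", None},
    "INT32": {"UINT_8", "UINT_16", "UINT_32", "INT_8", "INT_16", "INT_32",
              "DECIMAL", "DATE", "TIME_MILLIS", None},
    "BOOLEAN": {None},
    "BINARY": {"DECIMAL", "UTF8", "ENUM", "JSON", "BSON", None},
    "FLOAT": {None},
    "DOUBLE": {None},
    "INT96": {None},
    "FIXED_LEN_BYTE_ARRAY": {"DECIMAL", "INTERVAL"},
}
# Group types (LIST, MAP, STRUCT) are deliberately absent: not supported.

ORC_SUPPORTED = {
    "TINYINT", "SMALLINT", "INT", "BIGINT", "FLOAT", "DOUBLE", "DECIMAL",
    "TIMESTAMP", "DATE", "VARCHAR", "STRING", "CHAR", "BOOLEAN", "BINARY",
}


def unsupported_parquet_columns(schema):
    """schema: list of (column, primitive_or_group_type, logical_type_or_None)."""
    return [
        column
        for column, physical, logical in schema
        if logical not in PARQUET_SUPPORTED.get(physical, set())
    ]


def unsupported_orc_columns(schema):
    """schema: list of (column, orc_type), e.g. ("price", "decimal(10,2)")."""
    flagged = []
    for column, orc_type in schema:
        # Drop parameters such as (10,2) or (20) before comparing type names.
        base = orc_type.split("(")[0].strip().upper()
        if base not in ORC_SUPPORTED:
            flagged.append(column)
    return flagged


parquet_schema = [
    ("id", "INT64", None),
    ("name", "BINARY", "UTF8"),
    ("tags", "GroupType", "LIST"),
]
print(unsupported_parquet_columns(parquet_schema))  # ['tags']
print(unsupported_orc_columns([("price", "decimal(10,2)"), ("attrs", "map<string,int>")]))  # ['attrs']
```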

Supported Migration Scenarios

DataArts Migration supports the following offline synchronization modes:

  • Single table synchronization

    DataArts Migration can synchronize a single table or file when ingesting data into a data lake or migrating data to the cloud.

  • Database and table shard synchronization

    DataArts Migration can synchronize data from multiple databases and tables when ingesting data into a data lake or migrating data to the cloud.

  • Entire DB migration

    DataArts Migration can synchronize data from an entire on-premises database when ingesting data into a data lake or migrating data to the cloud.

Database and table shard synchronization and entire DB migration are not available in all regions. The following table lists the migration scenarios supported for OBS.

Supported Migration Scenario

Single Table Read

Single Table Write

Database/Table Shard Read

Database/Table Shard Write

Entire DB Read

Entire DB Write

Supported

x

x

x

Core Capabilities

  • Connection configuration
    • Authentication mode: AK/SK and agency are supported. An IAM agency authorizes service roles to access OBS.

  • Read capabilities
    • Incremental read: You can configure a variable path and a schedule to trigger incremental synchronization based on time or file changes.
    • Supported file formats:
      • Binary: Raw binary files can be read. This applies to migration between file systems.
      • CSV: The standard CSV format is supported, and delimiters and encoding modes can be identified.
      • JSON: JSON structures can be parsed, and multiple JSON fields can be extracted.
      • Parquet: Native files in the Parquet columnar storage format can be read.
      • ORC: Native files in the ORC columnar storage format can be read.
    • Shard concurrency: Multiple threads read data from files concurrently, significantly improving throughput.
    • Dirty data processing: Abnormal data can be written to a dirty data bucket so that a small amount of abnormal data does not cause the job to fail.
    • Custom fields: You can add computed columns, constant columns, or masking functions to meet specific service requirements.

  • Write capabilities
    • Supported file formats:
      • Binary: Raw binary files can be written. This applies to migration between file systems.
      • CSV: The standard CSV format is supported, and delimiters and encoding modes can be specified.
      • JSON: Data can be written as JSON.
      • Parquet: Native files in the Parquet columnar storage format can be written.
      • ORC: Native files in the ORC columnar storage format can be written.
    • File compression: ORC and Parquet files can be compressed when they are written.
    • Dirty data processing: Not supported. When writing to OBS, abnormal data cannot be redirected to a dirty data bucket, so this mechanism is not available for preventing job failures caused by a small amount of abnormal data.
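The incremental read capability listed under Read capabilities can be triggered by time or by file changes. The following Python sketch illustrates both ideas under stated assumptions: PATH_TEMPLATE, path_for_run, and changed_since are hypothetical names, the {date} placeholder is not DataArts Migration's variable syntax, and the object listing is passed in as plain (key, last-modified) pairs rather than fetched from OBS.

```python
from datetime import datetime, timedelta

# Hypothetical path template. DataArts Migration uses its own variable
# syntax for paths; this sketch only shows the idea of a date-based path.
PATH_TEMPLATE = "obs://example-bucket/logs/{date}/"


def path_for_run(plan_time, days_back=1):
    """Time-based increment: resolve the directory one scheduled run should read."""
    day = plan_time - timedelta(days=days_back)
    return PATH_TEMPLATE.format(date=day.strftime("%Y-%m-%d"))


def changed_since(objects, watermark):
    """Change-based increment: pick objects modified after the last run.

    objects: (key, last_modified) pairs, for example from a bucket listing.
    Returns the selected keys and the watermark to store for the next run.
    """
    objects = list(objects)  # allow generators, which can only be iterated once
    selected = [(key, modified) for key, modified in objects if modified > watermark]
    new_watermark = max((modified for _, modified in selected), default=watermark)
    return [key for key, _ in selected], new_watermark


print(path_for_run(datetime(2026, 5, 20, 2, 0)))  # obs://example-bucket/logs/2026-05-19/
listing = [
    ("a.csv", datetime(2026, 5, 19, 10, 0)),
    ("b.csv", datetime(2026, 5, 20, 1, 0)),
    ("c.csv", datetime(2026, 5, 18, 9, 0)),
]
keys, watermark = changed_since(listing, datetime(2026, 5, 19))
print(keys, watermark)  # ['a.csv', 'b.csv'] 2026-05-20 01:00:00
```

Storing the returned watermark and passing it to the next scheduled run yields only newer objects. Because the comparison is strict, an object uploaded later with a last-modified time equal to the stored watermark would be skipped, so a production job needs a policy for that edge case.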
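For CSV reads, the Read capabilities list states that delimiters and encoding modes can be identified. The service's own detection logic is not described here, so the following Python sketch shows one common approach rather than DataArts Migration's implementation: try a short list of candidate encodings (the list itself is an assumption) and let the standard library's csv.Sniffer pick the delimiter from a sample. detect_csv_format is a hypothetical helper name.

```python
import csv

# Assumed candidate list, tried in order. latin-1 accepts any byte sequence,
# so it acts as a last-resort fallback rather than a reliable guess.
CANDIDATE_ENCODINGS = ("utf-8", "gbk", "latin-1")


def detect_csv_format(sample, delimiters=",;\t|"):
    """Guess (encoding, delimiter) from the first bytes of a CSV object."""
    for encoding in CANDIDATE_ENCODINGS:
        try:
            text = sample.decode(encoding)
        except UnicodeDecodeError:
            continue  # not this encoding; try the next candidate
        # Sniffer raises csv.Error if no consistent delimiter is found.
        dialect = csv.Sniffer().sniff(text, delimiters=delimiters)
        return encoding, dialect.delimiter
    raise ValueError("sample does not match any candidate encoding")


print(detect_csv_format("名字;年龄\n张三;30\n李四;25\n".encode("utf-8")))  # ('utf-8', ';')
print(detect_csv_format("名字|年龄\n张三|30\n李四|25\n".encode("gbk")))    # ('gbk', '|')
```

If a byte sample is cut in the middle of a multi-byte character, UTF-8 decoding fails and detection falls through to the next candidate, so trimming the sample to its last newline before calling the function is safer.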
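The shard concurrency, dirty data processing, and custom fields capabilities listed under Read capabilities can be pictured together as one small read pipeline. The following Python sketch is a simplified model, not how DataArts Migration is implemented: the in-memory shards stand in for OBS files, the dirty list stands in for the dirty data bucket, and the max_dirty threshold, the mask_phone masking rule, and the constant value "obs" are hypothetical choices for illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def read_shard(lines):
    """Parse "id,name,phone" lines from one shard into (good_rows, dirty_lines)."""
    good, dirty = [], []
    for line in lines:
        parts = line.split(",")
        if len(parts) != 3 or not parts[0].isdigit():
            dirty.append(line)  # set aside as dirty data instead of failing
            continue
        good.append({"id": int(parts[0]), "name": parts[1], "phone": parts[2]})
    return good, dirty


def mask_phone(phone):
    """Hypothetical masking function: keep the first 3 and last 4 characters."""
    if len(phone) <= 7:
        return "*" * len(phone)
    return phone[:3] + "*" * (len(phone) - 7) + phone[-4:]


def add_custom_fields(row, source):
    """Apply a masking function, a constant column, and a computed column."""
    row = dict(row)
    row["phone"] = mask_phone(row["phone"])  # masking function
    row["source"] = source                   # constant column
    row["name_len"] = len(row["name"])       # computed column
    return row


def run(shards, max_dirty=10, workers=4):
    """Read shards concurrently; fail only if dirty records exceed max_dirty."""
    rows, dirty = [], []
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() yields results in shard order even though shards run in parallel.
        for good, bad in pool.map(read_shard, shards):
            rows.extend(add_custom_fields(r, "obs") for r in good)
            dirty.extend(bad)  # stands in for writing to a dirty data bucket
    if len(dirty) > max_dirty:
        raise RuntimeError(f"{len(dirty)} dirty records exceed the limit of {max_dirty}")
    return rows, dirty


shards = [["1,alice,13800001234", "oops"], ["2,bob,13912345678", "x,carol,1"]]
rows, dirty = run(shards)
print(rows[0])  # {'id': 1, 'name': 'alice', 'phone': '138****1234', 'source': 'obs', 'name_len': 5}
print(dirty)    # ['oops', 'x,carol,1']
```

Threads help most when each shard spends its time waiting on network reads from OBS; the CPU-bound parsing shown here only keeps the example self-contained.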

Creating a Data Source

Create a data source in Management Center. For details, see Configuring Data Connection Parameters.

Creating an Offline Data Migration Job

Create an OBS migration job in DataArts Factory. For details, see Creating an Offline Processing Migration Job.