MRS Hive
DataArts Migration supports the mainstream versions of MRS Hive, so you can synchronize data across different deployment environments.
Preparation and Constraints
- Network requirements
The MRS Hive data source must be able to communicate with CDM so that data can be transmitted. For details, see Enabling Network Connectivity.
- Required permissions
  - Hive read and write permissions
    - Read permission: Grant read-only permission on Hive to the IAM user or user group of DataArts Migration through a system policy such as MRS ReadOnlyAccess. You can also create a custom policy that grants read permissions such as SELECT.
    - Write permission: In addition to any OBS permissions required, grant write permission on Hive to the IAM user or user group of DataArts Migration through a system policy such as MRS CommonOperations or MRS FullAccess. You can also create a custom policy that grants write permissions such as INSERT INTO TABLE and CREATE TABLE.
  - OBS permissions (storage-compute decoupling): When storage-compute decoupling is enabled for MRS Hive, DataArts Migration reads files from and writes files to OBS. In this case, you must have read and write permissions on the OBS files.
- Enabling access ports: When configuring the MRS Hive data source, ensure that the following ports have been enabled in the security group or network so that DataArts Migration can access MRS.
Table 1 Service ports
| Service | Port Type | Port Number | Usage |
|---|---|---|---|
| MRS Manager | TCP | 28443 | Used to download the MRS cluster configuration |
| MRS Manager | TCP | 20009 | Used for CAS authentication |
| MRS Manager | TCP | 20029 | Used by Manager to communicate with and manage other components |
| KDC | TCP and UDP | 21730 | Used for Kerberos authentication |
| KDC | TCP and UDP | 21731 | Used for Kerberos authentication |
| KDC | TCP and UDP | 21732 | Used for Kerberos authentication |
| HDFS | TCP | 8020 | HDFS NameNode service port |
| HDFS | TCP | 9866 | HDFS DataNode service port |
| Hive | TCP | 10000 | HiveServer port used for communications between the client and HiveServer |
| Hive | TCP | 9083 | Hive Metastore service port used to store and manage Hive metadata |
| ZooKeeper | TCP | 2181 | ZooKeeper service port used for communications between the client and the ZooKeeper cluster |
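Before you configure the data source, it can help to confirm that the TCP ports in Table 1 are reachable from the network where the migration resources run. The following is a minimal sketch, not an official tool: the helper names `is_tcp_port_open` and `unreachable_ports` are hypothetical, you must supply the address of the cluster node that hosts each service, and the check covers TCP only (the KDC ports also use UDP, which this sketch does not test).

```python
import socket

# TCP ports from Table 1, grouped by service.
# The KDC ports also use UDP; a TCP connect test does not cover that.
MRS_TCP_PORTS = {
    "MRS Manager": [28443, 20009, 20029],
    "KDC": [21730, 21731, 21732],
    "HDFS": [8020, 9866],
    "Hive": [10000, 9083],
    "ZooKeeper": [2181],
}


def is_tcp_port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def unreachable_ports(host, ports_by_service=MRS_TCP_PORTS, probe=is_tcp_port_open):
    """Return {service: [ports]} for every port the probe reports as closed."""
    missing = {}
    for service, ports in ports_by_service.items():
        closed = [port for port in ports if not probe(host, port)]
        if closed:
            missing[service] = closed
    return missing
```

Calling `unreachable_ports("<node address>")` returns an empty dictionary when every listed port accepts a TCP connection. In a real cluster the services usually run on different nodes, so you would typically call it once per node with the relevant subset of ports.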
Supported Data Types
DataArts Migration supports the following field types and their common variants in MRS Hive. This ensures that DataArts Migration can correctly read and write data.
| Category | Field Type | MRS Hive Read | MRS Hive Write |
|---|---|---|---|
| String | CHAR | √ | √ |
| String | VARCHAR | √ | √ |
| String | STRING | √ | √ |
| Integer | TINYINT | √ | √ |
| Integer | SMALLINT | √ | √ |
| Integer | INT | √ | √ |
| Integer | INTEGER | √ | √ |
| Integer | BIGINT | √ | √ |
| Floating point | FLOAT | √ | √ |
| Floating point | DOUBLE | √ | √ |
| Floating point | DECIMAL | √ | √ |
| Date/Time | TIMESTAMP | √ | √ |
| Date/Time | DATE | √ | √ |
| Boolean | BOOLEAN | √ | √ |
| Binary | BINARY | √ | √ |
| Complex type | ARRAY | √ | √ |
| Complex type | MAP | √ | √ |
| Complex type | STRUCT | x | x |
| Complex type | UNIONTYPE | x | x |
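To check a table schema against the list above before you create a job, a small lookup is usually enough. The sketch below is illustrative only: the function names and the sample schema are hypothetical, and it inspects only the outermost type of each column. The table does not say how nested types (for example, an ARRAY of STRUCT) are handled, so treat such columns as needing a manual check.

```python
import re

# Field types marked as supported for both read and write in the table above.
SUPPORTED_TYPES = {
    "CHAR", "VARCHAR", "STRING",
    "TINYINT", "SMALLINT", "INT", "INTEGER", "BIGINT",
    "FLOAT", "DOUBLE", "DECIMAL",
    "TIMESTAMP", "DATE",
    "BOOLEAN", "BINARY",
    "ARRAY", "MAP",
}


def base_type(hive_type):
    """Extract the outermost type name, e.g. 'decimal(10,2)' -> 'DECIMAL'."""
    match = re.match(r"\s*([A-Za-z_]+)", hive_type)
    if match is None:
        raise ValueError(f"Cannot parse Hive type: {hive_type!r}")
    return match.group(1).upper()


def unsupported_columns(schema):
    """Return the column names whose outermost type is not in SUPPORTED_TYPES.

    schema maps column name -> Hive type string. Nested element types are
    not inspected (assumption: only the top-level type is checked here).
    """
    return [name for name, hive_type in schema.items()
            if base_type(hive_type) not in SUPPORTED_TYPES]
```

For example, a schema with columns `id bigint`, `price decimal(10,2)`, `tags array<string>`, and `addr struct<city:string>` would report only `addr`, because STRUCT is marked as unsupported.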
Supported Migration Scenarios
DataArts Migration supports the following offline synchronization modes:
- Single table synchronization
DataArts Migration supports table/file synchronization when you ingest data into a data lake or migrate data to the cloud.
- Database and table shard synchronization
DataArts Migration supports synchronization of data from multiple databases and tables when you ingest data into a data lake or migrate data to the cloud.
- Entire DB migration
DataArts Migration supports synchronization of data from an on-premises database when you ingest data into a data lake or migrate data to the cloud.
Database and table shard synchronization and entire DB migration are available only in some regions. The following table lists the supported Hive migration scenarios.
| Supported Migration Scenario | Single Table Read | Single Table Write | Database/Table Shard Read | Database/Table Shard Write | Entire DB Read | Entire DB Write |
|---|---|---|---|---|---|---|
| Supported | √ | √ | x | √ | x | √ (supported in some regions) |
Core Capabilities
- Connection configuration
| Configuration Item | Supported | Description |
|---|---|---|
| Kerberos authentication | √ | Kerberos authentication is used to access MRS clusters. |
| Storage-compute decoupling | √ | The storage-compute decoupling architecture is supported, and data can be read from different Hive storage file systems, such as OBS and HDFS. |
- Read capabilities
| Configuration Item | Supported | Description |
|---|---|---|
| Read mode | JDBC/HDFS | Data can be read through JDBC or directly from HDFS files. JDBC is suitable for interactive queries and can read data flexibly using SQL syntax. For large amounts of data, reading the files directly is more efficient because it skips SQL parsing. |
| Shard concurrency | √ | Horizontal sharding and multi-threaded concurrent extraction improve throughput and efficiency. Currently, files can be read concurrently only from HDFS. |
| Custom fields | x | Adding computed columns, constant columns, or masking functions to a task is currently not supported. |
| Dirty data processing | √ | Abnormal data can be written to the dirty data bucket to prevent job failures caused by a small amount of abnormal data. |
| Incremental read | √ | Incremental data can be read through partition filtering or SQL statements. |
- Write capabilities
| Configuration Item | Supported | Description |
|---|---|---|
| Write mode | Insert into/Insert overwrite | Two write modes are supported: insert into and insert overwrite. Insert into appends data to the target table and is suitable for incremental writes. Insert overwrite overwrites data in the target table or partition and is suitable for full data updates. |
| Pre- and post-import processing | √ | Partitions can be cleared in truncate mode. |
| Dirty data processing | x | Abnormal data cannot be written to the dirty data bucket, so a small amount of abnormal data can cause a write job to fail. |
| Concurrent write | √ | Concurrent write makes full use of cluster resources to improve the data write speed. |
| Table creation at runtime | √ | A table can be created during data writing. If the destination table does not exist, Hive automatically creates a table structure based on the written data. You do not need to create the table in advance. |
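To make the read and write options above more concrete, the sketch below builds the kind of HiveQL that corresponds to an incremental read by partition filtering and to the two write modes. It only illustrates the statement shapes and is not what DataArts Migration generates internally. The function names, the table names `ods.orders` and `dw.orders`, and the partition column `dt` are hypothetical, and values are interpolated without escaping, so do not pass untrusted input to it.

```python
def partition_filter_query(table, partition_col, partition_value, columns):
    """Build a SELECT that reads a single partition (incremental read).

    List the columns explicitly so the partition column is not selected
    when the result is written into a static partition.
    """
    column_list = ", ".join(columns)
    return (f"SELECT {column_list} FROM {table} "
            f"WHERE {partition_col} = '{partition_value}'")


def insert_statement(target_table, select_sql, overwrite=False, partition=None):
    """Build an INSERT INTO (append) or INSERT OVERWRITE (replace) statement.

    partition is an optional dict of static partition values,
    e.g. {"dt": "2024-01-01"}.
    """
    keyword = "INSERT OVERWRITE TABLE" if overwrite else "INSERT INTO TABLE"
    partition_clause = ""
    if partition:
        specs = ", ".join(f"{key}='{value}'" for key, value in partition.items())
        partition_clause = f" PARTITION ({specs})"
    return f"{keyword} {target_table}{partition_clause} {select_sql}"


# Example: read one day's partition and overwrite the matching target partition.
daily = partition_filter_query("ods.orders", "dt", "2024-01-01", ["id", "amount"])
full_refresh = insert_statement("dw.orders", daily, overwrite=True,
                                partition={"dt": "2024-01-01"})
append_only = insert_statement("dw.orders", daily, partition={"dt": "2024-01-01"})
```

With `overwrite=True` the existing data in the `dt='2024-01-01'` partition is replaced, which matches a full refresh of that partition. Without it, rows are appended, so rerunning the same job would duplicate data.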
Creating a Data Source
Create a data source in Management Center. For details, see Configuring Data Connection Parameters.
Creating an Offline Data Migration Job
Create an MRS Hive migration job in DataArts Factory. For details, see Creating an Offline Processing Migration Job.
Best Practices