From MRS Kafka to Hudi
This connection is available only to users on the trustlist. To apply for trustlist membership, contact customer service or technical support.
Procedure
- Configure source parameters.
Figure 1 Configuring source parameters
Kafka configuration
- Source Connection Name: name of the data connection, which cannot be changed.
- Topic: Only one topic is allowed.
- Data Format: format of the message content in the source Kafka topic
JSON indicates that messages can be parsed in JSON format.
- Consumer Group ID: ID of the consumer group of the real-time processing integration job
In Kafka, any party that consumes messages is a consumer, and multiple consumers form a consumer group. A consumer group is the scalable, fault-tolerant consumption mechanism provided by Kafka. You are advised to configure a consumer group.
After a migration job consumes messages of a topic in the DMS Kafka cluster, you can view the configured consumer group ID on the consumer group management page of the Kafka cluster and query the consumption attribute group.id on the message query page.
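To illustrate what the JSON data format means in practice, the following minimal Python sketch checks whether a Kafka message value can be parsed as JSON. The function name `parse_json_message` and the sample payloads are hypothetical, and how the migration job itself handles unparseable messages is not described on this page.

```python
import json


def parse_json_message(raw_value):
    """Return the decoded JSON value, or None if the bytes are not valid JSON.

    Hypothetical helper for illustration only; it is not part of the service.
    """
    try:
        return json.loads(raw_value.decode("utf-8"))
    except (UnicodeDecodeError, json.JSONDecodeError):
        return None


# A message body like this fits the JSON data format.
print(parse_json_message(b'{"id": 1, "name": "a"}'))
# A plain key=value body does not.
print(parse_json_message(b"id=1"))
```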
- Configure destination parameters.
Figure 2 Configuring destination parameters
- Basic Configuration:
- Destination Database: Select the destination database.
- Partitioning: Select Partitioned table or Non-partitioned table.
- Destination Table: Select the destination table.
- Data Storage Path: basic path for storing Hudi data. This path takes effect only for automatically created tables. A subdirectory is created in the basic path for each destination table. HDFS and OBS paths are supported.
- OBS format: obs://{Bucket name}
- HDFS format: /tmp
- Partitioning Mode: This parameter is displayed when Partitioning is set to Partitioned table. Select Dynamic partitioning based on the source field content or Auto partitioning based on the migration time.
- Partition Field Value Source: This parameter is displayed when Partitioning is set to Partitioned table and Partitioning Mode is set to Dynamic partitioning based on the source field content. The following options are available: __key__, __value__, __topic__, __partition__, __offset__, and __timestamp__.
- Partition Field: This parameter is displayed when Partitioning is set to Partitioned table. It is set automatically after the destination table is selected.
- Partition Field Type: This parameter is displayed when Partitioning is set to Partitioned table. Select a partition field type: Enumeration or Time.
A partition is created for each value in the partition field. A maximum of 1,000 partitions can be created. If 1,000 partitions have been created, no more partitions can be created, and real-time tasks will fail.
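The partition rule above can be sketched in Python as follows. This is a simplified model rather than the service's implementation: representing a Kafka record as a dict keyed by the value-source option names, the `PartitionTracker` class, and the `PartitionLimitExceeded` exception are all assumptions made for illustration. Only the option names and the 1,000-partition limit come from the parameter descriptions.

```python
MAX_PARTITIONS = 1000  # limit stated in the parameter description


class PartitionLimitExceeded(RuntimeError):
    """Raised when a new partition would exceed the limit (hypothetical name)."""


class PartitionTracker:
    """Tracks the distinct partition values derived from incoming records."""

    def __init__(self, value_source, limit=MAX_PARTITIONS):
        self.value_source = value_source  # for example "__key__" or "__partition__"
        self.limit = limit
        self.partitions = set()

    def assign(self, record):
        # One partition exists per distinct value of the chosen source field.
        value = str(record[self.value_source])
        if value not in self.partitions:
            if len(self.partitions) >= self.limit:
                raise PartitionLimitExceeded(
                    f"cannot create partition {value!r}: limit of {self.limit} reached"
                )
            self.partitions.add(value)
        return value


tracker = PartitionTracker("__partition__")
record = {"__key__": "order-17", "__partition__": 3, "__offset__": 42}
print(tracker.assign(record))
```

Because every new field value creates a partition, a high-cardinality source such as __offset__ would reach the limit quickly in this model, which is worth keeping in mind when choosing the value source.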
- Global Configuration of Hudi Table Attributes: Click View and Edit to set Hudi table attributes globally.
The attributes configured here apply to all Hudi tables. For details about Hudi configuration items, visit the Hudi official website.
If the same attribute is also configured for a specific table, the table-level value overrides the value set here.
- Target Format: This parameter is displayed when Partition Field Type is set to Time. Set the value format of the partition field.
The system attempts to convert the source field data to a standard timestamp and then writes it as a string in the format required by the destination. If the conversion fails, the data is regarded as dirty data. The following format tokens are supported:
- yy: two-digit year, for example, 85, 91, and 20.
- yyyy: four-digit year, for example, 1985, 1991, and 2020.
- MM: two-digit month, for example, 01, 05, and 12.
- dd: two-digit day of the month, for example, 02, 15, and 26.
- HH: two-digit hour in 24-hour format, for example, 00, 03, 17, and 21.
- mm: two-digit minute, for example, 01, 18, 36, and 59.
- ss: two-digit second, for example, 02, 16, 25, and 51.
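As a rough illustration of how a target format built from these tokens is applied, the Python sketch below translates the tokens to `strftime` directives and formats a source value. It rests on several assumptions that this page does not confirm: the source value is either epoch milliseconds or an ISO 8601 string, numeric values are interpreted in UTC, and a failed conversion is signaled by returning `None` (standing in for dirty data). The helper names are hypothetical.

```python
import re
from datetime import datetime, timezone

# Pattern tokens from the list above, mapped to strftime directives.
_TOKENS = {
    "yyyy": "%Y", "yy": "%y", "MM": "%m", "dd": "%d",
    "HH": "%H", "mm": "%M", "ss": "%S",
}
# "yyyy" is listed before "yy" so that the longer token is matched first.
_TOKEN_RE = re.compile("yyyy|yy|MM|dd|HH|mm|ss")


def to_strftime(pattern):
    """Translate a pattern such as 'yyyy-MM-dd' into '%Y-%m-%d'."""
    return _TOKEN_RE.sub(lambda m: _TOKENS[m.group(0)], pattern)


def format_partition_value(source_value, target_format):
    """Return the formatted string, or None if the value cannot be converted."""
    try:
        if isinstance(source_value, (int, float)):
            # Assumption: numeric values are epoch milliseconds in UTC.
            ts = datetime.fromtimestamp(source_value / 1000, tz=timezone.utc)
        else:
            # Assumption: string values are ISO 8601 date-times.
            ts = datetime.fromisoformat(str(source_value))
    except (ValueError, OverflowError, OSError):
        return None  # treated as dirty data in this sketch
    return ts.strftime(to_strftime(target_format))


print(format_partition_value("2020-05-15T17:36:25", "yyyy-MM-dd HH"))
print(format_partition_value("not a date", "yyyyMMdd"))
```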
- Mapping Between Source and Destination Tables: You can select a synchronization primary key as needed and select a unique table field as the primary key of the destination table.
- Assign Value to Destination Field: By default, real-time synchronization maps fields with the same name in the source and destination. Fields that fail to be mapped cannot be synchronized. You can add fields to destination tables and assign constants or variables to the fields.
The primary key must be set for Hudi tables. If the source table has no primary key, you must manually select the primary key during field mapping.
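The mapping behavior described above (same-name mapping, assigned values for added fields, and a mandatory primary key) can be sketched in Python as follows. The function `map_record`, its parameters, and the sample field names are hypothetical and serve only to make the rules concrete; they do not reflect how the service is implemented.

```python
def map_record(source_row, destination_fields, assigned_values, primary_key):
    """Build a destination row from a source row (illustrative sketch only).

    source_row:         dict of source field name -> value
    destination_fields: field names that exist in the destination table
    assigned_values:    constants assigned to added destination fields
    primary_key:        destination field chosen as the Hudi primary key
    """
    # Default rule: a field is synchronized only if the same name exists on both sides.
    row = {name: source_row[name] for name in destination_fields if name in source_row}
    # Added destination fields receive their assigned constant values.
    row.update(assigned_values)
    # A Hudi table needs a primary key, so a row without one is rejected.
    if row.get(primary_key) is None:
        raise ValueError(f"primary key field {primary_key!r} has no value")
    return row


row = map_record(
    source_row={"id": 7, "name": "alice", "unmapped_src": "x"},
    destination_fields=["id", "name", "load_source"],
    assigned_values={"load_source": "kafka"},
    primary_key="id",
)
print(row)
```

In this sketch, `unmapped_src` has no same-name destination field and is therefore dropped, which mirrors the rule that fields that fail to be mapped are not synchronized.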