Updated on 2024-05-29 GMT+08:00

Practices

This section provides scenario-specific instructions and best practices to help you use DLI more effectively for big data analytics and processing.

Table 1 Common DLI development instructions and best practices

Scenario

Instructions

Description

Connecting a queue to an external data source

Configuring the Connection Between a DLI Queue and a Data Source in a Private Network

To create and run a job on a DLI queue that accesses an external data source such as MRS, RDS, CSS, Kafka, or GaussDB(DWS), you must first configure the connection between the queue and that data source. This section describes how to configure the connection.

Configuring the Connection Between a DLI Queue and a Data Source on the Internet

Connect a DLI queue to a data source on the Internet. You can configure SNAT rules and add routes to the public network to enable communications between a queue and the Internet.

Spark SQL job development

Using Spark SQL Jobs to Analyze OBS Data

Use a Spark SQL job to create OBS tables, and import, insert, and query OBS table data.

Flink OpenSource SQL job development

Reading Data from Kafka and Writing Data to RDS

Use a Flink OpenSource SQL job to read data from Kafka and write the data to RDS.

Reading Data from Kafka and Writing Data to GaussDB(DWS)

Use a Flink OpenSource SQL job to read data from Kafka and write the data to GaussDB(DWS).

Reading Data from Kafka and Writing Data to Elasticsearch

Use a Flink OpenSource SQL job to read data from Kafka and write the data to Elasticsearch.

Reading Data from MySQL CDC and Writing Data to GaussDB(DWS)

Use a Flink OpenSource SQL job to read data from MySQL CDC and write the data to GaussDB(DWS).

Reading Data from PostgreSQL CDC and Writing Data to GaussDB(DWS)

Use a Flink OpenSource SQL job to read data from PostgreSQL CDC and write the data to GaussDB(DWS).

Flink Jar job development

Flink Jar Job Examples

Create a custom Flink Jar job to interact with MRS.

Using Flink Jar to Write Data to OBS

Use a Flink Jar job to read data from Kafka and write the data to OBS.

Using Flink Jar to Connect to Kafka with SASL_SSL Authentication Enabled

Use a Flink Jar job to connect to Kafka with SASL_SSL authentication enabled.

Spark Jar job development

Using Spark Jar Jobs to Read and Query OBS Data

Write a Spark program to read and query OBS data, compile and package your code, and submit a Spark Jar job.

Data migration

Migrating Data from Hive to DLI

Migrate data from MRS Hive to DLI using the CDM data synchronization function.

Migrating Data from Kafka to DLI

Migrate data from MRS Kafka to DLI using the CDM data synchronization function.

Migrating Data from Elasticsearch to DLI

Migrate data from a CSS Elasticsearch cluster to DLI using the CDM data synchronization function.

Migrating Data from RDS to DLI

Migrate data from an RDS database to DLI using the CDM data synchronization function.

Migrating Data from GaussDB(DWS) to DLI

Migrate data from GaussDB(DWS) to DLI using the CDM data synchronization function.
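
The Flink OpenSource SQL scenarios in Table 1 share one shape: a source table, a sink table, and an INSERT statement that connects them. The following is a minimal sketch, written in Python, that assembles such a script for the Kafka-to-RDS scenario. It assumes the option keys of the open-source Apache Flink Kafka and JDBC connectors; the options that DLI accepts may differ by Flink version, so check the linked instructions before use. All table names, column names, addresses, and credentials are hypothetical placeholders.

```python
def with_clause(options):
    """Render a dict as the body of a Flink SQL WITH (...) clause."""
    return ",\n".join(f"  '{key}' = '{value}'" for key, value in options.items())


def create_table(name, columns, options):
    """Build a CREATE TABLE statement from column definitions and connector options."""
    cols = ",\n".join(f"  {col} {sql_type}" for col, sql_type in columns)
    return f"CREATE TABLE {name} (\n{cols}\n) WITH (\n{with_clause(options)}\n);"


def kafka_to_rds_script(columns, kafka_options, jdbc_options):
    """Assemble the source DDL, the sink DDL, and the INSERT that moves rows between them."""
    names = ", ".join(col for col, _ in columns)
    return "\n\n".join([
        create_table("kafka_source", columns, kafka_options),
        create_table("rds_sink", columns, jdbc_options),
        f"INSERT INTO rds_sink SELECT {names} FROM kafka_source;",
    ])


# Hypothetical schema, endpoints, and credentials; replace with your own values.
COLUMNS = [("order_id", "STRING"), ("amount", "DOUBLE")]

# Option keys assumed from the open-source Apache Flink Kafka connector.
KAFKA_OPTIONS = {
    "connector": "kafka",
    "topic": "orders",
    "properties.bootstrap.servers": "192.168.0.10:9092",
    "properties.group.id": "dli-demo",
    "scan.startup.mode": "latest-offset",
    "format": "json",
}

# Option keys assumed from the open-source Apache Flink JDBC connector.
JDBC_OPTIONS = {
    "connector": "jdbc",
    "url": "jdbc:mysql://192.168.0.20:3306/demo",
    "table-name": "orders",
    "username": "demo_user",
    "password": "demo_password",
}

if __name__ == "__main__":
    print(kafka_to_rds_script(COLUMNS, KAFKA_OPTIONS, JDBC_OPTIONS))
```

A job built from a script like this can reach Kafka and RDS only after the queue has been connected to those data sources, as described in the first scenario of Table 1.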