Updated on 2024-09-30 GMT+08:00

Overview

What Is Enhanced Datasource Connection?

In cross-source data analysis scenarios, DLI needs to connect to external data sources. However, because DLI and the data source reside in different VPCs, their networks are not connected, so DLI cannot read data from the data source. DLI's enhanced datasource connection feature establishes network connectivity between DLI and the data source.

This section describes how to establish network connectivity between DLI and a data source in a different VPC:

  • Creating an enhanced datasource connection: Establish a VPC peering connection to connect DLI and the data source's VPC network.
  • Testing network connectivity: Verify the connectivity between the queue and the data source's network.

For details about the data sources that support cross-source access, see Common Development Methods for DLI Cross-Source Analysis.

In cross-source development scenarios, configuring datasource authentication information directly creates a risk of password leakage. When Spark 3.3.1 or later and Flink 1.15 or later jobs access data sources using datasource connections, you are advised to store the authentication information of the data sources in Data Encryption Workshop (DEW). This helps you address issues related to data security, key security, and complex key management. For details, see Using DEW to Manage Access Credentials for Data Sources.

Notes and Constraints

  • Datasource connections cannot be created for the default queue.
  • Flink jobs can directly access DIS, OBS, and SMN data sources without using datasource connections.
  • Enhanced connections can be created only for yearly/monthly and pay-per-use queues.
  • VPC Administrator permissions are required for enhanced connections to use VPCs, subnets, routes, and VPC peering connections.

    You can set these permissions by referring to Service Authorization.

  • If you use an enhanced datasource connection, the CIDR block of the elastic resource pool or queue cannot overlap with that of the data source.
  • Only queues bound with datasource connections can access datasource tables.
  • Datasource tables do not support the preview function.
  • When you test the connectivity of a datasource connection using an IP address, the following constraints apply:
    • The IP address must be valid: four decimal numbers separated by periods (.), each ranging from 0 to 255.
    • You can append a port to the IP address, separated by a colon (:). The port can contain a maximum of five digits, and its value ranges from 0 to 65535.

      For example, 192.168.xx.xx or 192.168.xx.xx:8181.

  • When you test the connectivity of a datasource connection using a domain name, the following constraints apply:
    • The domain name can contain 1 to 255 characters. Only letters, numbers, underscores (_), and hyphens (-) are allowed.
    • The top-level domain name must contain at least two letters, for example, .com, .net, or .cn.
    • You can append a port to the domain name, separated by a colon (:). The port can contain a maximum of five digits, and its value ranges from 0 to 65535.

      For example, example.com:8080.
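
The CIDR and address format constraints above can be checked locally before you create a connection or run a connectivity test. The following Python sketch is illustrative only and is not part of DLI: the function names are hypothetical, it uses only the Python standard library, and it assumes that domain name labels are separated by periods and that a domain name has at least one label before the top-level domain. DLI's own validation may differ in detail.

```python
import ipaddress
import re

# Assumption: each label between periods uses only letters, digits,
# underscores, and hyphens, as listed in the constraints.
LABEL = r"[A-Za-z0-9_-]+"
TLD = r"[A-Za-z]{2,}"  # top-level domain: at least two letters


def cidrs_overlap(queue_cidr: str, source_cidr: str) -> bool:
    """Return True if the queue (or elastic resource pool) CIDR block
    shares any address with the data source CIDR block."""
    a = ipaddress.ip_network(queue_cidr, strict=False)
    b = ipaddress.ip_network(source_cidr, strict=False)
    return a.overlaps(b)


def _split_port(address: str):
    """Split 'host:port' into (host, port); port is None if no colon."""
    host, sep, port = address.partition(":")
    return host, (port if sep else None)


def _valid_port(port: str) -> bool:
    """At most five digits, value from 0 to 65535."""
    return re.fullmatch(r"[0-9]{1,5}", port) is not None and int(port) <= 65535


def is_valid_ip_target(address: str) -> bool:
    """Check an 'IP' or 'IP:port' connectivity test target."""
    host, port = _split_port(address)
    parts = host.split(".")
    if len(parts) != 4:
        return False
    for part in parts:
        if re.fullmatch(r"[0-9]{1,3}", part) is None or int(part) > 255:
            return False
    return port is None or _valid_port(port)


def is_valid_domain_target(address: str) -> bool:
    """Check a 'domain' or 'domain:port' connectivity test target."""
    host, port = _split_port(address)
    if not 1 <= len(host) <= 255:
        return False
    labels = host.split(".")
    if len(labels) < 2:  # assumption: at least 'name.tld'
        return False
    if any(re.fullmatch(LABEL, label) is None for label in labels[:-1]):
        return False
    if re.fullmatch(TLD, labels[-1]) is None:
        return False
    return port is None or _valid_port(port)


if __name__ == "__main__":
    # Example values are placeholders, not real resources.
    print(cidrs_overlap("172.16.0.0/16", "172.16.8.0/24"))
    print(is_valid_ip_target("192.168.0.10:8181"))
    print(is_valid_domain_target("example.com:8080"))
```

If cidrs_overlap returns True for the planned queue and data source CIDR blocks, the two blocks conflict and one of them needs to be changed before the enhanced datasource connection is created.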

Cross-Source Analysis Process

To use DLI for cross-source analysis, you need to create a datasource connection to connect DLI to the data source, and then develop jobs to access the data source.

Figure 1 Cross-source analysis flowchart

Helpful Links

Creation methods for an enhanced datasource connection:

  • Console: Creating an Enhanced Datasource Connection
  • API: Creating an Enhanced Datasource Connection