Updated on 2024-07-11 GMT+08:00

Creating an Enhanced Datasource Connection

Scenario

Create an enhanced datasource connection for DLI to access, import, query, and analyze data of other data sources.

For example, to connect DLI to an MRS, RDS, CSS, Kafka, or GaussDB(DWS) data source, you need to establish network connectivity between DLI and the VPC of the data source.

This section describes how to create an enhanced datasource connection on the console.

Constraints

  • Datasource connections cannot be created for the default queue.
  • Flink jobs can directly access DIS, OBS, and SMN data sources without using datasource connections.
  • Enhanced connections can only be created for yearly/monthly and pay-per-use queues.
  • VPC Administrator permissions are required for enhanced connections to use VPCs, subnets, routes, and VPC peering connections.

    You can set these permissions by referring to Service Authorization.

  • If you use an enhanced datasource connection, the CIDR block of the elastic resource pool or queue cannot overlap with that of the data source.
  • Only queues bound with datasource connections can access datasource tables.
  • Datasource tables do not support the preview function.
  • When checking the connectivity of datasource connections, the notes and constraints on IP addresses are:
    • The IP address must be valid. It consists of four decimal numbers separated by periods (.), and each number ranges from 0 to 255.
    • During the test, you can add a port after the IP address and separate them with colons (:). The port can contain a maximum of five digits. The value ranges from 0 to 65535.

      For example, 192.168.xx.xx or 192.168.xx.xx:8181.

  • When checking the connectivity of datasource connections, the notes and constraints on domain names are:
    • The domain name can contain 1 to 255 characters. Only letters, digits, underscores (_), and hyphens (-) are allowed.
    • The top-level domain name must contain at least two letters, for example, .com, .net, and .cn.
    • During the test, you can add a port after the domain name and separate them with colons (:). The port can contain a maximum of five digits. The value ranges from 0 to 65535.

      For example, example.com:8080.
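
The address rules above can be checked before you run a connectivity test. The following is a minimal sketch, not the validation the console itself performs. The function names are illustrative, and two points are assumptions: the allowed-character rule is applied to each period-separated label of a domain name, and a domain name must have at least two labels so that a top-level domain is present.

```python
import re

# Characters allowed in each label of a domain name (assumption: the rule
# applies per label, with labels separated by periods).
_LABEL = re.compile(r"[A-Za-z0-9_-]+")
# The top-level domain must contain at least two letters.
_TLD = re.compile(r"[A-Za-z]{2,}")
# A port has at most five digits; the numeric range is checked separately.
_PORT = re.compile(r"[0-9]{1,5}")
_OCTET = re.compile(r"[0-9]{1,3}")


def _valid_port(port: str) -> bool:
    """Up to five digits, with a value from 0 to 65535."""
    return _PORT.fullmatch(port) is not None and int(port) <= 65535


def _valid_ipv4(host: str) -> bool:
    """Four decimal numbers separated by periods, each from 0 to 255."""
    parts = host.split(".")
    return len(parts) == 4 and all(
        _OCTET.fullmatch(p) is not None and int(p) <= 255 for p in parts
    )


def _valid_domain(host: str) -> bool:
    """1 to 255 characters, valid labels, and an alphabetic top-level domain."""
    if not 1 <= len(host) <= 255:
        return False
    labels = host.split(".")
    if len(labels) < 2:  # assumption: a top-level domain is required
        return False
    return (
        all(_LABEL.fullmatch(label) is not None for label in labels)
        and _TLD.fullmatch(labels[-1]) is not None
    )


def is_valid_test_address(address: str) -> bool:
    """Accept 'host' or 'host:port', where host is an IPv4 address or a domain name."""
    host, sep, port = address.partition(":")
    if sep and not _valid_port(port):
        return False
    return _valid_ipv4(host) or _valid_domain(host)
```

For example, `is_valid_test_address("example.com:8080")` is accepted, while `is_valid_test_address("example.com:65536")` is rejected because the port is out of range.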

Process

Figure 1 Enhanced datasource connection creation flowchart

Prerequisites

  • An elastic resource pool or queue has been created.
  • You have obtained the VPC, subnet, private IP address, port, and security group information of the external data source.
  • The security group of the external data source has allowed access from the CIDR block of the elastic resource pool or queue.
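
Once you have the CIDR block of the elastic resource pool or queue and that of the data source, you can confirm offline that they do not overlap, as the constraints require. The following is a minimal sketch that uses Python's standard `ipaddress` module. The function name and the CIDR values are placeholders chosen for illustration.

```python
import ipaddress


def cidrs_overlap(queue_cidr: str, datasource_cidr: str) -> bool:
    """Return True if the two CIDR blocks share at least one address."""
    # ip_network raises ValueError if a string is not a valid network,
    # for example when host bits are set.
    queue_net = ipaddress.ip_network(queue_cidr)
    source_net = ipaddress.ip_network(datasource_cidr)
    return queue_net.overlaps(source_net)


# Placeholder values, not defaults of any service.
print(cidrs_overlap("172.16.0.0/18", "192.168.0.0/24"))  # False
print(cidrs_overlap("172.16.0.0/18", "172.16.32.0/24"))  # True
```

A result of `True` means the two blocks conflict and one of them must be changed before the connection is created.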

Procedure

  1. Create an Enhanced Datasource Connection

    1. Log in to the DLI management console.
    2. In the left navigation pane, choose Datasource Connections.
    3. On the Enhanced tab page displayed, click Create.

      Configure parameters according to Table 1.

      Table 1 Parameters

      Parameter

      Description

      Connection Name

      Name of the created datasource connection.

      • The name can contain only letters, digits, and underscores (_). The parameter must be specified.
      • A maximum of 64 characters are allowed.

      Resource Pool

      Elastic resource pool or queue to bind to the datasource connection. This parameter is optional.

      Only dedicated queues charged in yearly/monthly or pay-per-use billing mode can be bound to elastic resource pools.

      In regions where this function is available, an elastic resource pool with the same name is created by default for the queue created in "Creating a Queue."

      NOTE:

      Before using an enhanced datasource connection, you must bind a queue and ensure that the VPC peering connection is in the Active state.

      Bind Queue

      Queue to bind to the datasource connection. This parameter is optional.

      Only dedicated queues charged in yearly/monthly or pay-per-use billing mode can be bound to elastic resource pools.

      NOTE:

      Before using an enhanced datasource connection, you must bind a queue and ensure that the VPC peering connection is in the Active state.

      VPC

      VPC used by the data source.

      Subnet

      Subnet used by the data source.

      Route Table

      Route table of the subnet.

      NOTE:
      • This is the route table associated with the subnet used by the destination data source. It is not the route table that contains the routes you add by clicking Manage Route in the Operation column. Routes added on the Manage Route page are stored in the route table associated with the subnet used by the queue to be bound.
      • The subnet used by the destination data source must be different from that used by the queue to be bound. Otherwise, a CIDR block conflict occurs.

      Host Information

      In this text field, you can configure the mapping between host IP addresses and domain names so that jobs can access the corresponding hosts by using the configured domain names. This parameter is optional.

      For example, when accessing the HBase cluster of MRS, you need to configure the host name (domain name) and IP address of the ZooKeeper instance. Enter one record in each line in the format of IP address Host name/Domain name.

      Example:

      192.168.0.22 node-masterxxx1.com

      192.168.0.23 node-masterxxx2.com

      For details about how to obtain host information, see How Do I Obtain MRS Host Information?.

      Tags

      Tags used to identify cloud resources. A tag includes the tag key and tag value. If you want to use the same tag to identify multiple cloud resources, that is, to select the same tag from the drop-down list box for all services, you are advised to create predefined tags on the Tag Management Service (TMS).

      If your organization has configured tag policies for DLI, add tags to resources based on the policies. If a tag does not comply with the tag policies, resource creation may fail. Contact your organization administrator to learn more about tag policies.

      For details, see Tag Management Service User Guide.

      NOTE:
      • A maximum of 20 tags can be added.
      • Only one tag value can be added to a tag key.
      • The key name in each resource must be unique.
      • Tag key: Enter a tag key name in the text box.
        NOTE:

        A tag key can contain a maximum of 128 characters. Only letters, digits, spaces, and special characters (_.:=+-@) are allowed. The key cannot start or end with a space or start with _sys_.

      • Tag value: Enter a tag value in the text box.
        NOTE:

        A tag value can contain a maximum of 255 characters. Only letters, digits, spaces, and special characters (_.:=+-@) are allowed. The value cannot start or end with a space.

    4. Click OK.

      After the creation is complete, the enhanced datasource connection is in the Active state, indicating that the connection is successfully created.
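
The rules in Table 1 for the connection name, host information, and tags can be checked before you fill in the form. The following is a minimal sketch, not the console's own validation. All function names are hypothetical. It also assumes that "letters" means ASCII letters and that a tag key must not be empty; the source does not state either point.

```python
import ipaddress
import re

# Connection name: letters, digits, and underscores, at most 64 characters.
NAME_RE = re.compile(r"[A-Za-z0-9_]{1,64}")
# Tag keys and values: letters, digits, spaces, and the characters _.:=+-@
TAG_CHARS_RE = re.compile(r"[A-Za-z0-9 _.:=+\-@]*")


def valid_connection_name(name: str) -> bool:
    return NAME_RE.fullmatch(name) is not None


def parse_host_information(text: str) -> dict:
    """Parse 'IP address Host name' records, one per line, into {host: ip}."""
    hosts = {}
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines between records
        ip, host = line.split()   # ValueError unless exactly two fields
        ipaddress.ip_address(ip)  # ValueError on a malformed IP address
        hosts[host] = ip
    return hosts


def _tag_text_ok(text: str, max_len: int) -> bool:
    return (
        len(text) <= max_len
        and TAG_CHARS_RE.fullmatch(text) is not None
        and text == text.strip(" ")  # no leading or trailing space
    )


def valid_tag(key: str, value: str) -> bool:
    # Assumption: an empty key is rejected.
    if not key or key.startswith("_sys_"):
        return False
    return _tag_text_ok(key, 128) and _tag_text_ok(value, 255)


def valid_tags(tags: dict) -> bool:
    """At most 20 tags; a dict already gives unique keys with one value each."""
    return len(tags) <= 20 and all(valid_tag(k, v) for k, v in tags.items())
```

For example, passing the two host records shown in Table 1 to `parse_host_information` returns a mapping from each host name to its IP address, and a malformed line raises `ValueError`.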

  2. Allow Access from the CIDR Block of the Elastic Resource Pool in the Security Group of the Data Source

    1. On the DLI management console, obtain the CIDR block of the elastic resource pool or queue.

      Choose Resources > Queue Management from the left navigation pane. On the page displayed, locate the queue on which jobs are running, and click the button next to the queue name to obtain the CIDR block of the queue.

    2. Log in to the VPC console and find the VPC the data source belongs to.
    3. On the network console, choose Virtual Private Cloud > Network Interfaces. On the Network Interfaces tab page displayed, search for the security group name, click More in the Operation column, and select Change Security Group.
    4. In the navigation pane on the left, choose Access Control > Security Groups.
    5. Click the name of the security group to which the external data source belongs.
    6. Click the Inbound Rules tab and add a rule to allow access from the CIDR block of the queue. See Figure 2.

      Configure the inbound rule parameters according to Table 2.

      Figure 2 Adding an inbound rule

      Table 2 Inbound rule parameters

      | Parameter | Description | Example Value |
      | --- | --- | --- |
      | Priority | Priority of a security group rule. The priority value ranges from 1 to 100. The default value is 1, indicating the highest priority. A smaller value indicates a higher priority of a security group rule. | 1 |
      | Action | Action of the security group rule. | Allow |
      | Protocol & Port | Protocol: Network protocol. The value can be All, TCP, UDP, ICMP, or GRE. Port: Port or port range over which the traffic can reach your instance. The port ranges from 1 to 65535. | In this example, select TCP. Leave the port blank or set it to the data source port. |
      | Type | Type of IP addresses. | IPv4 |
      | Source | Allows access from IP addresses or instances in another security group. | In this example, enter the obtained queue CIDR block. |
      | Description | Supplementary information about the security group rule. This parameter is optional. | _ |
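
The inbound rule parameters in Table 2 can be written down as a small data structure and checked against the stated ranges before you enter them on the console. The following is a minimal sketch. The `InboundRule` class is hypothetical and is not part of any VPC SDK or API. It covers only the case in which the source is a CIDR block, and the CIDR and port values in the usage lines are placeholders.

```python
import ipaddress
from dataclasses import dataclass
from typing import Optional

# Protocol values listed in Table 2.
ALLOWED_PROTOCOLS = {"All", "TCP", "UDP", "ICMP", "GRE"}


@dataclass
class InboundRule:
    """Hypothetical record of the inbound rule fields described in Table 2."""

    source_cidr: str             # CIDR block of the queue
    protocol: str = "TCP"
    port: Optional[int] = None   # None means the port is left blank
    priority: int = 1            # 1 is the highest priority
    action: str = "Allow"
    ip_type: str = "IPv4"

    def validate(self) -> None:
        """Raise ValueError if a field is outside the ranges given in Table 2."""
        if not 1 <= self.priority <= 100:
            raise ValueError("priority must be between 1 and 100")
        if self.protocol not in ALLOWED_PROTOCOLS:
            raise ValueError("unsupported protocol: %s" % self.protocol)
        if self.port is not None and not 1 <= self.port <= 65535:
            raise ValueError("port must be between 1 and 65535")
        # ip_network raises ValueError if the source is not a valid CIDR block.
        ipaddress.ip_network(self.source_cidr)


# Placeholder values: a queue CIDR block and a data source port.
rule = InboundRule(source_cidr="172.16.0.0/18", port=3306)
rule.validate()
```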

  3. Test the Connectivity Between the DLI Queue and the Data Source

    1. Obtain the private IP address and port number of the data source.

      Take the RDS data source as an example. On the Instances page, click the target DB instance. On the page displayed, locate the Connection Information pane and view the private IP address. In the Connection Information pane, locate the Database Port to view the port number of the RDS DB instance.

    2. In the navigation pane of the DLI management console, choose Resources > Queue Management.
    3. Locate the queue bound with the enhanced datasource connection, click More in the Operation column, and select Test Address Connectivity.
    4. Enter the data source connection address and port number to test the network connectivity.

      Format: IP address:Port number

      Before testing the connection, ensure that the security group of the external data source has allowed access from the CIDR block of the queue.

      Figure 3 Testing the network connectivity between the queue and the data source
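
If you also want to confirm that the data source port accepts TCP connections at all, you can run a simple probe from a host inside the data source's VPC. The following is a minimal sketch that uses Python's standard `socket` module. The function name is illustrative. The result reflects reachability only from the machine that runs the script, not from the DLI queue, so Test Address Connectivity on the console remains the check that matters for DLI.

```python
import socket


def can_connect(address: str, timeout: float = 3.0) -> bool:
    """Try a TCP connection to an 'IP address:Port number' string."""
    host, _, port = address.rpartition(":")
    if not host or not port.isdigit() or int(port) > 65535:
        raise ValueError("expected 'IP address:Port number'")
    try:
        # create_connection resolves the host and opens a TCP connection.
        with socket.create_connection((host, int(port)), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False
```

For example, `can_connect("192.168.0.22:8181")` returns `True` only if a TCP handshake with that address completes within the timeout.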