Updated on 2024-09-20 GMT+08:00

Configuring the Connection Between a DLI Queue and a Data Source in a Private Network

Background

To access external data sources such as MRS, RDS, CSS, Kafka, and GaussDB(DWS) from DLI jobs, you must first establish network connectivity between DLI and those data sources. A DLI enhanced datasource connection uses VPC peering to connect directly to the VPC of each destination data source for point-to-point data exchange.

This section describes how to connect a DLI queue to a data source. You can also use it to troubleshoot connection faults.

Development Process

Figure 1 Configuration process of an enhanced datasource connection

Prerequisites

  • The CIDR block of the DLI queue bound to a datasource connection cannot overlap with the CIDR block of any data source.
  • Datasource connections cannot be bound to the default queue.
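
The first prerequisite can be checked before you create the connection. The following is a minimal sketch that uses Python's standard ipaddress module. The function name cidrs_overlap and the sample CIDR blocks are illustrative assumptions, not values from DLI.

```python
import ipaddress


def cidrs_overlap(queue_cidr: str, source_cidr: str) -> bool:
    """Return True if the two CIDR blocks share at least one address."""
    queue_net = ipaddress.ip_network(queue_cidr)
    source_net = ipaddress.ip_network(source_cidr)
    return queue_net.overlaps(source_net)


# Hypothetical values: replace them with the queue CIDR block (Step 2)
# and the CIDR block of the data source subnet (Step 1).
print(cidrs_overlap("172.16.0.0/18", "192.168.0.0/24"))  # False: no conflict
print(cidrs_overlap("172.16.0.0/18", "172.16.32.0/24"))  # True: blocks overlap
```

If the function returns True, the queue cannot be bound to a connection for that data source until one of the CIDR blocks is changed.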

Step 1: Obtain the Floating IP Address, Port Number, and Security Group of an External Data Source

Table 2 Data source information to be obtained

Data Source

Obtain Method

DMS Kafka

  1. On the Kafka management console, click an instance name on the DMS for Kafka page. Basic information of the Kafka instance is displayed.
  2. In the Connection pane, obtain the Instance Address (Private Network) value. In the Network pane, obtain the VPC and subnet of the instance.
  3. In the Network pane, obtain the security group of the instance.

RDS

On the Instances page of the RDS console, click the target DB instance name. On the displayed page, locate the Connection Information pane and obtain the Floating IP Address, VPC, Subnet, Database Port, and Security Group.

CSS

  1. On the CSS management console, choose Clusters > Elasticsearch. On the displayed page, click the name of the created CSS cluster to view basic information.
  2. On the Cluster Information page, obtain the Private Network Address, VPC, Subnet, and Security Group.

GaussDB(DWS)

  1. On the GaussDB(DWS) management console, choose Clusters. On the displayed page, click the name of the created GaussDB(DWS) cluster to view basic information.
  2. On the Basic Information tab, locate the Database Attributes pane and obtain the private IP address and port number of the DB instance. In the Network pane, obtain the VPC, subnet, and security group information.

MRS HBase

An MRS 3.x cluster is used as an example.

  1. Log in to the MRS management console. On the Clusters > Active Clusters page, click a cluster name to view basic information.
  2. On the dashboard, obtain the VPC, subnet, and security group from the Basic Information pane.
  3. Creating a job that connects DLI to MRS HBase requires the ZooKeeper instance and its port in the MRS cluster, so you also need to obtain the host information of the MRS cluster.
    1. Log in to MRS Manager by referring to Accessing FusionInsight Manager. On MRS Manager, choose Cluster > Name of the desired cluster > Services > ZooKeeper. Click the Instance tab and obtain the ZooKeeper host information such as the host name and service IP address.
    2. On MRS Manager, choose Cluster and click the name of the desired cluster. Choose Services > ZooKeeper. Click the Configurations tab and select All Configurations, search for the clientPort parameter, and obtain its value, that is, the ZooKeeper port number.
    3. Log in to any MRS node as user root in SSH mode. For details, see Logging In to an ECS.
    4. Run the following command to obtain the MRS host information. Copy and save the information.

      cat /etc/hosts

      An example query result is as follows:

Step 2: Obtain the CIDR Block of the DLI Queue

On the DLI management console, choose Resources > Queue Management from the navigation pane. Locate the queue you have created, and click the icon next to the queue name to view the CIDR block of the queue.

Step 3: Add a Rule to the Security Group of the External Data Source to Allow Access from the DLI Queue

  1. Log in to the VPC console.
  2. In the navigation pane on the left, choose Access Control > Security Groups.
  3. Click the name of the security group to which the external data source belongs.

    To obtain the security group information, go to the management console of the data source service and follow the steps provided in Step 1: Obtain the Floating IP Address, Port Number, and Security Group of an External Data Source.

  4. On the Inbound Rules tab, add a rule to allow access from the queue network segment, that is, the CIDR block of the queue.

    For details about how to set the inbound rule parameters, see Table 3.

    Figure 2 Adding an inbound rule
    Table 3 Inbound rule parameters

    | Parameter | Description | Example |
    |---|---|---|
    | Priority | The security group rule priority. The value ranges from 1 to 100. The default value is 1, indicating the highest priority. A smaller value indicates a higher priority. | 1 |
    | Action | Action of the security group rule. | Select Allow. |
    | Protocol & Port | Network protocol: The value can be All, TCP, UDP, ICMP, or GRE. Port: Port or port range over which the traffic can reach your instance. The port ranges from 1 to 65535. | In this example, select TCP. Leave the port blank or set it to the data source port obtained in Step 1: Obtain the Floating IP Address, Port Number, and Security Group of an External Data Source. |
    | Type | Type of IP addresses. | IPv4 |
    | Source | Allow access from IP addresses or instances in another security group. | In this example, enter the queue network segment obtained in Step 2: Obtain the CIDR Block of the DLI Queue. |
    | Description | Supplementary information about the security group rule. This parameter is optional. | - |
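
To reason about what the inbound rule admits, you can model it locally. The sketch below is illustrative only: InboundRule and rule_allows are hypothetical names, not part of any VPC API, and it assumes that a blank port matches every port. It checks whether TCP traffic from an address in the queue network segment would match the rule.

```python
import ipaddress
from dataclasses import dataclass
from typing import Optional


@dataclass
class InboundRule:
    """Illustrative model of one inbound rule; not a VPC API object."""
    priority: int         # 1 to 100; a smaller value means a higher priority
    action: str           # "Allow" in this example
    protocol: str         # "TCP" in this example
    port: Optional[int]   # None models a blank port (assumed: all ports)
    source_cidr: str      # queue network segment obtained in Step 2


def rule_allows(rule: InboundRule, src_ip: str, dst_port: int) -> bool:
    """Return True if the rule admits TCP traffic from src_ip to dst_port."""
    if rule.action != "Allow" or rule.protocol != "TCP":
        return False
    if rule.port is not None and rule.port != dst_port:
        return False
    source_net = ipaddress.ip_network(rule.source_cidr)
    return ipaddress.ip_address(src_ip) in source_net


# Hypothetical values: a queue CIDR block and a ZooKeeper port.
rule = InboundRule(priority=1, action="Allow", protocol="TCP",
                   port=None, source_cidr="172.16.0.0/18")
print(rule_allows(rule, "172.16.5.9", 2181))  # True: source is in the segment
print(rule_allows(rule, "10.0.0.1", 2181))    # False: source is outside it
```

Setting the port to the data source port instead of leaving it blank narrows the rule to that single port.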

Step 4: Create an Enhanced Datasource Connection

  1. Log in to the DLI management console. In the navigation pane on the left, choose Datasource Connections. On the displayed page, click Create on the Enhanced tab.
  2. In the displayed dialog box, set the following parameters:
  3. Click OK. Click the name of the created datasource connection to view its status. You can perform subsequent steps only after the connection status changes to Active.
  4. To connect to MRS HBase, you need to add MRS host information. The procedure is as follows:
    1. On the Datasource Connections page, click the Enhanced tab and locate the row that contains the created enhanced datasource connection. Click More > Modify Host in the Operation column.
    2. In the displayed dialog box, enter the MRS HBase host information obtained in Step 1: Obtain the Floating IP Address, Port Number, and Security Group of an External Data Source in the Host Information box.
      Figure 3 Modifying host information
    3. Click OK.
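
The output of cat /etc/hosts usually contains loopback entries and comments that are not cluster hosts. The following sketch filters the saved output down to the remaining entries. It assumes that the Host Information box takes one entry per line in the form of an IP address followed by host names. That format, the function name extract_host_entries, and the sample addresses are assumptions made for illustration.

```python
from typing import List


def extract_host_entries(hosts_text: str) -> List[str]:
    """Keep non-loopback 'IP hostname...' entries from /etc/hosts content."""
    entries = []
    for line in hosts_text.splitlines():
        # Drop trailing comments and surrounding whitespace.
        line = line.split("#", 1)[0].strip()
        if not line:
            continue
        fields = line.split()
        if len(fields) < 2:
            continue  # an address without a host name is not usable
        ip = fields[0]
        if ip.startswith("127.") or ip == "::1":
            continue  # skip loopback entries
        entries.append(" ".join(fields))
    return entries


# Hypothetical /etc/hosts content saved from an MRS node.
sample = """\
127.0.0.1 localhost
# cluster nodes
192.168.0.22 node-master1.example.com node-master1
192.168.0.35   node-core1.example.com  # worker
"""
for entry in extract_host_entries(sample):
    print(entry)
```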

Step 5: Test Network Connectivity

  1. Choose Resources > Queue Management from the left navigation pane and locate the target queue. In the Operation column, click More > Test Address Connectivity.
  2. In the displayed dialog box, enter the obtained IP address and port number of the data source in the address box, and click Test. If the queue passes the test, it can access the data source.

    For MRS HBase, use ZooKeeper IP address:ZooKeeper port or ZooKeeper host information:ZooKeeper port for the test.
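
The console test runs from the queue itself. To reproduce the same kind of check from another machine that can reach the data source, for example an ECS in the same VPC, a plain TCP connection attempt is enough. The following is a minimal sketch using Python's standard socket module. The helper names split_address and can_connect and the sample address are illustrative assumptions, and a successful result here does not prove that the queue can reach the data source.

```python
import socket
from typing import Tuple


def split_address(address: str) -> Tuple[str, int]:
    """Split 'host:port' (the form entered in the address box) into parts."""
    host, _, port = address.rpartition(":")
    if not host or not port.isdigit():
        raise ValueError(f"expected host:port, got {address!r}")
    return host, int(port)


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    # Hypothetical address: a ZooKeeper IP address and port.
    host, port = split_address("192.168.0.22:2181")
    print(host, port)
```

Calling can_connect(host, port) then reports whether the port accepts connections. A False result commonly points to a missing inbound rule (Step 3) or a wrong address or port (Step 1).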