Updated on 2022-02-22 GMT+08:00

Creating a Data Connection

A data connection is storage space used to save data entities managed by Data Development, along with their connection information. With just one data connection, you can run multiple jobs and develop multiple scripts. If the connection information saved in the data connection changes, you only need to modify the corresponding information in Connection Management.

The following types of data connections can be created:

  • DLI
  • DWS
  • MRS Hive
  • MRS SparkSQL
  • RDS

Prerequisites

  • The corresponding cloud service has been enabled.

    For example, before creating an RDS data connection, you need to create a database instance in RDS.

  • The number of existing data connections is below the maximum quota (20).

Procedure

  1. Start creating a data connection from either of the following entry points: the Connection Management page or the area on the right.

    • Connection Management page
      1. In the navigation tree of the Data Development console, choose Connection > Connection Management.
      2. In the upper right corner of the page, click Create Data Connection.
    • Area on the right
      1. In the navigation tree of the Data Development console, choose Data Development > Develop Script or Data Development > Develop Job.
      2. Create a data connection in the area on the right using one of the following three methods:

        Method 1: Click Create Data Connection.

        Figure 1 Creating a data connection (method 1)

        Method 2: In the menu on the left, click [icon], right-click the root directory Data Connection, and choose Create Data Connection.

        Figure 2 Creating a data connection (method 2)

        Method 3: Open a script or job, click [icon], and choose Create Data Connection.

        Figure 3 Creating a data connection (method 3)

  2. In the displayed dialog box, select a data connection type and configure data connection parameters. Table 1 describes the data connection parameters.

    Table 1 Data connection parameters

    Data Connection Type  Parameter                  Description
    DLI                   For details, see Table 2.  Only one DLI data connection can be created.
    DWS                   For details, see Table 3.  -
    MRS Hive              For details, see Table 4.  -
    MRS SparkSQL          For details, see Table 5.  -
    RDS                   For details, see Table 6.  -

  3. Click Test to test connectivity to the data connection. If the connectivity is verified, the data connection has been successfully created.
  4. Click OK.
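
If the test in step 3 fails, a basic network-level check can help narrow down the cause. The following is a minimal Python sketch of such a check, not the check that the Test button performs. The function name can_reach and the sample address and port are hypothetical, and the script is assumed to run on a host that has network access to the target service.

```python
import socket


def can_reach(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        # create_connection resolves the host and opens a TCP socket.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution errors.
        return False


# Example call (hypothetical address and port):
#     can_reach("192.0.2.10", 8000)
```

A successful TCP connection only shows that the port is open. It does not verify the username, password, or SSL settings of the data connection.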

Parameter Description

Table 2 DLI data connection

Parameter             Mandatory  Description
Data Connection Name  Yes        Name of the data connection to be created. Must consist of 1 to 100 characters and contain only letters, digits, and underscores (_).
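
The naming rule for Data Connection Name can be checked before you open the console. The following is a minimal Python sketch of that rule. The function name is hypothetical, and the sketch assumes that "letters" means ASCII letters; the console may accept a wider character set.

```python
import re

# Documented rule: 1 to 100 characters; letters, digits, and underscores only.
# Assumption: "letters" is interpreted as ASCII letters (A-Z, a-z).
NAME_PATTERN = re.compile(r"[A-Za-z0-9_]{1,100}")


def is_valid_connection_name(name: str) -> bool:
    """Return True if name satisfies the documented naming rule."""
    # fullmatch requires the whole string to match, so no extra characters pass.
    return NAME_PATTERN.fullmatch(name) is not None
```

For example, is_valid_connection_name("dws_conn_01") returns True, while a name containing a hyphen, such as "dws-conn", is rejected.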

Table 3 DWS data connection

Parameter             Mandatory  Description
Data Connection Name  Yes        Name of the data connection to be created. Must consist of 1 to 100 characters and contain only letters, digits, and underscores (_).
Cluster Name          No         Name of the DWS cluster. If you do not select a DWS cluster, configure Access Address and Port instead.
Access Address        Yes/No     IP address for accessing the DWS cluster.
                                   • If you select a DWS cluster in Cluster Name, the system automatically sets this parameter to the access address of that cluster.
                                   • If no DWS cluster is selected, you need to enter the access address of the DWS cluster.
Port                  Yes/No     Port for accessing the DWS cluster.
                                   • If you select a DWS cluster in Cluster Name, the system automatically sets this parameter to the port of that cluster.
                                   • If no DWS cluster is selected, you need to enter the port of the DWS cluster.
Username              Yes        Administrator username for logging in to the DWS cluster.
Password              Yes        Administrator password for logging in to the DWS cluster.
SSL Connection        Yes/No     DWS supports connections in SSL authentication mode so that data transmitted between the DWS client and the database can be encrypted. The SSL connection mode delivers higher security than the common mode. For security purposes, you are advised to enable SSL connection.
KMS Key               Yes        Key created on Key Management Service (KMS) and used for encrypting and decrypting user passwords and key pairs. You can select a created key from KMS.
Agent                 Yes        Data Warehouse Service (DWS) is not a fully managed service and thus cannot be directly connected to Data Development. A CDM cluster can provide an agent for Data Development to communicate with non-fully-managed services. Therefore, you need to select a CDM cluster when creating a DWS data connection. If no CDM cluster is available, create one.
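
The Yes/No entries for Access Address and Port depend on whether a cluster is selected in Cluster Name. The following is a minimal Python sketch of that dependency. The class and function names are hypothetical and only mirror the parameter names in the table; they are not part of any console API.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class DwsConnectionForm:
    # Hypothetical container mirroring three parameters of the DWS table.
    cluster_name: Optional[str] = None
    access_address: Optional[str] = None
    port: Optional[int] = None


def missing_dws_fields(form: DwsConnectionForm) -> List[str]:
    """List the parameters that still need to be entered manually."""
    if form.cluster_name:
        # With a cluster selected, address and port are filled in automatically.
        return []
    missing = []
    if not form.access_address:
        missing.append("Access Address")
    if form.port is None:
        missing.append("Port")
    return missing
```

With no cluster selected and nothing entered, missing_dws_fields(DwsConnectionForm()) returns both Access Address and Port, which matches the case in which the two parameters are mandatory.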

Table 4 MRS Hive data connection

Parameter             Mandatory  Description
Data Connection Name  Yes        Name of the data connection to be created. Must consist of 1 to 100 characters and contain only letters, digits, and underscores (_).
Cluster Name          Yes        Name of the MRS cluster. Select the MRS cluster to which Hive belongs.
Connection Mode       Yes        Select the mode for DLF to connect to MRS.
                                 Proxy Connection: Use the communication proxy function of the CDM cluster to connect DLF to MRS. This mode is recommended. If you select this mode, configure the following parameters:
                                   • Username (optional): administrator username of MRS. The username does not need to be configured for some MRS clusters.
                                   • Password (optional): administrator password of MRS. The password does not need to be configured for some MRS clusters.
                                   • KMS Key (optional): used to encrypt and decrypt user passwords and key pairs. Select a key created in KMS.
                                   • Connection Proxy (mandatory): Select an available CDM cluster.
                                 Direct Connection: If you select this mode, the Hive data tables and fields cannot be viewed. When a Hive SQL script is developed online, the execution result can be viewed only in logs.

Table 5 MRS SparkSQL data connection

Parameter             Mandatory  Description
Data Connection Name  Yes        Name of the data connection to be created. Must consist of 1 to 100 characters and contain only letters, digits, and underscores (_).
Cluster Name          Yes        Name of the MRS cluster. Select the MRS cluster to which SparkSQL belongs.
Connection Mode       Yes        Select the mode for DLF to connect to MRS.
                                 Proxy Connection: Use the communication proxy function of the CDM cluster to connect DLF to MRS. This mode is recommended. If you select this mode, configure the following parameters:
                                   • Username (optional): administrator username of MRS. The username does not need to be configured for some MRS clusters.
                                   • Password (optional): administrator password of MRS. The password does not need to be configured for some MRS clusters.
                                   • KMS Key (optional): used to encrypt and decrypt user passwords and key pairs. Select a key created in KMS.
                                   • Connection Proxy (mandatory): Select an available CDM cluster.
                                 Direct Connection: If you select this mode, the Hive data tables and fields cannot be viewed. When a SparkSQL script is developed online, the execution result can be viewed only in logs.

Table 6 RDS data connection

Parameter             Mandatory  Description
Data Connection Name  Yes        Name of the data connection to be created. Must consist of 1 to 100 characters and contain only letters, digits, and underscores (_).
IP Address            Yes        IP address for logging in to the RDS instance.
Port                  Yes        Port for logging in to the RDS instance.
Driver Name           Yes        Name of the driver. Possible values:
                                   • com.mysql.jdbc.Driver
                                   • org.postgresql.Driver
Username              Yes        Username for logging in to the RDS instance. Default value: root
Password              Yes        Password for logging in to the RDS instance.
KMS Key               Yes        Key created on Key Management Service (KMS) and used for encrypting and decrypting user passwords and key pairs. You can select a created key from KMS.
Driver Path           Yes        Path to the JDBC driver. Download the JDBC driver from the official MySQL or PostgreSQL website as required and upload it to an Object Storage Service (OBS) bucket.
                                   • If Driver Name is set to com.mysql.jdbc.Driver, use the mysql-connector-java-5.1.21.jar driver.
                                   • If Driver Name is set to org.postgresql.Driver, use the postgresql-42.2.2.jar driver.
Agent                 Yes        Relational Database Service (RDS) is not a fully managed service and thus cannot be directly connected to Data Development. A CDM cluster can provide an agent for Data Development to communicate with non-fully-managed services. Therefore, you need to select a CDM cluster when creating an RDS data connection. If no CDM cluster is available, create one.
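
Driver Name and Driver Path must agree with each other. The following is a minimal Python sketch that checks whether a driver path ends in the JAR file listed for the selected driver name. The function name and the sample OBS path are hypothetical; only the driver names and JAR file names come from the RDS parameter table.

```python
import posixpath

# Driver name -> JAR file, as listed under Driver Path.
DRIVER_JARS = {
    "com.mysql.jdbc.Driver": "mysql-connector-java-5.1.21.jar",
    "org.postgresql.Driver": "postgresql-42.2.2.jar",
}


def driver_path_matches(driver_name: str, driver_path: str) -> bool:
    """Return True if driver_path points to the JAR listed for driver_name."""
    if driver_name not in DRIVER_JARS:
        raise ValueError(f"Unsupported driver name: {driver_name}")
    # Compare only the file name; the bucket and folder are up to the user.
    return posixpath.basename(driver_path) == DRIVER_JARS[driver_name]


# Example call (hypothetical bucket and folder):
#     driver_path_matches(
#         "org.postgresql.Driver",
#         "obs://example-bucket/drivers/postgresql-42.2.2.jar",
#     )
```

The check compares file names only, so it does not confirm that the file has actually been uploaded to the OBS bucket.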