Updated on 2022-06-09 GMT+08:00

Configuring Data Connections

MRS data connections manage the external data sources used by components in a cluster. For example, if Hive metadata is stored in an external relational database, a data connection associates that database with the Hive component. Component metadata can be stored in either of the following ways:

  • Local: Metadata is stored in the cluster's local GaussDB database. Deleting the cluster also deletes the metadata, so if you need to retain it, back it up manually before deleting the cluster.
  • Data Connection: Metadata is stored in an associated RDS for PostgreSQL or RDS for MySQL database in the same VPC and subnet as the cluster. Deleting the cluster does not delete the metadata, and multiple MRS clusters can share it.

When Hive metadata is switched between clusters, MRS synchronizes only the permissions stored in the Hive metadata database. Because the MRS permission model is maintained in MRS Manager, the permissions of users and user groups are not automatically synchronized to MRS Manager of the other cluster.

Performing Operations Before Data Connection

  1. Log in to the RDS console.
  2. Click the Instance Management tab and click the name of the RDS DB instance used by the MRS data connection.
  3. Click Log In in the upper right corner to log in to the instance as user root.

  4. On the home page of the instance, click Create Database to create a database.

  5. At the top of the page, choose Account Management > User Management.

    If the data connection uses an RDS for MySQL database and the database user is not root, perform steps 5 to 7 to create the user and grant it the required permissions. If you use root, go to step 8.

  6. Click Create User to create a non-root user.

  7. At the top of the page, choose SQL Operations > SQL Query, select the target database by name, and run the following SQL statements to grant permissions to the database user. In these statements, ${db_name} is the name of the database to be connected to MRS, and ${db_user} is the name of the new user.

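    -- Allow SELECT and INSERT on the mysql system schema.
    GRANT SELECT, INSERT ON mysql.* TO '${db_user}'@'%' WITH GRANT OPTION;
    -- Grant all privileges on the database that MRS connects to.
    GRANT ALL PRIVILEGES ON ${db_name}.* TO '${db_user}'@'%' WITH GRANT OPTION;
    -- Grant the global RELOAD privilege, which permits FLUSH operations.
    GRANT RELOAD ON *.* TO '${db_user}'@'%' WITH GRANT OPTION;
    -- Reload the grant tables.
    FLUSH PRIVILEGES;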

  8. Create a data connection by referring to Creating a Data Connection.
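    If you prefer SQL to the console, steps 4 and 6 can be approximated with standard MySQL 5.7 statements run as root in SQL Query, and SHOW GRANTS can confirm that the statements in step 7 took effect. This is a sketch, not the console procedure itself: ${db_name} and ${db_user} are the placeholders from step 7, and ${db_password} is a hypothetical placeholder introduced here for the new user's password. Run CREATE USER before the GRANT statements in step 7, because MySQL 5.7 does not create a user implicitly through GRANT by default.

```sql
-- Sketch only: ${db_name} and ${db_user} come from step 7;
-- ${db_password} is a hypothetical placeholder for a password you choose.

-- Roughly equivalent to step 4: create the database that MRS will connect to.
CREATE DATABASE IF NOT EXISTS ${db_name};

-- Roughly equivalent to step 6: create a non-root user that may connect from any host ('%').
CREATE USER '${db_user}'@'%' IDENTIFIED BY '${db_password}';

-- After running the GRANT statements in step 7, list the user's privileges.
-- The output should include SELECT, INSERT on mysql.*, ALL PRIVILEGES on ${db_name}.*,
-- and RELOAD on *.*, each WITH GRANT OPTION.
SHOW GRANTS FOR '${db_user}'@'%';
```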

Creating a Data Connection

  1. Log in to the MRS management console, and choose Data Connections in the left navigation pane.
  2. Click Create Data Connection.
  3. Set parameters according to Table 1.

    Table 1 Data connection parameters

    Parameter

    Description

    Type

    Type of an external source connection.

    • RDS for PostgreSQL database: clusters that support Hive can connect to this type of database (for version restrictions, see Table 2).
    • RDS for MySQL database: clusters that support Hive or Ranger can connect to this type of database.

    Name

    Name of a data connection.

    RDS Instance

    RDS DB instance. The instance and its database must already be created in RDS before they can be referenced here. For details, see Performing Operations Before Data Connection. Click View RDS Instance to view the created instances.

    NOTE:
    • To ensure network connectivity between the cluster and the RDS database, it is recommended that you create the instance in the same VPC and subnet as the cluster.
    • The inbound rules of the RDS instance's security group must allow access to port 3306. To configure this, click the instance name on the RDS console to go to the instance management page, and click the security group name in the Connection Information area. On the page that is displayed, click the Inbound Rules tab and click Add Rule. In the dialog box that is displayed, select TCP in the Protocol & Port area and enter port 3306. In the Source area, enter the IP addresses of all nodes where the Hive MetaStore instances reside.
    • Currently, MRS supports RDS for PostgreSQL 9.5 and 9.6.
    • Currently, MRS supports only RDS for MySQL 5.7.x.

    Database

    Name of the database to be connected to.

    Username

    Username for logging in to the database.

    Password

    Password for logging in to the database.

    If the data connection uses an RDS for MySQL database and the database user is not root, first create the user and grant it the required permissions as described in Performing Operations Before Data Connection.

  4. Click OK.
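    Before clicking OK, you may want to confirm that the RDS instance meets the version requirements listed in the note in Table 1. The sketch below assumes you run it in the RDS SQL Query console, or any SQL client connected to the instance. Both statements are standard server queries, not MRS-specific commands. Run only the line that matches your database engine; the PostgreSQL line is commented out because it is not valid MySQL.

```sql
-- RDS for MySQL: MRS supports only 5.7.x, so the result should start with "5.7".
SELECT VERSION();

-- RDS for PostgreSQL: MRS supports 9.5 and 9.6.
-- SHOW server_version;
```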

Editing a Data Connection

  1. Log in to the MRS management console, and choose Data Connections in the left navigation pane.
  2. In the data connection list, click Edit in the Operation column of the data connection to be edited.
  3. Modify parameters according to Table 1.

    If the selected data connection has been associated with a cluster, the configuration changes will be synchronized to the cluster.

Deleting a Data Connection

  1. Log in to the MRS management console, and choose Data Connections in the left navigation pane.
  2. In the data connection list, click Delete in the Operation column of the data connection to be deleted.

    If the selected data connection has been associated with a cluster, the deletion does not affect the cluster.

Configuring a Data Connection During Cluster Creation

  1. Log in to the MRS management console.
  2. Click Create Cluster. The Create Cluster page is displayed.
  3. Click the Custom Config tab.
  4. In the Software Configuration area, set Metadata according to Table 2. Configure the other parameters and create the cluster as described in Creating a Custom Cluster.

    Table 2 Data connection parameters

    Parameter

    Description

    Metadata

    Whether to use external data sources to store metadata.

    • Local: Metadata is stored in the local cluster.
    • Data Connection: Metadata is stored in an external data source. If the cluster becomes abnormal or is deleted, the metadata is not affected. This mode applies to scenarios where storage and compute are decoupled.

    This function is available for clusters that support the Hive or Ranger component.

    Component

    This parameter is valid only when Metadata is set to Data Connection. It indicates the component whose metadata is stored in the external data source.

    • Hive
    • Ranger

    Data Connection Type

    This parameter is valid only when Metadata is set to Data Connection. It indicates the type of the external data source.

    • Hive supports the following data connection types:
      • RDS PostgreSQL database (supported for clusters of MRS 1.9.x)
      • RDS MySQL database
      • Local database
    • Ranger supports the following data connection types:
      • RDS MySQL database
      • Local database

    Data Connection Instance

    This parameter is valid only when Data Connection Type is set to RDS PostgreSQL database or RDS MySQL database. This parameter indicates the name of the connection between the MRS cluster and the RDS database. This instance must be created before being referenced here. You can click Create Data Connection to create a data connection. For details, see Performing Operations Before Data Connection and Creating a Data Connection.

Managing Data Connections in an Existing Cluster

This function is not supported in MRS 3.0.5.

  1. Log in to the MRS management console. In the left navigation pane, choose Clusters > Active Clusters.
  2. Click the name of the cluster to enter its details page.
  3. On the Dashboard tab page of the cluster details page, click Manage next to Data Connection.
  4. The Data Connection dialog box displays the data connections associated with the cluster. Click Edit or Delete to edit or delete a data connection.
  5. If no data connection is associated with the cluster, click Configure Data Connection in the Data Connection dialog box to add one.

    Only one data connection can be configured for a module type. For example, after a data connection is configured for Hive metadata, no other data connection can be configured for it. If no module type is available, the Configure Data Connection button is unavailable.

    Table 3 Parameters for configuring a data connection

    Parameter

    Description

    Component Name

    • Hive
    • Ranger

    Module Type

    If Component Name is set to Hive, Hive metadata is supported.

    If Component Name is set to Ranger, Ranger metadata is supported.

    Data Connection Type

    • Hive supports the following data connection types:
      • RDS PostgreSQL database
      • RDS MySQL database
      • Local database
    • Ranger supports the following data connection types:
      • RDS MySQL database
      • Local database

    Instance

    This parameter is valid only when Data Connection Type is set to RDS PostgreSQL database or RDS MySQL database. Select the name of the connection between the MRS cluster and the RDS database. This instance must be created before being referenced here. You can click Create Data Connection to create a data connection. For details, see Creating a Data Connection.

  6. Click Test to check the connectivity of the data connection.
  7. After the connectivity test succeeds, click OK.

    After configuring Hive or Ranger metadata, restart the Hive or Ranger component. On restart, the component creates the required tables in the specified database. Tables that already exist are not re-created.
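    To check that the restart created the metadata tables, you can list the tables in the connected database. This is a minimal sketch, assuming an RDS for MySQL database and the ${db_name} placeholder from Performing Operations Before Data Connection. The exact table names depend on the component and its version. The last query assumes that MRS Hive uses the standard Apache Hive metastore schema, which records its schema version in a table named VERSION. Treat that query as an assumption rather than a documented MRS check.

```sql
-- Switch to the database configured in the data connection.
USE ${db_name};

-- After Hive or Ranger has been restarted, this list should no longer be empty.
SHOW TABLES;

-- Hive only, assuming the standard Apache Hive metastore schema:
-- shows the metastore schema version recorded when the tables were created.
SELECT SCHEMA_VERSION FROM `VERSION`;
```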