Updated on 2024-10-25 GMT+08:00

Storing Hive Metadata to RDS

This section describes how to switch the Hive metadata of an active cluster to the metadata stored in a local database or RDS database. This operation enables MRS clusters to share the same metadata, retains the metadata when the cluster is deleted, and avoids Hive metadata migration during cluster migration.

Creating and Configuring an RDS DB Instance

  1. Log in to the RDS console and buy an RDS DB instance. For details, see Buying a DB Instance..

    • To ensure network communications between the cluster and the MySQL or PostgreSQL database, create the instance in the same VPC and subnet as the cluster.
    • Security group rules of the RDS DB instance must allow inbound access from MySQL (default port 3306) and PostgreSQL (default port 5432).

      For example, click the instance name on the RDS console to go to the instance management page. In the Connection Information area, click the name next to Security Group. On the page that is displayed, click the Inbound Rules tab, and click Add Rule. In the displayed Add Inbound Rule dialog box, in the Protocol & Port area, select TCP and enter port number 3306. In the Source area, select IP address and enter the IP addresses of all nodes where the MetaStore instances of Hive are located.

    • Ranger can interconnect with RDS for MySQL databases of the MySQL 5.7.x and 8.0 versions only.
    • Hive can interconnect with RDS for MySQL and PostgreSQL databases. The supported versions are MySQL 5.7.x and 8.0 and PostgreSQL14.

  2. In the navigation pane of the RDS management console, choose Instances. Locate the row containing the RDS DB instance used by MRS data connections, click More in the Operation column, and select Log In to log in to the DB instance as user root.

    Figure 1 Logging in to an RDS DB instance

  3. On the home page of the instance, click Create Database to create a database.

    If no new database is created, the MRS data connections will fail to configure.

    Figure 2 Creating a database

  4. On the top of the page, choose Account Management > User Management.

    • For clusters earlier than MRS 3.x, if the selected data connection is RDS MySQL database, ensure that the database user is root. If the user is not root, create a user and grant permissions to the user by referring to 4 to 6.
    • For MRS 3.x or later clusters, when Type is set to RDS MySQL database, Username must not be root. In this case, create a user and grant permissions to the user by referring to 4 to 6.

  5. Click Create User to create a non-root user and select all permissions listed in Global Permissions.

    If you are configuring an external RDS data connection for Ranger, you can select only the SELECT, INSERT, CREATE, RELOAD, CREATE USER, and GRANT permissions.

    Figure 3 Creating a user

  6. On the top of the page, choose SQL Operations > SQL Query, switch to the target database by database name, and run the following SQL statements to grant permissions to the database user. In the following statements, ${db_name} and ${db_user} indicate the name of the database to be connected to MRS and the name of the new user, respectively.

    grant all privileges on ${db_name}.* to '${db_user}'@'%' with grant option;
    grant reload on *.* to '${db_user}'@'%' with grant option;
    flush privileges;
    Figure 4 Assigning permissions to database users

Creating an RDS Data Connection for an Existing MRS Cluster

Perform the following steps to create an RDS data connection for an existing MRS cluster.

  1. Log in to the MRS management console, and choose Data Connections in the left navigation pane.
  2. Click Create Data Connection.
  3. Configure parameters according to Table 1.

    Table 1 Parameters for creating a data connection

    Parameter

    Description

    Type

    The type of an external source connection. Value options are as follows:

    • RDS PostgreSQL database. Clusters with Hive installed can connect to this type of database.
    • RDS MySQL database. Clusters with Hive or Ranger installed can connect to this type of database.

    Name

    The name of a data connection.

    Database Instance

    The RDS database instance. This instance must be created in RDS before being referenced here, and the database must have been created. For details, see Creating and Configuring an RDS DB Instance. Click View DB Instance to view the created DB instances.

    Database

    The name of the database to be connected to.

    Username

    The username for logging in to the database to be connected.

    Password

    The password for logging in to the database to be connected.

    If the selected data connection is an RDS MySQL database, ensure that the database user is a root user. If the user is not root, perform operations by referring to Creating and Configuring an RDS DB Instance.

  4. Click OK.

Configuring a Hive Data Connection

This function is not supported in MRS 3.0.5.

  1. Log in to the MRS console. In the navigation pane on the left, choose Active Clusters.
  2. Click the name of a cluster to go to the cluster details page.
  3. On the Dashboard tab page, click Manage next to Data Connection.
  4. On the Data Connections page, the data connections of the cluster are displayed. You can click Disassociate to delete a data connection.
  5. If there is no associated data connection on the Data Connection dialog box, click Configure Data Connection to add a connection.

    Only one data connection can be configured for a module type. For example, after a data connection is configured for Hive metadata, no other data connection can be configured for it. If no module type is available, the Configure Data Connection button is unavailable.

    Table 2 Configuring a Hive data connection

    Parameter

    Description

    Component

    Hive

    Module Type

    Hive metadata

    Data Connection Type

    • RDS PostgreSQL database (supported for clusters of MRS 1.9.x)
    • RDS MySQL database
    • Local database

    Instance

    This parameter is valid only when Data Connection Type is set to RDS PostgreSQL database or RDS MySQL database. Select the name of the connection between the MRS cluster and the RDS database. This instance must be created before being referenced here. You can click Create Data Connection to create a data connection. For details, see Creating an RDS Data Connection for an Existing MRS Cluster.

    Figure 5 Configuring a data connection

  6. Click Test to test connectivity of the data connection.
  7. After the data connection is successful, click OK.
  8. Upload the open-source driver package of PostgreSQL or MySQL to all nodes where MetaStore instances are deployed to replace the existing driver package of the cluster.

    • PostgreSQL: Upload the driver package postgresql-42.2.5.jar to the ${BIGDATA_HOME}/third_lib/Hive directory on all MetaStore instance nodes. Download the open-source driver package from https://repo1.maven.org/maven2/org/postgresql/postgresql/42.2.5/..
    • MySQL: Log in to the MySQL official website (https://www.mysql.com/), choose Downloads > Community > MySQL Connectors > Connector/J to download the driver package of the required version, and upload the driver package to the /opt/Bigdata/FusionInsight_HD_*/install/FusionInsight-Hive-*/hive-*/lib/ directory on all Metastore instance nodes.

  9. For MRS 3.3.0 and later versions, to store Hive metadata into the RDS PostgreSQL database, log in to all MetaStore instance nodes and run the following command to replace the SQL file content:

    sed -i 's#PRIMARY KEY ("MAPPING_ID"),#PRIMARY KEY (MAPPING_ID),#g' $BIGDATA_HOME/FusionInsight_Current/*_MetaStore/install/hive-3.1.0/scripts/metastore/upgrade/postgres/hive-schema-3.1.0.postgres.sql

    sed -i 's#UNIQUE ("CAT_NAME", "DB_NAME", "TBL_NAME")#UNIQUE (CAT_NAME, DB_NAME, TBL_NAME)#g' $BIGDATA_HOME/FusionInsight_Current/*_MetaStore/install/hive-3.1.0/scripts/metastore/upgrade/postgres/hive-schema-3.1.0.postgres.sql

  10. On the cluster details page, click the Components tab and click Hive. On the service details page, choose More > Restart Service and click OK to restart the Hive service.

    • If IAM users are not synchronized, click the Dashboard tab on the cluster details page, click Synchronize on the right of IAM User Sync, and then restart the Hive service.
    • Hive will create necessary database tables in the specified database after restart. (If tables already exist, they will not be created.)