Updated on 2025-08-09 GMT+08:00

MRS Cluster Metadata Storage in an External Data Source

Metadata is data that describes other data, such as its structure, storage location, and access permissions. By default, an MRS cluster stores component metadata in the cluster's local GaussDB database, so deleting the cluster also deletes its metadata. To retain the metadata, you must save it manually before deleting the cluster.

MRS provides data connection management, which allows the metadata of components such as Hive and Ranger to be stored in external data sources. This decouples the data storage layer (such as HDFS) from compute engines (such as Spark and Flink).

For example, Hive metadata can be stored in an external relational database and will not be deleted when the current MRS cluster is deleted. In addition, multiple MRS clusters can share the same metadata.
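As a rough illustration of what storing Hive metadata in an external relational database involves, the sketch below builds the JDBC URL that a Hive metastore would typically use to reach such a database. The class name, field names, host, and database name are hypothetical and are not part of MRS; on MRS the connection itself is created through data connection management, so this only shows the general shape of the address. The URL schemes and default ports are the standard ones for PostgreSQL and MySQL.

```python
from dataclasses import dataclass
from typing import Optional

# Standard JDBC URL schemes and default ports for PostgreSQL and MySQL.
_JDBC = {
    "postgresql": ("jdbc:postgresql", 5432),
    "mysql": ("jdbc:mysql", 3306),
}


@dataclass(frozen=True)
class ExternalMetastore:
    """Hypothetical description of an external metadata database."""
    engine: str               # "postgresql" or "mysql"
    host: str                 # address of the database instance (assumed)
    database: str             # database that holds the metadata (assumed)
    port: Optional[int] = None

    def jdbc_url(self) -> str:
        # Fall back to the engine's default port when none is given.
        scheme, default_port = _JDBC[self.engine]
        port = self.port if self.port is not None else default_port
        return f"{scheme}://{self.host}:{port}/{self.database}"


# Two clusters that share metadata would point at the same URL.
shared = ExternalMetastore(engine="mysql", host="192.0.2.10", database="hivemeta")
print(shared.jdbc_url())
```

Because the address lives outside any single cluster, deleting one cluster does not affect the database it points to.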

Figure 1 MRS cluster metadata storage in an external data source

External Data Connections Supported by MRS

Table 1 MRS external data connections

Data Connection Type: RDS PostgreSQL database

Description: RDS for PostgreSQL is designed for enterprise online transactional processing (OLTP) scenarios requiring complex SQL processing. It supports NoSQL data types (such as JSON, XML, and HStore) and geographic information system (GIS) data types, and is renowned for its reliability and data integrity. It is suitable for internet websites, location-based applications, and complex data object processing. For more information, see What Is RDS for PostgreSQL?

Applicable Version:
  • MRS cluster version: clusters with the Hive component installed
  • PostgreSQL version: PostgreSQL 14

Supported Engine:
  • Hive

Data Connection Type: RDS MySQL database

Description: RDS for MySQL is fully compatible with native MySQL, combining stability, reliability, and high performance. It features intelligent operations and maintenance, robust security, out-of-the-box usability, and automatic scaling. For more information, see What Is RDS for MySQL?

Applicable Version:
  • MRS cluster version: clusters with the Hive or Ranger component installed
  • MySQL versions: MySQL 5.7.x and MySQL 8.0

Supported Engine:
  • Hive
  • Ranger

Data Connection Type: GaussDB(for MySQL)

Description: GaussDB is a distributed relational database developed by Huawei. It supports distributed transactions and intra-city deployment across AZs for zero data loss, storage for petabytes of data, and scale-up to more than 1,000 nodes. For more information, see What Is GaussDB?

Applicable Version:
  • MRS cluster versions: MRS 3.1.2-LTS.3, MRS 3.1.5, and MRS 3.3.0-LTS

Supported Engine:
  • Hive
  • Ranger

Data Connection Type: LakeFormation

Description: LakeFormation is a one-stop enterprise-class data lake and warehouse construction service. It provides APIs and a GUI for unified management of data lake metadata, and is compatible with Hive metadata and Ranger permission models. LakeFormation can connect to multiple compute engines and big data cloud services seamlessly to ensure quick building and easy operations of data lakes and unleash rich value of service data. LakeFormation is a serverless service that uses underlying resources to implement cross-AZ deployment, high reliability, auto scaling, unified metadata management, association between metadata and file directories, and interconnection with multiple compute engines. For more information, see What Is LakeFormation?

Applicable Version:
  • MRS cluster version: MRS 3.3.0-LTS or later

Supported Engine:
  • Hive
  • Ranger
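The engine support in Table 1 can be transcribed as a small lookup, which may be handy when deciding which data connection type fits a component. This is only a sketch: the dictionary and function names are made up for illustration, and the data is copied from Table 1 rather than queried from MRS.

```python
from typing import List

# Supported engines per data connection type, transcribed from Table 1.
SUPPORTED_ENGINES = {
    "RDS PostgreSQL database": {"Hive"},
    "RDS MySQL database": {"Hive", "Ranger"},
    "GaussDB(for MySQL)": {"Hive", "Ranger"},
    "LakeFormation": {"Hive", "Ranger"},
}


def connection_types_for(engine: str) -> List[str]:
    """Return the connection types that support the engine, in table order."""
    return [
        name
        for name, engines in SUPPORTED_ENGINES.items()
        if engine in engines
    ]


print(connection_types_for("Ranger"))
```

Note that the lookup covers only the Supported Engine column. The version requirements in the Applicable Version column still have to be checked separately.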

Notes and Constraints

  • When Hive metadata is switched between clusters, MRS synchronizes only the permissions stored in the metadata database of the Hive component. The MRS permission model is maintained on MRS Manager, so the permissions of users and user groups are not automatically synchronized to MRS Manager of the other cluster.
  • The service that provides the external data connection must be in the same VPC and subnet as the MRS cluster it will be interconnected with.
  • Do not delete an RDS database instance that is interconnected with an MRS cluster. Otherwise, the cluster becomes abnormal.
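The VPC and subnet constraint above amounts to a simple equality check, sketched below. The class, field names, and IDs are hypothetical placeholders and do not correspond to an MRS API; in practice, the values would come from the network settings of the cluster and of the data source service.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NetworkPlacement:
    """Hypothetical record of where a resource sits in the network."""
    vpc_id: str
    subnet_id: str


def can_interconnect(cluster: NetworkPlacement, data_source: NetworkPlacement) -> bool:
    # Both the VPC and the subnet must match, as stated in the constraint.
    return (
        cluster.vpc_id == data_source.vpc_id
        and cluster.subnet_id == data_source.subnet_id
    )


# Placeholder IDs for illustration only.
cluster = NetworkPlacement(vpc_id="vpc-a", subnet_id="subnet-1")
rds_same = NetworkPlacement(vpc_id="vpc-a", subnet_id="subnet-1")
rds_other = NetworkPlacement(vpc_id="vpc-a", subnet_id="subnet-2")

print(can_interconnect(cluster, rds_same))
print(can_interconnect(cluster, rds_other))
```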