Updated on 2024-10-21 GMT+08:00

MRS Data Source Usage Overview

MRS Cluster Overview

MapReduce Service (MRS) provides big data clusters built on the open-source Hadoop ecosystem. It stores and analyzes massive volumes of data to meet your data storage and processing requirements. For details about MRS, see the MapReduce Service User Guide.

You can use Hive or Spark (in an MRS analysis cluster) to store massive volumes of service data. Hive and Spark data files are stored in HDFS. When a GaussDB(DWS) data warehouse cluster and an MRS cluster are on the same network, you can connect the data warehouse cluster to the MRS cluster, read data from the HDFS files, and write the data to GaussDB(DWS).

Currently, the hybrid data warehouse (standalone mode) cannot import data from MRS.

Operation Process

Perform the following operations to import data from MRS to a data warehouse cluster:

  1. Complete the following prerequisites:
    1. Create an MRS cluster. For details, see Buying a Custom Cluster.
    2. Create an HDFS foreign table. The foreign table queries data in the MRS cluster through the APIs of a foreign server.

      For details, see Importing Data from MRS to a Data Warehouse Cluster in Data Warehouse Service (DWS) Data Migration and Synchronization.

      • Multiple MRS data sources can exist on the same network, but one GaussDB(DWS) cluster can connect to only one MRS cluster at a time.
  2. In the data warehouse cluster, create an MRS data source connection. For details, see Creating an MRS Data Source Connection.
  3. Import data from an MRS data source to the cluster. For details, see Importing Data from MRS to a Cluster.
  4. (Optional) When the HDFS configuration of the MRS cluster changes, update the MRS data source configuration on GaussDB(DWS). For details, see Updating the MRS Data Source Configuration.
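
As an illustration of the foreign table and import steps above, the following Python sketch assembles the two SQL statements that are typically involved: one that defines an HDFS foreign table on a foreign server, and one that copies its rows into a local table. It is a minimal sketch, not the documented procedure. Every identifier in it (the table names, the column list, the server name mrs_server_example, and the HDFS folder path) is hypothetical. The OPTIONS keys (format, encoding, foldername) and the DISTRIBUTE BY ROUNDROBIN clause are assumptions based on the usual pattern for HDFS foreign tables; check the exact syntax in Importing Data from MRS to a Data Warehouse Cluster before running anything.

```python
def build_foreign_table_sql(table, columns, server, folder, file_format="orc"):
    """Return a CREATE FOREIGN TABLE statement for files in an HDFS folder.

    The OPTIONS keys and the DISTRIBUTE BY clause are assumed, not taken
    from this page. Values are interpolated without escaping, so this is
    for illustration only.
    """
    column_defs = ",\n".join(f"    {name} {sql_type}" for name, sql_type in columns)
    return (
        f"CREATE FOREIGN TABLE {table}\n(\n{column_defs}\n)\n"
        f"SERVER {server}\n"
        f"OPTIONS (format '{file_format}', encoding 'utf8', foldername '{folder}')\n"
        "DISTRIBUTE BY ROUNDROBIN;"
    )


def build_import_sql(target_table, foreign_table):
    """Return an INSERT ... SELECT that copies foreign table rows locally."""
    return f"INSERT INTO {target_table} SELECT * FROM {foreign_table};"


# Hypothetical example values; replace them with your own objects.
ddl = build_foreign_table_sql(
    table="foreign_product_info",
    columns=[("product_id", "char(30)"), ("product_price", "integer")],
    server="mrs_server_example",  # created with the MRS data source connection
    folder="/user/hive/warehouse/demo.db/product_info_orc/",
)
import_sql = build_import_sql("product_info", "foreign_product_info")

print(ddl)
print(import_sql)
```

In this sketch the foreign server is expected to exist already, because the foreign table definition refers to it by name. The import statement also assumes that a local table with matching columns (here the hypothetical product_info) has been created in the data warehouse cluster.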