What Is MRS?

Big data presents both exciting opportunities and a huge challenge. As data volumes and data types grow rapidly, conventional data processing technologies, such as standalone storage systems and relational databases, are struggling to keep up. Rising to this challenge, the Apache Software Foundation (ASF) launched an open source project called Hadoop. Hadoop is an open source distributed computing platform that can fully utilize the computing and storage capabilities of large compute clusters to process massive amounts of data. Hadoop is a powerful framework, but it is not easy to deploy and operate. Enterprises that deploy Hadoop systems on their own may encounter problems such as high costs, long rollout times, difficult maintenance, and inflexible use.

MapReduce Service (MRS) is a one-stop service that helps you quickly deploy and manage Hadoop systems on Huawei Cloud. With MRS, you can create an enterprise-class Hadoop cluster with just a few clicks. Tenants have total control over their Hadoop clusters and can run big data components such as Storm, Hadoop, Spark, HBase, and Kafka. MRS supports a full range of open source APIs. By leveraging Huawei Cloud's deep expertise in compute, storage, and big data, it offers customers a full-stack big data platform featuring high performance, high cost-effectiveness, flexibility, and ease of use. The platform can also be customized to meet new requirements, helping enterprises quickly build a massive data processing system and discover new value and business opportunities by analyzing and mining massive amounts of data in real time or in non-real time.

Product Architecture

List of MRS Component Versions lists the MRS component versions.

Figure 1 shows the MRS logical architecture.

Figure 1 MRS architecture

MRS includes the infrastructure and an end-to-end big data processing pipeline.

Infrastructure
MRS big data clusters fully utilize the high scalability, reliability, and security features of the virtualization layer powered by the cloud platform.
- Virtual Private Cloud (VPC) provides virtual private networks for each tenant on the cloud.
- Elastic Volume Service (EVS) provides reliable and high-performance storage.
- Elastic Cloud Server (ECS) provides VMs that are easily scalable. It works with VPCs, security groups, and the EVS multi-replica mechanism to build an efficient, reliable, and secure computing environment.
Data collection
The data collection layer efficiently ingests data from various data sources. It consists of Flume (data ingestion), Loader (relational data loading), and Kafka (highly reliable message queue). Alternatively, you can use the Cloud Data Migration (CDM) service to ingest external data into MRS clusters.
Data storage
MRS clusters can store both structured and unstructured data, and they support multiple efficient data formats to meet the requirements of different computing engines. The following storage systems are available:
- HDFS, a general-purpose distributed file system for big data platforms.
- Huawei Cloud OBS, an object storage service that features high availability and low cost.
Converged data processing
- MRS supports multiple mainstream compute engines, including MapReduce (batch processing), Tez (DAG model), Spark (in-memory computing), Spark Streaming (micro-batch stream computing), Storm (stream computing), and Flink (stream computing). They convert data structures and logic into data models that meet the needs of a variety of big data applications.
- Based on preset data models and easy-to-use SQL data analysis, users can choose Hive (data warehouse), SparkSQL, or Presto (interactive query engine) to run different types of analytical tasks.
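As an illustration of the batch processing model that the MapReduce engine is named after, the following is a minimal single-process sketch in plain Python. It is not MRS or Hadoop code: the function names (map_phase, shuffle_phase, reduce_phase, word_count) and the sample input are assumptions made for this example. On a real cluster, the same three phases run in parallel across many nodes.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(lines: Iterable[str]) -> Iterator[Tuple[str, int]]:
    """Emit a (word, 1) pair for every word in every input line."""
    for line in lines:
        for word in line.lower().split():
            yield word, 1


def shuffle_phase(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group values by key, as a framework does between map and reduce."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped: Dict[str, List[int]]) -> Dict[str, int]:
    """Sum the values collected for each key."""
    return {key: sum(values) for key, values in grouped.items()}


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    """Run the three phases in sequence on an in-memory input."""
    return reduce_phase(shuffle_phase(map_phase(lines)))


if __name__ == "__main__":
    # Hypothetical sample input, used only for illustration.
    sample = ["big data on MRS", "big clusters process big data"]
    print(word_count(sample))
```

The split into map, shuffle, and reduce is what lets a distributed engine spread work over a cluster: map and reduce calls are independent per record and per key, so only the shuffle step needs to move data between nodes.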
Data display and scheduling
Data analysis results are displayed intuitively. MRS also integrates with DataArts Studio to provide a one-stop, collaborative big data development platform. The platform helps you run a range of tasks, such as data modeling, data integration, script development, job scheduling, and O&M monitoring, making big data more accessible.
Cluster management
All components of the Hadoop-based big data ecosystem are deployed in distributed mode, and their deployment, management, and O&M are complex.

MRS provides a unified O&M and management platform for cluster management. The platform supports one-click cluster deployment, multi-version selection, and both manual scaling and auto scaling of clusters with zero service interruption. In addition, MRS provides job management, resource tag management, and O&M covering all of the Hadoop components. One-stop O&M capabilities include monitoring, alarm reporting, parameter configuration, and patch upgrade.

Product Advantages

MRS is backed by a strong Hadoop kernel team and is built on top of Huawei's enterprise-class FusionInsight big data platform. MRS can guarantee multi-level Service Level Agreements (SLAs).

MRS has the following advantages:

High performance
MRS supports Huawei's own CarbonData storage solution. CarbonData allows a single copy of data to be used for multiple tasks. It supports features such as multi-level indexing, dictionary encoding, pre-aggregation, dynamic partitioning, and quasi-real-time data query. These features improve I/O scanning and computing performance, allowing tens of billions of data records to be analyzed in seconds. In addition, MRS supports the Superior Scheduler, also developed by Huawei, which outperforms open-source schedulers and enables efficient scheduling in super large clusters (up to 10,000 nodes).
Cost-effectiveness
MRS supports a heterogeneous compute and storage infrastructure with decoupled storage and compute, offering a cost-effective mass storage solution. MRS supports fast auto scaling to accommodate changing demand, maximizing resource utilization for customers. MRS clusters can be quickly created and scaled out as needed, and can be scaled in or deleted when you no longer need them.
High security
MRS provides enterprise-class multi-tenant permissions management and security management, with support for table-based and column-based access control and data encryption.
Easy O&M
MRS provides an efficient big data cluster management platform that supports one-click rolling patch updates, which ensure the continuity of your services.
High reliability
Tested and proven in numerous projects, MRS delivers long-term reliability and stability in large-scale deployments and meets enterprise-class standards for production systems. In addition, MRS supports automatic data backup across AZs and regions, as well as automatic anti-affinity, which distributes mission-critical VMs across different physical machines.
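The idea behind anti-affinity can be sketched in a few lines of Python. This is a conceptual illustration only and does not represent how MRS or the underlying cloud platform schedules VMs. The function place_vms, the group names, and the host names are all hypothetical. The one rule it enforces is that no two VMs from the same group land on the same physical host.

```python
from typing import Dict, List


def place_vms(groups: Dict[str, List[str]], hosts: List[str]) -> Dict[str, str]:
    """Assign each VM to a host so that VMs of one group never share a host."""
    placement: Dict[str, str] = {}
    load = {host: 0 for host in hosts}
    for group, vms in groups.items():
        if len(vms) > len(hosts):
            raise ValueError(f"group {group!r} has more VMs than available hosts")
        # Prefer the least-loaded hosts; each host is used at most once per group.
        candidates = sorted(hosts, key=lambda host: load[host])
        for vm, host in zip(vms, candidates):
            placement[vm] = host
            load[host] += 1
    return placement


if __name__ == "__main__":
    # Hypothetical VM groups and hosts, used only for illustration.
    groups = {"master": ["m1", "m2"], "core": ["c1", "c2", "c3"]}
    hosts = ["h1", "h2", "h3"]
    for vm, host in place_vms(groups, hosts).items():
        print(vm, "->", host)
```

With this constraint, the failure of a single physical machine affects at most one VM in each group, which is why anti-affinity matters for mission-critical nodes.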

Using MRS for the First Time

If you are a first-time user, you may get started with the following:

Basic concepts
See List of MRS Component Versions and Functions for the basics about MRS, including all its components and their enhancements over their open-source counterparts, as well as the unique features of MRS.
Getting started
To learn how to use MRS, see Creating and Using a Hadoop Cluster for Offline Analysis. "Getting Started" provides detailed operation guides with real-world examples. You can create and use MRS clusters by following these guides.
Other functions and operation guides
If you are an MRS cluster user or O&M engineer, you can perform operations such as cluster life cycle management, scaling, and job management by referring to Buying MRS Clusters. To learn how to use each component, see MapReduce Service Component Operation Guide.

If you are a developer, you can refer to the operation guides and examples in Introduction to MRS Application Development to develop, run, and commission your own applications. For details about how to call the APIs of MRS, see Before You Start.