What Is MRS?
Big data is a major challenge of the Internet era as data volumes and data types grow rapidly. Conventional data processing technologies, such as single-node storage and relational databases, cannot handle these emerging big data problems. To address them, the Apache Software Foundation (ASF) released Hadoop, an open source distributed computing platform that uses the computing and storage capabilities of clusters to process massive amounts of data. However, enterprises that deploy Hadoop systems themselves face high costs, long deployment periods, difficult maintenance, and inflexible use.
To solve these problems, HUAWEI CLOUD provides MapReduce Service (MRS) for managing the Hadoop system. With MRS, you can deploy a Hadoop cluster in just one click. MRS provides enterprise-level big data clusters on the cloud. Tenants can fully control clusters and easily run big data components such as Hadoop, Spark, HBase, Kafka, and Storm. MRS is fully compatible with open source APIs, and combines HUAWEI CLOUD computing and storage with big data industry experience to provide customers with a full-stack big data platform featuring high performance, low cost, flexibility, and ease of use. In addition, the platform can be customized based on service requirements to help enterprises quickly build a massive data processing system and discover new value and business opportunities by analyzing and mining massive amounts of data in real time or offline.
Figure 1 shows the logical architecture of HUAWEI CLOUD MRS.
MRS architecture includes infrastructure and big data processing phases.
- Infrastructure
MRS big data clusters are built based on HUAWEI CLOUD Elastic Cloud Server (ECS), and make full use of the high reliability and security capabilities of the virtualization layer.
- A Virtual Private Cloud (VPC) is a virtual internal network provided for each tenant. It is isolated from other networks by default.
- Elastic Volume Service (EVS) provides highly reliable and high-performance storage.
- ECS provides scalable VMs, and works with VPCs, security groups, and the EVS multi-replica mechanism to build an efficient, reliable, and secure computing environment.
- Data integration
The data integration layer provides data access capabilities of MRS clusters, including components Flume (data ingestion), Loader (relational data import), and Kafka (highly reliable message queue). Data can be imported to MRS clusters from various data sources.
- Data storage
MRS clusters can store structured and unstructured data, and support multiple efficient formats to meet the requirements of different computing engines.
- HDFS is a general-purpose distributed file system on a big data platform.
- OBS is an object storage service that features high availability and low cost.
- HBase supports data storage with indexes, and is applicable to high-performance index-based query scenarios.
- Data computing
MRS provides multiple mainstream computing engines, including MapReduce (batch processing), Tez (DAG model), Spark (in-memory computing), SparkStreaming (micro-batch stream computing), Storm (stream computing), and Flink (stream computing), to meet the requirements of various big data application scenarios. The engines convert data structures and logic into data models that meet service requirements.
- Data analysis
Based on the preset data models, users can analyze data with easy-to-use SQL through Hive (data warehouse), SparkSQL, or Presto (interactive query engine).
- Data display and scheduling
To present data analysis results, MRS is integrated with Data Lake Factory (DLF), which is a one-stop big data collaboration development platform, to help you easily complete multiple tasks, such as data modeling, data integration, script development, job scheduling, and job monitoring. This makes big data more accessible than ever before, helping you quickly build big data processing centers.
- Cluster management
All components of the Hadoop-based big data ecosystem are deployed in distributed mode, and their deployment, management, and O&M are complex.
MRS provides a unified O&M management platform for cluster management, supporting one-click cluster deployment, multi-version selection, as well as manual scaling and auto scaling of clusters without service interruption. In addition, MRS provides job management, resource tag management, and O&M of the preceding data processing components at each layer. It also provides one-stop O&M capabilities, covering monitoring, alarm reporting, configuration, and patch upgrade.
MRS is backed by a strong Hadoop kernel team and is built on Huawei's enterprise-level FusionInsight big data platform. MRS has been deployed on tens of thousands of nodes and can meet Service Level Agreements (SLAs) for users at multiple levels.
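To make the batch processing model named under Data computing more concrete, the following is a minimal sketch of the map, shuffle, and reduce steps in plain Python. It runs in a single process and does not use MRS or Hadoop APIs; the function names (map_phase, shuffle_phase, reduce_phase, word_count) are illustrative assumptions, not part of any MRS interface.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    """Emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle_phase(pairs):
    """Group emitted values by key, as a framework would between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Combine all values collected for one key into a single result."""
    return key, sum(values)


def word_count(lines):
    """Run the three phases in order over an iterable of text lines."""
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    sample = ["big data on the cloud", "big clusters process big data"]
    print(word_count(sample))
```

On a real cluster, the map and reduce functions would run in parallel on many nodes and the shuffle would move data across the network; the sketch only shows how the three steps fit together.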
MRS has the following advantages:
- High performance
MRS supports the self-developed CarbonData storage technology, a high-performance big data storage solution. CarbonData lets a single data set serve multiple scenarios and supports features such as multi-level indexing, dictionary encoding, pre-aggregation, dynamic partitioning, and quasi-real-time data query. This improves I/O scanning and computing performance and returns analysis results over tens of billions of data records in seconds. In addition, MRS supports the self-developed enhanced scheduler Superior, which breaks the scale bottleneck of a single cluster and is capable of scheduling over 10,000 nodes in a cluster.
- Low cost
Based on diversified cloud infrastructure, MRS provides various computing and storage choices and separates computing from storage, delivering low-cost massive data storage solutions. MRS supports auto scaling to address peak and off-peak service loads, releasing idle resources on the big data platform for customers. MRS clusters can be created and scaled out when you need them, and can be terminated or scaled in after you use them, minimizing cost.
- High security
With Kerberos authentication, MRS provides role-based access control (RBAC) and sound audit functions. MRS is a one-stop big data platform that allows different physical isolation modes to be set up for customers in the public resource area and dedicated resource area of HUAWEI CLOUD as well as HCS Online in the customer's equipment room. A cluster supports multiple logical tenants. Permission isolation enables the computing, storage, and table resources of the cluster to be divided based on tenants.
- Easy O&M
MRS provides a visualized big data cluster management platform, improving O&M efficiency. MRS supports rolling patch upgrade and provides visualized patch release information and one-click patch installation without manual intervention, ensuring long-term stability of user clusters.
- High reliability
MRS delivers high availability (HA) on all nodes and sends real-time SMS and email notifications.
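Of the storage features listed under High performance, dictionary encoding is the simplest to illustrate. The sketch below shows the general technique only: it is not CarbonData's implementation, and the function names (dictionary_encode, dictionary_decode, filter_equals) are hypothetical. Each distinct column value is stored once, rows hold small integer codes, and an equality filter compares integers instead of strings.

```python
def dictionary_encode(column):
    """Replace each value with an integer code; return (dictionary, codes)."""
    dictionary = []   # code -> original value, in first-seen order
    index_of = {}     # original value -> code
    codes = []
    for value in column:
        if value not in index_of:
            index_of[value] = len(dictionary)
            dictionary.append(value)
        codes.append(index_of[value])
    return dictionary, codes


def dictionary_decode(dictionary, codes):
    """Rebuild the original column from the dictionary and the codes."""
    return [dictionary[code] for code in codes]


def filter_equals(dictionary, codes, target):
    """Return row positions whose value equals target, comparing codes only."""
    if target not in dictionary:
        return []  # the value never occurs, so no row can match
    target_code = dictionary.index(target)
    return [row for row, code in enumerate(codes) if code == target_code]


if __name__ == "__main__":
    country = ["cn", "us", "cn", "de", "us", "cn"]
    dictionary, codes = dictionary_encode(country)
    print(dictionary, codes)
    print(filter_equals(dictionary, codes, "us"))
```

The benefit grows with repetition: a column with few distinct values and many rows stores each string once and scans compact integers.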
MRS Learning Paths
You can quickly understand MRS and learn how to use MRS by referring to Progressive Knowledge.