Updated on 2022-07-11 GMT+08:00

Overview

Intended Audience

This document is intended for developers who are experienced in Java and want to develop YARN applications.

YARN Introduction

YARN is a distributed resource management system that improves resource utilization in a distributed cluster environment. The resources it manages include memory, I/O, network, and disk. YARN was developed to address the shortcomings of the original MapReduce framework. Initially, MapReduce committers modified the existing code incrementally. As the code base grew, design flaws in the original framework made further modification increasingly difficult. The committers therefore decided to redesign MapReduce as a next-generation framework (MRv2/YARN) that offers better scalability, availability, reliability, backward compatibility, and resource utilization. In addition to MapReduce, MRv2/YARN supports other computing frameworks.

Basic Concepts

  • ResourceManager (RM)

    ResourceManager is a global resource manager that manages and allocates resources in the system. ResourceManager consists of two components: Scheduler and Applications Manager.

  • ApplicationMaster (AM)

    Each application submitted by users includes an ApplicationMaster. The ApplicationMaster provides the following functions:

    • Negotiates with the ResourceManager Scheduler to obtain resources (represented by Containers).
    • Allocates resources to internal tasks.
    • Communicates with NodeManager to start or stop tasks.
    • Monitors the running status of all tasks and, when a task fails, applies for resources again to restart it.
  • NodeManager (NM)

    NodeManager is the resource and task manager of each node. It periodically reports the resource usage of the local node and the running status of each Container to ResourceManager. It also receives and processes requests from ApplicationMaster to start or stop Containers.

  • Container

    Container is the resource abstraction in YARN. A Container encapsulates multidimensional resources of a node, such as memory, CPU, disk, and network resources. When ApplicationMaster applies for resources from ResourceManager, ResourceManager returns the resources to ApplicationMaster in the form of Containers.
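
The relationship between these four concepts can be illustrated with a small, self-contained Java sketch. It is a toy model written for this overview and does not use the Hadoop YARN API: the class names (ToyYarn, Node, Container, ResourceManager), the first-fit allocation policy, and the restriction to memory and virtual cores are all assumptions made for illustration. A real scheduler applies queue and capacity policies that are not modeled here.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of resource negotiation in YARN; hypothetical names, NOT the Hadoop API. */
final class ToyYarn {

    /** Container: a bundle of resources granted on a single node. */
    static final class Container {
        final String nodeId;
        final int memoryMb;
        final int vcores;

        Container(String nodeId, int memoryMb, int vcores) {
            this.nodeId = nodeId;
            this.memoryMb = memoryMb;
            this.vcores = vcores;
        }
    }

    /** NodeManager stand-in: tracks the free resources of one node. */
    static final class Node {
        final String id;
        int freeMemoryMb;
        int freeVcores;

        Node(String id, int memoryMb, int vcores) {
            this.id = id;
            this.freeMemoryMb = memoryMb;
            this.freeVcores = vcores;
        }
    }

    /** ResourceManager stand-in: grants Containers from the registered nodes. */
    static final class ResourceManager {
        private final List<Node> nodes = new ArrayList<>();

        void register(Node node) {
            nodes.add(node);
        }

        /** Grants up to count Containers of the requested size (assumed first-fit policy). */
        List<Container> allocate(int count, int memoryMb, int vcores) {
            List<Container> granted = new ArrayList<>();
            for (Node node : nodes) {
                while (granted.size() < count
                        && node.freeMemoryMb >= memoryMb
                        && node.freeVcores >= vcores) {
                    node.freeMemoryMb -= memoryMb;
                    node.freeVcores -= vcores;
                    granted.add(new Container(node.id, memoryMb, vcores));
                }
            }
            return granted;
        }

        /** Returns a Container's resources to its node, for example after a task ends. */
        void release(Container container) {
            for (Node node : nodes) {
                if (node.id.equals(container.nodeId)) {
                    node.freeMemoryMb += container.memoryMb;
                    node.freeVcores += container.vcores;
                    return;
                }
            }
        }
    }

    public static void main(String[] args) {
        ResourceManager rm = new ResourceManager();
        rm.register(new Node("node-1", 4096, 4));
        rm.register(new Node("node-2", 2048, 2));

        // ApplicationMaster role: request 5 Containers of 1024 MB and 1 vcore each.
        List<Container> granted = rm.allocate(5, 1024, 1);
        for (Container c : granted) {
            System.out.println(c.nodeId + ": " + c.memoryMb + " MB, " + c.vcores + " vcore(s)");
        }
    }
}
```

In this sketch the code in main plays the ApplicationMaster role: it asks the ResourceManager stand-in for Containers and receives only as many as the nodes can hold. Releasing a Container makes its resources available for a later request, which mirrors how an ApplicationMaster can apply for resources again to restart a failed task.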