Updated on 2022-09-14 GMT+08:00

Basic Concepts

  • Hadoop shell commands

    Basic Hadoop shell commands include those used to submit MapReduce jobs, kill MapReduce jobs, and perform operations on HDFS.

  • MapReduce InputFormat and OutputFormat

    Based on the specified InputFormat, the MapReduce framework splits the input data set, reads the data, provides key-value pairs to Map tasks, and determines how many Map tasks are started in parallel. Based on the specified OutputFormat, the framework writes the generated key-value pairs as data in a specific format.

    Map and Reduce tasks operate on key-value pairs. In other words, the framework treats the input of a job as a set of key-value pairs and produces a set of key-value pairs as output. The two sets may be of different types. Within a single Map or Reduce task, key-value pairs are processed serially in a single thread.

    The framework needs to serialize key and value classes, so these classes must implement the Writable interface. In addition, to support sorting, key classes must implement the WritableComparable interface.
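
    The serialization and sorting requirements can be illustrated with a minimal sketch. So that it compiles without Hadoop on the classpath, it declares local stand-in interfaces (WritableSketch and WritableComparableSketch, both hypothetical names) that mirror the write(DataOutput) and readFields(DataInput) methods of Hadoop's Writable. The StationYearKey class and its fields are assumptions made for illustration only.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Local stand-ins (assumption): they mirror the method signatures of
// org.apache.hadoop.io.Writable and WritableComparable.
interface WritableSketch {
    void write(DataOutput out) throws IOException;
    void readFields(DataInput in) throws IOException;
}

interface WritableComparableSketch<T> extends WritableSketch, Comparable<T> {}

// Hypothetical composite key made of a year and a station ID.
class StationYearKey implements WritableComparableSketch<StationYearKey> {
    int year;
    String stationId = "";

    // No-arg constructor so an empty instance can be created before readFields.
    StationYearKey() {}

    StationYearKey(int year, String stationId) {
        this.year = year;
        this.stationId = stationId;
    }

    @Override
    public void write(DataOutput out) throws IOException {
        out.writeInt(year);
        out.writeUTF(stationId);
    }

    @Override
    public void readFields(DataInput in) throws IOException {
        // Fields must be read in the same order in which they were written.
        year = in.readInt();
        stationId = in.readUTF();
    }

    @Override
    public int compareTo(StationYearKey other) {
        // Sort by year first, then by station ID.
        int byYear = Integer.compare(year, other.year);
        return byYear != 0 ? byYear : stationId.compareTo(other.stationId);
    }

    // Serializes the key to bytes and reads it back into a fresh instance.
    static StationYearKey roundTrip(StationYearKey key) {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            key.write(new DataOutputStream(buffer));
            StationYearKey copy = new StationYearKey();
            copy.readFields(new DataInputStream(new ByteArrayInputStream(buffer.toByteArray())));
            return copy;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

    In a real job the class would implement Hadoop's own WritableComparable instead of the stand-ins; the round trip here only demonstrates that write and readFields are symmetric.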

    The input and output types of a MapReduce job are as follows:

    (input) <k1, v1> -> map -> <k2, v2> -> group by key -> <k2, List(v2)> -> reduce -> <k3, v3> (output)
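
    The type flow above can be traced with a small single-process simulation. This is a sketch in plain Java rather than Hadoop code: the class KeyValueFlowSketch and its methods are hypothetical, and word counting is chosen only as a familiar example. Here <k1, v1> is <line offset, line text>, <k2, v2> is <word, 1>, and <k3, v3> is <word, total count>.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java simulation of the key-value flow; no Hadoop classes are used.
class KeyValueFlowSketch {

    // map: <k1, v1> = <line offset, line text> -> list of <k2, v2> = <word, 1>
    static List<Map.Entry<String, Integer>> map(long offset, String line) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        for (String word : line.trim().split("\\s+")) {
            if (!word.isEmpty()) {
                out.add(Map.entry(word, 1));
            }
        }
        return out;
    }

    // group: collect every v2 under its k2, sorted by key -> <k2, List(v2)>
    static Map<String, List<Integer>> group(List<Map.Entry<String, Integer>> pairs) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>()).add(pair.getValue());
        }
        return grouped;
    }

    // reduce: <k2, List(v2)> -> v3, the total count for the word k2
    static int reduce(String word, List<Integer> ones) {
        int sum = 0;
        for (int one : ones) {
            sum += one;
        }
        return sum;
    }

    // Runs map, group, and reduce in sequence over the input lines.
    static Map<String, Integer> run(List<String> lines) {
        List<Map.Entry<String, Integer>> mapped = new ArrayList<>();
        long offset = 0;
        for (String line : lines) {
            mapped.addAll(map(offset, line));
            offset += line.length() + 1; // +1 for the line terminator
        }
        Map<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> e : group(mapped).entrySet()) {
            result.put(e.getKey(), reduce(e.getKey(), e.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        // prints {hadoop=1, hello=2, mapreduce=1}
        System.out.println(run(List.of("hello hadoop", "hello mapreduce")));
    }
}
```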

  • Core of Jobs

    Typically, an application only needs to extend the Mapper and Reducer classes and override their map and reduce methods to implement its service logic. The map and reduce methods constitute the core of a job.
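
    The extend-and-override pattern might look like the following sketch. To keep it runnable without Hadoop, it uses hypothetical stand-in base classes (MapperSketch and ReducerSketch) in place of Hadoop's Mapper and Reducer, and a BiConsumer in place of the framework's Context object for emitting pairs. The "year,temperature" record layout and all class names are assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.BiConsumer;

// Hypothetical stand-ins for the framework base classes; in Hadoop the real
// ones are org.apache.hadoop.mapreduce.Mapper and Reducer.
abstract class MapperSketch<K1, V1, K2, V2> {
    abstract void map(K1 key, V1 value, BiConsumer<K2, V2> emit);
}

abstract class ReducerSketch<K2, V2, K3, V3> {
    abstract void reduce(K2 key, Iterable<V2> values, BiConsumer<K3, V3> emit);
}

// Service logic: parse "year,temperature" records and emit <year, temperature>.
class TemperatureMapper extends MapperSketch<Long, String, String, Integer> {
    @Override
    void map(Long offset, String record, BiConsumer<String, Integer> emit) {
        String[] fields = record.split(",");
        emit.accept(fields[0].trim(), Integer.parseInt(fields[1].trim()));
    }
}

// Service logic: keep the maximum temperature seen for each year.
class MaxTemperatureReducer extends ReducerSketch<String, Integer, String, Integer> {
    @Override
    void reduce(String year, Iterable<Integer> temperatures, BiConsumer<String, Integer> emit) {
        int max = Integer.MIN_VALUE;
        for (int t : temperatures) {
            max = Math.max(max, t);
        }
        emit.accept(year, max);
    }
}

// Tiny single-process driver that plays the role of the framework.
class MaxTemperatureJob {
    static Map<String, Integer> run(List<String> records) {
        MapperSketch<Long, String, String, Integer> mapper = new TemperatureMapper();
        ReducerSketch<String, Integer, String, Integer> reducer = new MaxTemperatureReducer();

        // Map phase: collect emitted values under their key.
        Map<String, List<Integer>> grouped = new TreeMap<>();
        long offset = 0;
        for (String record : records) {
            mapper.map(offset, record,
                (year, temp) -> grouped.computeIfAbsent(year, y -> new ArrayList<>()).add(temp));
            offset += record.length() + 1;
        }

        // Reduce phase: one reduce call per key.
        Map<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> e : grouped.entrySet()) {
            reducer.reduce(e.getKey(), e.getValue(), result::put);
        }
        return result;
    }
}
```

    Only TemperatureMapper and MaxTemperatureReducer carry service logic; everything else stands in for work that the framework performs in a real job.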

  • MapReduce Web UI

    MapReduce web UIs allow users to monitor running or historical MapReduce jobs and view logs, which supports fine-grained job development, configuration, and optimization.

  • Archiving

    Archiving ensures that all mapped key-value pairs that share a key belong to the same key group.

  • Shuffle

    Shuffle is the process of transferring the output data of Map tasks to Reduce tasks.
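
    One part of this transfer is deciding which Reduce task receives each key. The sketch below shows only that routing step, using the same hash formula as Hadoop's default HashPartitioner; the real shuffle also sorts, merges, and copies data between nodes, which is not modeled here. The class ShuffleSketch and its methods are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class ShuffleSketch {

    // Same formula as Hadoop's default HashPartitioner: mask off the sign bit
    // so the value is non-negative, then take the remainder.
    static int partitionFor(String key, int numReduceTasks) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    // Routes map output pairs into one bucket per Reduce task and groups the
    // values by key inside each bucket.
    static List<Map<String, List<Integer>>> shuffle(
            List<Map.Entry<String, Integer>> mapOutput, int numReduceTasks) {
        List<Map<String, List<Integer>>> buckets = new ArrayList<>();
        for (int i = 0; i < numReduceTasks; i++) {
            buckets.add(new TreeMap<>());
        }
        for (Map.Entry<String, Integer> pair : mapOutput) {
            buckets.get(partitionFor(pair.getKey(), numReduceTasks))
                   .computeIfAbsent(pair.getKey(), k -> new ArrayList<>())
                   .add(pair.getValue());
        }
        return buckets;
    }
}
```

    Because the partition depends only on the key, every pair with the same key reaches the same Reduce task, regardless of which Map task produced it.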

  • Mapping

    Mapping is used to map a group of key-value pairs into a new group of key-value pairs.