Common Concepts of MapReduce Application Development
- Hadoop shell commands
Basic Hadoop shell commands are used to submit MapReduce jobs, kill MapReduce jobs, and perform operations on HDFS.
- MapReduce InputFormat and OutputFormat
Based on the specified InputFormat, the MapReduce framework splits data sets, reads data, provides key-value pairs to Map tasks, and determines the number of Map tasks that are started in parallel. Based on the specified OutputFormat, the MapReduce framework writes the generated key-value pairs as data in a specific format.
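How the number of input splits, and therefore the number of Map tasks, follows from the file size can be sketched as below. This is a simplified model, not the framework's code: it assumes one splittable file read by a file-based InputFormat, and the class name `SplitCountSketch`, the size formula, and the 10% slop factor are illustrative assumptions. Actual behavior depends on the InputFormat, the Hadoop version, and the job configuration.

```java
// Simplified, hypothetical model of split counting for one splittable file.
// It does not call any Hadoop API.
class SplitCountSketch {
    // Assumption: the last split may be up to 10% larger than the split size.
    static final double SPLIT_SLOP = 1.1;

    // Assumption: the split size is the block size clamped to [minSize, maxSize].
    static long computeSplitSize(long blockSize, long minSize, long maxSize) {
        return Math.max(minSize, Math.min(maxSize, blockSize));
    }

    // Count how many splits a file of fileSize bytes yields.
    static int countSplits(long fileSize, long splitSize) {
        int splits = 0;
        long remaining = fileSize;
        // Cut full-size splits while clearly more than one split remains.
        while ((double) remaining / splitSize > SPLIT_SLOP) {
            splits++;
            remaining -= splitSize;
        }
        // Whatever is left (if anything) becomes the final split.
        if (remaining > 0) {
            splits++;
        }
        return splits;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        long splitSize = computeSplitSize(128 * mb, 1, Long.MAX_VALUE);
        // A 300 MB file with 128 MB splits: 128 + 128 + 44 MB.
        System.out.println(countSplits(300 * mb, splitSize)); // prints 3
    }
}
```

Under these assumptions, one Map task is started per split, so a larger split size means fewer Map tasks for the same input.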
Map and Reduce tasks operate on key-value pairs. In other words, the framework treats the input of a job as a group of key-value pairs and produces a group of key-value pairs as output. The two groups of key-value pairs may be of different types. Within a single Map or Reduce task, key-value pairs are processed serially in a single thread.
The framework needs to serialize key and value classes. Therefore, these classes must implement the Writable interface. In addition, to facilitate sorting, key classes must implement the WritableComparable interface.
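The serialization and sorting requirements on key classes can be illustrated with a minimal sketch. The class `YearMonthKey` and its fields are hypothetical. To keep the example runnable without Hadoop on the classpath, it only mirrors the `write`/`readFields`/`compareTo` contract using JDK types; in a real job the class would be declared as implementing `org.apache.hadoop.io.WritableComparable<YearMonthKey>`.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical composite key that mirrors the WritableComparable contract.
class YearMonthKey implements Comparable<YearMonthKey> {
    int year;
    int month;

    // A no-argument constructor lets a framework create an empty instance
    // and then fill it through readFields.
    YearMonthKey() {}

    YearMonthKey(int year, int month) {
        this.year = year;
        this.month = month;
    }

    // Serialize the fields in a fixed order.
    public void write(DataOutput out) throws IOException {
        out.writeInt(year);
        out.writeInt(month);
    }

    // Deserialize the fields in the same order they were written.
    public void readFields(DataInput in) throws IOException {
        year = in.readInt();
        month = in.readInt();
    }

    // Sort order used when keys are sorted: by year, then by month.
    @Override
    public int compareTo(YearMonthKey other) {
        int byYear = Integer.compare(year, other.year);
        return byYear != 0 ? byYear : Integer.compare(month, other.month);
    }

    @Override
    public boolean equals(Object o) {
        return o instanceof YearMonthKey
                && ((YearMonthKey) o).year == year
                && ((YearMonthKey) o).month == month;
    }

    @Override
    public int hashCode() {
        return 31 * year + month;
    }

    // Helper for this sketch only: serialize a key to bytes and read it back.
    static YearMonthKey roundTrip(YearMonthKey key) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            key.write(new DataOutputStream(bytes));
            YearMonthKey copy = new YearMonthKey();
            copy.readFields(new DataInputStream(
                    new ByteArrayInputStream(bytes.toByteArray())));
            return copy;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        YearMonthKey copy = roundTrip(new YearMonthKey(2024, 3));
        System.out.println(copy.year + "-" + copy.month); // prints 2024-3
    }
}
```

Keeping `hashCode` consistent with `equals` matters for keys as well, because hash-based partitioning would otherwise send equal keys to different Reduce tasks.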
The input and output types of a MapReduce job are as follows:
(input) <k1,v1> -> map -> <k2,v2> -> aggregate by key -> <k2,List(v2)> -> reduce -> <k3,v3> (output)
- Core of Jobs
Typically, an application only needs to extend the Mapper and Reducer classes and override the map and reduce methods to implement its business logic. The map and reduce methods constitute the core of a job.
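The role of the map and reduce methods can be sketched with a word count. The following is a plain-Java stand-in, not the Hadoop API: the class `WordCountSketch` and its `run` method are hypothetical and only imitate what the framework does between the two methods. In a real job, the same logic would go into classes that extend `org.apache.hadoop.mapreduce.Mapper` and `org.apache.hadoop.mapreduce.Reducer` and emit results through the provided context object.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical stand-in that imitates the map and reduce contract.
class WordCountSketch {
    // map: <k1 = line offset, v1 = line text> -> list of <k2 = word, v2 = 1>
    static List<Map.Entry<String, Integer>> map(long offset, String line) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        for (String word : line.trim().split("\\s+")) {
            if (!word.isEmpty()) {
                out.add(new SimpleEntry<>(word, 1));
            }
        }
        return out;
    }

    // reduce: <k2 = word, List(v2) = counts> -> <k3 = word, v3 = total>
    static Map.Entry<String, Integer> reduce(String word, List<Integer> counts) {
        int sum = 0;
        for (int count : counts) {
            sum += count;
        }
        return new SimpleEntry<>(word, sum);
    }

    // Imitates the framework: call map per record, group values by key,
    // then call reduce once per key.
    static Map<String, Integer> run(List<String> lines) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        long offset = 0;
        for (String line : lines) {
            for (Map.Entry<String, Integer> kv : map(offset, line)) {
                grouped.computeIfAbsent(kv.getKey(), k -> new ArrayList<>())
                       .add(kv.getValue());
            }
            offset += line.length() + 1; // assume a one-byte line separator
        }
        Map<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> e : grouped.entrySet()) {
            Map.Entry<String, Integer> kv = reduce(e.getKey(), e.getValue());
            result.put(kv.getKey(), kv.getValue());
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(run(Arrays.asList("a b a", "b c"))); // prints {a=2, b=2, c=1}
    }
}
```

Only `map` and `reduce` carry application logic here; everything in `run` corresponds to work that the framework performs on behalf of the job.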
- MapReduce Web UI
MapReduce web UIs allow users to monitor running and historical MapReduce jobs and view logs, which supports fine-grained job development, configuration, and optimization.
- Archiving
Archiving groups the mapped key-value pairs so that all pairs with the same key belong to one key group.
- Shuffle
Shuffle is the process of transferring the output of Map tasks to Reduce tasks.
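The routing step of shuffle can be illustrated with a small sketch. It assumes hash-based partitioning of keys across Reduce tasks and sorting by key within each partition; the class `ShuffleSketch` and its methods are hypothetical, and the sketch runs in memory, whereas the real shuffle also involves spilling to disk, merging, and network transfer.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical in-memory model of routing Map output to Reduce tasks.
class ShuffleSketch {
    // Assumption: hash partitioning. Masking with Integer.MAX_VALUE clears the
    // sign bit so the result is never negative.
    static int partition(String key, int numReducers) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReducers;
    }

    // Returns one sorted map per Reduce task: key -> all values for that key.
    static List<Map<String, List<Integer>>> shuffle(
            List<Map.Entry<String, Integer>> mapOutput, int numReducers) {
        List<Map<String, List<Integer>>> partitions = new ArrayList<>();
        for (int i = 0; i < numReducers; i++) {
            partitions.add(new TreeMap<>()); // TreeMap keeps keys sorted
        }
        for (Map.Entry<String, Integer> kv : mapOutput) {
            partitions.get(partition(kv.getKey(), numReducers))
                      .computeIfAbsent(kv.getKey(), k -> new ArrayList<>())
                      .add(kv.getValue());
        }
        return partitions;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Integer>> mapOutput = new ArrayList<>();
        for (String word : new String[] {"a", "b", "a", "b", "c"}) {
            mapOutput.add(new SimpleEntry<>(word, 1));
        }
        // With two Reduce tasks, "b" goes to task 0 and "a", "c" go to task 1.
        System.out.println(shuffle(mapOutput, 2)); // prints [{b=[1, 1]}, {a=[1, 1], c=[1]}]
    }
}
```

Because the partition depends only on the key, every value for a given key reaches the same Reduce task, which is what allows reduce to see the complete list of values for that key.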
- Mapping
Mapping transforms a group of key-value pairs into a new group of key-value pairs.