MapReduce Application Development Rules

Extend the Mapper class.

During the map phase of a MapReduce job, the framework calls setup() once per map task and then calls map() once for each input record.
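The call order can be illustrated with a small plain-Java simulation. It mirrors the shape of the default Mapper.run() loop (setup() once, map() once per record, cleanup() once at the end) but does not use any Hadoop classes. The class MapTaskLifecycleSketch and its methods are hypothetical, and the offsets assume single-byte (ASCII) text.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical plain-Java stand-in for a map task; no Hadoop classes are used.
class MapTaskLifecycleSketch {
    // Records the order in which the lifecycle methods are invoked.
    static final List<String> calls = new ArrayList<>();

    static void setup() { calls.add("setup"); }

    static void map(long offset, String line) { calls.add("map@" + offset); }

    static void cleanup() { calls.add("cleanup"); }

    // Same shape as Mapper.run(): setup once, map per record, cleanup once.
    static List<String> run(List<String> lines) {
        calls.clear();
        setup();
        try {
            long offset = 0;
            for (String line : lines) {
                map(offset, line);
                offset += line.length() + 1; // +1 for the newline; assumes ASCII text
            }
        } finally {
            cleanup();
        }
        return new ArrayList<>(calls);
    }

    public static void main(String[] args) {
        System.out.println(run(List.of("a b", "c")));
    }
}
```

For the two input lines "a b" and "c", setup() runs first, map() runs at offsets 0 and 4, and cleanup() runs last.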

Example:

import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public static class MapperClass extends Mapper<Object, Text, Text, IntWritable> {
    /**
     * Map input: the key is the byte offset of the line in the input file, and the value is one line of text.
     * The input key and value are produced by the InputFormat, so you do not need to set them.
     * TextInputFormat is used by default.
     */
    @Override
    public void map(Object key, Text value, Context context)
            throws IOException, InterruptedException {
        // Custom implementation
    }

    /**
     * setup() is called once per map task, before the first call to map().
     */
    @Override
    public void setup(Context context) throws IOException, InterruptedException {
        // Custom implementation
    }
}
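For a word-count job, map() typically splits each line into tokens and emits (word, 1) for every token. The sketch below shows that logic in plain Java, assuming a returned list of pairs stands in for context.write(); in a real mapper the pairs would be written as Text and IntWritable objects. TokenizerMapSketch is a hypothetical name.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.StringTokenizer;

// Hypothetical plain-Java version of a word-count map() body.
class TokenizerMapSketch {
    // key: byte offset of the line (unused here); value: one line of text.
    // Returns the (word, 1) pairs a real mapper would pass to context.write().
    static List<Map.Entry<String, Integer>> map(long key, String value) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        StringTokenizer itr = new StringTokenizer(value); // splits on whitespace
        while (itr.hasMoreTokens()) {
            out.add(new SimpleEntry<>(itr.nextToken(), 1));
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(map(0L, "hello world hello"));
    }
}
```

Note that the mapper does not aggregate repeated words itself; "hello" is emitted twice, and counting is left to the reduce side (or to an optional combiner).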

Extend the Reducer class.

During the reduce phase of a MapReduce job, the framework calls setup() once per reduce task and then calls reduce() once for each key.

Example:

import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public static class ReducerClass extends Reducer<Text, IntWritable, Text, IntWritable> {

    /**
     * The input is one key together with an iterable over all values for that key.
     * During the shuffle phase, the framework groups all map output pairs that have the same key,
     * so each call to reduce() receives every value for one key. In a word count, reduce() sums these values.
     * Call context.write(key, value) to emit an output pair.
     * The OutputFormat writes the (key, value) pairs produced by reduce() to the file system.
     * By default, TextOutputFormat writes them as text files in the job output directory (typically on HDFS).
     */
    @Override
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        // Custom implementation
    }

    /**
     * setup() is called once per reduce task, before the first call to reduce().
     */
    @Override
    public void setup(Context context) throws IOException, InterruptedException {
        // Custom implementation. Use context.getConfiguration() to read the job configuration.
    }
}
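The grouping described in the comment above can be simulated in plain Java. The sketch below assumes a TreeMap is an adequate stand-in for the shuffle (which sorts keys and groups their values), sums each group as a word-count reducer would, and formats each result as key, a tab, and value, which matches TextOutputFormat's default separator. ShuffleAndReduceSketch and its methods are hypothetical; no Hadoop classes are used.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical simulation of shuffle/sort followed by a summing reduce().
class ShuffleAndReduceSketch {
    // Groups map output pairs by key; TreeMap keeps keys sorted, as the shuffle does.
    static Map<String, List<Integer>> shuffle(List<Map.Entry<String, Integer>> mapOutput) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> e : mapOutput) {
            grouped.computeIfAbsent(e.getKey(), k -> new ArrayList<>()).add(e.getValue());
        }
        return grouped;
    }

    // Equivalent of a word-count reduce(): sum all values for one key.
    static int reduce(String key, Iterable<Integer> values) {
        int sum = 0;
        for (int v : values) {
            sum += v;
        }
        return sum;
    }

    // Calls reduce() once per key and formats each result as "key<TAB>value".
    static List<String> run(List<Map.Entry<String, Integer>> mapOutput) {
        List<String> lines = new ArrayList<>();
        for (Map.Entry<String, List<Integer>> g : shuffle(mapOutput).entrySet()) {
            lines.add(g.getKey() + "\t" + reduce(g.getKey(), g.getValue()));
        }
        return lines;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Integer>> pairs = List.of(
                Map.entry("hello", 1), Map.entry("world", 1), Map.entry("hello", 1));
        run(pairs).forEach(System.out::println);
    }
}
```

With the map output (hello, 1), (world, 1), (hello, 1), reduce() is called twice, once for "hello" with values [1, 1] and once for "world" with [1], giving the lines "hello	2" and "world	1".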

Submit a MapReduce job.

Use the main() method to create a job, set parameters, and submit the job to the Hadoop cluster.

Example:

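import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.GenericOptionsParser;

public static void main(String[] args) throws Exception {
    // Loads the default configuration resources (such as core-site.xml) found on the classpath.
    Configuration conf = new Configuration();
    // Command-line arguments: args[0] is the input path of the MapReduce job, and args[1] is the output path.
    // GenericOptionsParser consumes generic Hadoop options (such as -D key=value) and returns the remaining arguments.
    String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs();
    if (otherArgs.length != 2) {
        System.err.println("Usage: <in> <out>");
        System.exit(2);
    }
    Job job = Job.getInstance(conf, "job name");
    // Specify the JAR file that contains the job classes.
    job.setJar("D:\\job-examples.jar");
    // Alternatively, locate the JAR from a class it contains:
    // job.setJarByClass(TestWordCount.class);
    // Set the mapper and reducer classes. They can also be specified in the configuration file.
    job.setMapperClass(TokenizerMapperV1.class);
    job.setReducerClass(IntSumReducerV1.class);
    // Set the combiner class. No combiner is used by default. Here the reducer class is reused as the combiner,
    // which is safe only when the reduce function is commutative and associative and its input and output
    // types are the same. The combiner class can also be specified in the configuration file.
    job.setCombinerClass(IntSumReducerV1.class);
    // Set the output key and value types of the job. They can also be specified in the configuration file.
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    // Set the input and output paths of the job. They can also be specified in the configuration file.
    Path outputPath = new Path(otherArgs[1]);
    FileSystem fs = outputPath.getFileSystem(conf);
    // FileOutputFormat rejects an output directory that already exists, so delete it first.
    // This removes the directory and everything in it.
    if (fs.exists(outputPath)) {
        fs.delete(outputPath, true);
    }
    FileInputFormat.addInputPath(job, new Path(otherArgs[0]));
    FileOutputFormat.setOutputPath(job, outputPath);
    // Submit the job, wait for it to finish, and exit with 0 on success or 1 on failure.
    System.exit(job.waitForCompletion(true) ? 0 : 1);
}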
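The caution about combiners exists because the framework may apply a combiner zero, one, or several times to partial map output before the reducer sees it. Reusing the reducer as a combiner therefore only works when applying the reduce function to partial results gives the same final answer. The plain-Java sketch below, with a hypothetical class name and sample values, compares summing (safe) with averaging (not safe) when each of two map tasks' values is combined first.

```java
import java.util.List;

// Hypothetical illustration of when a reduce function can double as a combiner.
class CombinerSafetySketch {
    static double sum(List<Double> values) {
        double s = 0;
        for (double v : values) s += v;
        return s;
    }

    static double avg(List<Double> values) {
        return sum(values) / values.size();
    }

    public static void main(String[] args) {
        List<Double> split1 = List.of(1.0, 2.0, 3.0); // values from one map task
        List<Double> split2 = List.of(10.0);          // values from another map task
        List<Double> all = List.of(1.0, 2.0, 3.0, 10.0);

        // Without a combiner, the reducer sees all values at once.
        double sumDirect = sum(all);
        double avgDirect = avg(all);

        // With the reduce function also applied per map task as a combiner.
        double sumCombined = sum(List.of(sum(split1), sum(split2)));
        double avgCombined = avg(List.of(avg(split1), avg(split2)));

        System.out.println(sumDirect + " " + sumCombined); // 16.0 16.0
        System.out.println(avgDirect + " " + avgCombined); // 4.0 6.0
    }
}
```

Summing gives 16.0 either way, so a summing reducer such as the one used for word count can safely serve as the combiner. Averaging gives 4.0 directly but 6.0 after combining, so an averaging reducer must not be reused as a combiner without restructuring it (for example, by emitting partial sums and counts).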
