
Java API

For details about the MapReduce API, see the official documentation: http://hadoop.apache.org/docs/r3.1.1/api/index.html

Common Interfaces

Common classes in MapReduce are as follows:

  • org.apache.hadoop.mapreduce.Job: the interface through which users submit MapReduce jobs. It is used to set job parameters, submit a job, control job execution, and query job status.
  • org.apache.hadoop.mapred.JobConf: the configuration class of a MapReduce job and the main configuration interface through which users submit jobs to Hadoop.
Table 1 Common interfaces of org.apache.hadoop.mapreduce.Job

  • Job(Configuration conf, String jobName), Job(Configuration conf): Creates a MapReduce client for configuring job attributes and submitting the job.

  • setMapperClass(Class<? extends Mapper> cls): A core interface used to specify the Mapper class of a MapReduce job. The Mapper class is empty by default. You can also configure mapreduce.job.map.class in mapred-site.xml.

  • setReducerClass(Class<? extends Reducer> cls): A core interface used to specify the Reducer class of a MapReduce job. The Reducer class is empty by default. You can also configure mapreduce.job.reduce.class in mapred-site.xml.

  • setCombinerClass(Class<? extends Reducer> cls): Specifies the Combiner class of a MapReduce job. The Combiner class is empty by default. You can also configure mapreduce.job.combine.class in mapred-site.xml. The Combiner class can be used only when the input and output key and value types of the reduce task are the same.

  • setInputFormatClass(Class<? extends InputFormat> cls): A core interface used to specify the InputFormat class of a MapReduce job. The default is TextInputFormat. You can also configure mapreduce.job.inputformat.class in mapred-site.xml. Use this interface to specify an InputFormat class that processes data in a particular format, reads the data, and splits it into input splits.

  • setJarByClass(Class<?> cls): A core interface used to specify the local JAR file that contains a class. Java locates the JAR file based on the given class and uploads it to the Hadoop distributed file system (HDFS).

  • setJar(String jar): Specifies the local path of a job's JAR file directly; the JAR file is then uploaded to HDFS. Use either setJar(String jar) or setJarByClass(Class<?> cls). You can also configure mapreduce.job.jar in mapred-site.xml.

  • setOutputFormatClass(Class<? extends OutputFormat> theClass): A core interface used to specify the OutputFormat class of a MapReduce job. The default is TextOutputFormat, which writes each key and value as text. You can also configure mapreduce.job.outputformat.class in mapred-site.xml. The OutputFormat class usually does not need to be specified.

  • setOutputKeyClass(Class<?> theClass): A core interface used to specify the output key type of a MapReduce job. You can also configure mapreduce.job.output.key.class in mapred-site.xml.

  • setOutputValueClass(Class<?> theClass): A core interface used to specify the output value type of a MapReduce job. You can also configure mapreduce.job.output.value.class in mapred-site.xml.

  • setPartitionerClass(Class<? extends Partitioner> theClass): Specifies the Partitioner class of a MapReduce job, which assigns map output key-value pairs to reduce tasks. You can also configure mapreduce.job.partitioner.class in mapred-site.xml. HashPartitioner is used by default and distributes the key-value pairs of a map task evenly. For example, in HBase applications, different key-value pairs belong to different regions; in this case, you must specify a Partitioner class so that map output is distributed accordingly.

  • setSortComparatorClass(Class<? extends RawComparator> cls): Specifies the comparator class used to sort map output keys before they are passed to the reduce tasks. By default, keys are sorted using the compareTo() method of the key class. You can also configure mapreduce.job.output.key.comparator.class in mapred-site.xml.

  • setPriority(JobPriority priority): Specifies the priority of a MapReduce job. Five priorities can be set: VERY_HIGH, HIGH, NORMAL, LOW, and VERY_LOW. The default priority is NORMAL. You can also configure mapreduce.job.priority in mapred-site.xml.
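
The following driver sketch shows how the Job interfaces in Table 1 are typically combined when submitting a job. It is a minimal example under stated assumptions, not the definitive MapReduce setup: it uses the Hadoop-bundled library classes TokenCounterMapper and IntSumReducer as Mapper, Combiner, and Reducer, calls the Job.getInstance() factory method as the non-deprecated alternative to the Job constructors listed above, and reads hypothetical input and output paths from the command line. The Partitioner and sort comparator are left at their defaults.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.JobPriority;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.map.TokenCounterMapper;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;
import org.apache.hadoop.mapreduce.lib.reduce.IntSumReducer;

public class WordCountDriver {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();

        // Create the MapReduce client; Job.getInstance(conf, name) is the
        // recommended equivalent of the Job(Configuration, String) constructor.
        Job job = Job.getInstance(conf, "word count");

        // Locate the JAR containing this class and upload it to HDFS.
        job.setJarByClass(WordCountDriver.class);

        // Library Mapper/Reducer classes shipped with Hadoop; a real
        // application would usually supply its own implementations.
        job.setMapperClass(TokenCounterMapper.class);
        job.setCombinerClass(IntSumReducer.class);
        job.setReducerClass(IntSumReducer.class);

        // TextInputFormat/TextOutputFormat are already the defaults;
        // they are set explicitly here only to illustrate the interfaces.
        job.setInputFormatClass(TextInputFormat.class);
        job.setOutputFormatClass(TextOutputFormat.class);

        // Output key and value types of the job.
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);

        // Optional: job priority (NORMAL is the default).
        job.setPriority(JobPriority.NORMAL);

        // Hypothetical input and output paths passed on the command line.
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}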

Table 2 Common interfaces of org.apache.hadoop.mapred.JobConf

  • setNumMapTasks(int n): A core interface used to specify the number of map tasks in a MapReduce job. You can also configure mapreduce.job.maps in mapred-site.xml. NOTE: The InputFormat class controls the number of map tasks; ensure that the InputFormat class in use supports setting the number of map tasks on the client.

  • setNumReduceTasks(int n): A core interface used to specify the number of reduce tasks in a MapReduce job. Only one reduce task is started by default. You can also configure mapreduce.job.reduces in mapred-site.xml. The number of reduce tasks is controlled by the user; in most cases, it is one-fourth the number of map tasks.

  • setQueueName(String queueName): Specifies the queue to which a MapReduce job is submitted. If no queue is specified, the job is submitted to the default queue. You can also configure mapreduce.job.queuename in mapred-site.xml.
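
A minimal sketch of how the JobConf interfaces in Table 2 might be used with the older org.apache.hadoop.mapred API. The task counts and queue name are illustrative values only, and the rest of the job setup (mapper, reducer, input and output paths, submission) is omitted.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapred.JobConf;

public class JobConfExample {
    public static void main(String[] args) {
        // JobConf is the configuration interface of the older mapred API.
        JobConf jobConf = new JobConf(new Configuration());

        // Hint for the number of map tasks; the InputFormat in use decides
        // whether this value takes effect (see the NOTE in Table 2).
        jobConf.setNumMapTasks(20);

        // Number of reduce tasks; only one is started by default.
        jobConf.setNumReduceTasks(5);

        // Queue to which the job is submitted ("default" is the default queue).
        jobConf.setQueueName("default");
    }
}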