Java API
For the detailed MapReduce API, see the official documentation: http://hadoop.apache.org/docs/r3.1.1/api/index.html
Common Interfaces
Common classes in MapReduce are as follows:
- org.apache.hadoop.mapreduce.Job: the class through which users define and submit MapReduce jobs. It is used to set job parameters, submit the job, control its execution, and query its state.
- org.apache.hadoop.mapred.JobConf: the configuration class of a MapReduce job and the main configuration interface through which users describe a job to Hadoop.
Common methods of the Job class are described below:

Interface | Description
---|---
Job(Configuration conf, String jobName), Job(Configuration conf) | Creates a MapReduce client for configuring job attributes and submitting the job.
setMapperClass(Class<? extends Mapper> cls) | A core interface used to specify the Mapper class of a MapReduce job. The Mapper class is empty by default. You can also configure mapreduce.job.map.class in mapred-site.xml.
setReducerClass(Class<? extends Reducer> cls) | A core interface used to specify the Reducer class of a MapReduce job. The Reducer class is empty by default. You can also configure mapreduce.job.reduce.class in mapred-site.xml.
setCombinerClass(Class<? extends Reducer> cls) | Specifies the Combiner class of a MapReduce job. The Combiner class is empty by default. You can also configure mapreduce.job.combine.class in mapred-site.xml. The Combiner class can be used only when the input and output key and value types of the reduce task are the same.
setInputFormatClass(Class<? extends InputFormat> cls) | A core interface used to specify the InputFormat class of a MapReduce job. The default InputFormat class is TextInputFormat. You can also configure mapreduce.job.inputformat.class in mapred-site.xml. The InputFormat class reads data in different formats and splits the input into input splits.
setJarByClass(Class<?> cls) | A core interface used to specify the job JAR file by a class it contains. Java locates the local JAR file from the class file, and the JAR file is uploaded to the Hadoop distributed file system (HDFS).
setJar(String jar) | Specifies the local path of the job JAR file directly; the JAR file is then uploaded to HDFS. Use either setJar(String jar) or setJarByClass(Class<?> cls). You can also configure mapreduce.job.jar in mapred-site.xml.
setOutputFormatClass(Class<? extends OutputFormat> theClass) | A core interface used to specify the OutputFormat class of a MapReduce job. The default OutputFormat class is TextOutputFormat. You can also configure mapreduce.job.outputformat.class in mapred-site.xml. The default TextOutputFormat writes each key-value pair as a line of text. Usually the OutputFormat class does not need to be specified.
setOutputKeyClass(Class<?> theClass) | A core interface used to specify the output key type of a MapReduce job. You can also configure mapreduce.job.output.key.class in mapred-site.xml.
setOutputValueClass(Class<?> theClass) | A core interface used to specify the output value type of a MapReduce job. You can also configure mapreduce.job.output.value.class in mapred-site.xml.
setPartitionerClass(Class<? extends Partitioner> theClass) | Specifies the Partitioner class of a MapReduce job, which assigns map output records to reduce tasks. You can also configure mapreduce.job.partitioner.class in mapred-site.xml. HashPartitioner is used by default; it distributes the map output key-value pairs evenly across reduce tasks. For example, in HBase applications different key-value pairs belong to different regions, so a custom Partitioner class must be specified to route map output accordingly.
setSortComparatorClass(Class<? extends RawComparator> cls) | Specifies the comparator that controls how map output keys are sorted before they are passed to the Reducer. By default, the comparator registered for the key class is used. You can also configure mapreduce.job.output.key.comparator.class in mapred-site.xml.
setPriority(JobPriority priority) | Specifies the priority of a MapReduce job. Five priorities can be set: VERY_HIGH, HIGH, NORMAL, LOW, and VERY_LOW. The default priority is NORMAL. You can also configure mapreduce.job.priority in mapred-site.xml.
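The following is a minimal driver sketch that wires several of the methods above together for a word-count style job. The class names WordCountDriver, WordCountMapper, and WordCountReducer are illustrative placeholders rather than part of the API, and enabling map output compression through the Configuration (the mapreduce.map.output.compress property otherwise set in mapred-site.xml) is only one possible choice.

```java
// A minimal driver sketch, assuming a word-count style job; the class names
// WordCountDriver, WordCountMapper, and WordCountReducer are placeholders.
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.JobPriority;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

public class WordCountDriver {

    public static class WordCountMapper
            extends Mapper<LongWritable, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            // Emit (word, 1) for every token in the input line.
            StringTokenizer tokens = new StringTokenizer(value.toString());
            while (tokens.hasMoreTokens()) {
                word.set(tokens.nextToken());
                context.write(word, ONE);
            }
        }
    }

    public static class WordCountReducer
            extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            // Sum all counts emitted for the same word.
            int sum = 0;
            for (IntWritable v : values) {
                sum += v.get();
            }
            context.write(key, new IntWritable(sum));
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Optional: compress map output (equivalent to mapreduce.map.output.compress
        // in mapred-site.xml); useful when a map task produces a large amount of data.
        conf.setBoolean("mapreduce.map.output.compress", true);

        // Job.getInstance() is the recommended replacement for the Job constructors.
        Job job = Job.getInstance(conf, "word count");
        job.setJarByClass(WordCountDriver.class);           // locate and upload the job JAR

        job.setMapperClass(WordCountMapper.class);
        job.setCombinerClass(WordCountReducer.class);        // valid because reduce input and output types match
        job.setReducerClass(WordCountReducer.class);

        job.setInputFormatClass(TextInputFormat.class);      // the default, shown for completeness
        job.setOutputFormatClass(TextOutputFormat.class);    // the default, shown for completeness
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        job.setPriority(JobPriority.NORMAL);                 // the default priority

        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```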
Common methods of the JobConf class are described below:

Interface | Description
---|---
setNumMapTasks(int n) | A core interface used to specify the number of map tasks in a MapReduce job. You can also configure mapreduce.job.maps in mapred-site.xml. NOTE: The InputFormat class ultimately controls the number of map tasks; ensure that the InputFormat in use supports setting the number of map tasks on the client.
setNumReduceTasks(int n) | A core interface used to specify the number of reduce tasks in a MapReduce job. Only one reduce task is started by default. You can also configure mapreduce.job.reduces in mapred-site.xml. The number of reduce tasks is controlled by users; in most cases it is set to about one quarter of the number of map tasks.
setQueueName(String queueName) | Specifies the queue to which a MapReduce job is submitted. Jobs are submitted to the default queue unless otherwise specified. You can also configure mapreduce.job.queuename in mapred-site.xml.
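A minimal sketch of these JobConf settings is shown below. The class name QueueConfigExample and the concrete values (8 map tasks, 2 reduce tasks, the "default" queue) are illustrative assumptions, not values prescribed by the documentation.

```java
// A minimal sketch of the JobConf settings above; QueueConfigExample and the
// concrete values (8, 2, "default") are illustrative assumptions.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapred.JobConf;

public class QueueConfigExample {
    public static void main(String[] args) {
        JobConf jobConf = new JobConf(new Configuration(), QueueConfigExample.class);

        jobConf.setNumMapTasks(8);       // a hint only; the InputFormat decides the actual split count
        jobConf.setNumReduceTasks(2);    // e.g. roughly one quarter of the map tasks
        jobConf.setQueueName("default"); // the queue the job is submitted to

        // These calls correspond to mapreduce.job.maps, mapreduce.job.reduces,
        // and mapreduce.job.queuename in mapred-site.xml.
        System.out.println("Submitting to queue: " + jobConf.getQueueName());
    }
}
```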