Java API
Common MapReduce APIs
Common classes in MapReduce are as follows:
- org.apache.hadoop.mapreduce.Job: API for users to submit MapReduce jobs. It is used to set job parameters, submit jobs, control job execution, and query job status.
- org.apache.hadoop.mapred.JobConf: Configuration class of MapReduce jobs in the older org.apache.hadoop.mapred API. It is the main configuration API for users who submit jobs to Hadoop through that API.
Common methods of org.apache.hadoop.mapreduce.Job:
Function | Description |
---|---|
Job(Configuration conf, String jobName), Job(Configuration conf) | Creates a MapReduce client used to configure job attributes and submit the job. In Hadoop 2.x and later, these constructors are deprecated in favor of Job.getInstance(Configuration conf) and Job.getInstance(Configuration conf, String jobName). |
setMapperClass(Class<? extends Mapper> cls) | A core API used to specify the Mapper class of a MapReduce job. If no Mapper class is set, the identity Mapper is used, which passes its input through unchanged. You can also configure mapreduce.job.map.class in mapred-site.xml. |
setReducerClass(Class<? extends Reducer> cls) | A core API used to specify the Reducer class of a MapReduce job. If no Reducer class is set, the identity Reducer is used, which passes its input through unchanged. You can also configure mapreduce.job.reduce.class in mapred-site.xml. |
setCombinerClass(Class<? extends Reducer> cls) | Specifies the Combiner class of a MapReduce job. A Combiner pre-aggregates Map output locally before it is sent to the reduce tasks. No Combiner is used by default. You can also configure mapreduce.job.combine.class in mapred-site.xml. A Combiner can be used only when its input and output key and value types are the same, because its output is passed to the Reducer in place of the original Map output. |
setInputFormatClass(Class<? extends InputFormat> cls) | A core API used to specify the InputFormat class of a MapReduce job. The default InputFormat class is TextInputFormat. You can also configure mapreduce.job.inputformat.class in mapred-site.xml. The InputFormat class determines how input data in a given format is read and how it is divided into input splits. |
setJarByClass(Class<?> cls) | A core API used to locate the job JAR file: Hadoop finds the local JAR file that contains the specified class and uploads it to HDFS when the job is submitted. |
setJar(String jar) | Specifies the local path of the job JAR file directly. The JAR file is uploaded to HDFS when the job is submitted. Use either setJar(String jar) or setJarByClass(Class<?> cls), not both. You can also configure mapreduce.job.jar in mapred-site.xml. |
setOutputFormatClass(Class<? extends OutputFormat> theClass) | A core API used to specify the OutputFormat class of a MapReduce job, which determines the format of the job output. The default OutputFormat class is TextOutputFormat, which writes each key-value pair as a line of text. You can also configure mapreduce.job.outputformat.class in mapred-site.xml. In most cases, you do not need to specify an OutputFormat class. |
setOutputKeyClass(Class<?> theClass) | A core API used to specify the output key type of a MapReduce job. You can also configure mapreduce.job.output.key.class in mapred-site.xml. |
setOutputValueClass(Class<?> theClass) | A core API used to specify the output value type of a MapReduce job. You can also configure mapreduce.job.output.value.class in mapred-site.xml. |
setPartitionerClass(Class<? extends Partitioner> theClass) | Specifies the Partitioner class of a MapReduce job, which assigns each Map output record to a reduce task. You can also configure mapreduce.job.partitioner.class in mapred-site.xml. The default HashPartitioner assigns records by the hash of the key, which spreads keys roughly evenly across reduce tasks. Specify a custom Partitioner when records must be routed by other rules. For example, in HBase applications, key-value pairs belong to different regions, so the Partitioner must allocate Map output by region. |
setSortComparatorClass(Class<? extends RawComparator> cls) | Specifies the comparator that controls how Map output keys are sorted before they are passed to the reduce tasks. You can also configure mapreduce.job.output.key.comparator.class in mapred-site.xml. Compression of Map output is configured separately: it is disabled by default and can be enabled with mapreduce.map.output.compress and mapreduce.map.output.compress.codec in mapred-site.xml. Compressing intermediate data reduces network load when Map tasks output a large amount of data. |
setPriority(JobPriority priority) | Specifies the priority of a MapReduce job. Five priorities are available: VERY_HIGH, HIGH, NORMAL, LOW, and VERY_LOW. The default priority is NORMAL. You can also configure mapreduce.job.priority in mapred-site.xml. |
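The Combiner constraint in setCombinerClass can be illustrated without a cluster. The following is a minimal stdlib sketch (the class name CombinerSketch is hypothetical and nothing here calls Hadoop): a word-count "map" step emits (word, 1) pairs per input split, and one summing function serves as both Combiner and Reducer. This reuse only works because the summing step consumes and emits the same (String, Integer) types, which is the condition the table describes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical, Hadoop-free simulation of a word count with and without a Combiner.
final class CombinerSketch {

    // "Map" step for one input split: emit (word, 1) for every whitespace-separated word.
    static List<Map.Entry<String, Integer>> map(String split) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        for (String word : split.trim().split("\\s+")) {
            if (!word.isEmpty()) {
                out.add(Map.entry(word, 1));
            }
        }
        return out;
    }

    // Sums values per key. Input and output are both (String, Integer), the same types
    // as the Map output, so this function can run as the Combiner (on one map task's
    // output) and again as the Reducer (on all combined output).
    static List<Map.Entry<String, Integer>> sumByKey(List<Map.Entry<String, Integer>> pairs) {
        Map<String, Integer> totals = new TreeMap<>();
        for (Map.Entry<String, Integer> p : pairs) {
            totals.merge(p.getKey(), p.getValue(), Integer::sum);
        }
        return new ArrayList<>(totals.entrySet());
    }

    // Records that the map tasks would send to the reduce side, optionally combined per split.
    static List<Map.Entry<String, Integer>> shuffleInput(List<String> splits, boolean useCombiner) {
        List<Map.Entry<String, Integer>> shuffled = new ArrayList<>();
        for (String split : splits) {
            List<Map.Entry<String, Integer>> mapped = map(split);
            shuffled.addAll(useCombiner ? sumByKey(mapped) : mapped);
        }
        return shuffled;
    }

    public static void main(String[] args) {
        List<String> splits = List.of("a b a a", "b a b");
        for (boolean combine : new boolean[] {false, true}) {
            List<Map.Entry<String, Integer>> shuffled = shuffleInput(splits, combine);
            System.out.println("combiner=" + combine + " shuffled=" + shuffled.size()
                    + " result=" + sumByKey(shuffled));
        }
    }
}
```

With the two sample splits, the Combiner reduces the shuffled records from 7 to 4 while the final counts (a=4, b=3) stay the same.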
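To show what the default HashPartitioner in the setPartitionerClass row does, the sketch below reproduces its partition formula, (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks, in plain Java. The class HashPartitionSketch and its assign helper are hypothetical illustrations, not Hadoop classes; masking with Integer.MAX_VALUE clears the sign bit so that keys with a negative hash code still map to a valid partition.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical, Hadoop-free sketch of the key-to-reduce-task assignment of HashPartitioner.
final class HashPartitionSketch {

    // Clear the sign bit of the hash code, then take the modulus by the number of reduce tasks.
    static int getPartition(Object key, int numReduceTasks) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    // Groups keys by the partition (reduce task) they would be sent to.
    static Map<Integer, List<String>> assign(List<String> keys, int numReduceTasks) {
        Map<Integer, List<String>> byPartition = new TreeMap<>();
        for (String key : keys) {
            byPartition.computeIfAbsent(getPartition(key, numReduceTasks), p -> new ArrayList<>())
                       .add(key);
        }
        return byPartition;
    }

    public static void main(String[] args) {
        List<String> keys = List.of("apple", "banana", "cherry", "date", "apple");
        // Identical keys always land in the same partition, so one reduce task sees all their values.
        System.out.println(assign(keys, 3));
    }
}
```

A custom Partitioner, such as one that routes HBase rows by region, replaces only the getPartition logic; the requirement that every key maps deterministically to a value in [0, numReduceTasks) stays the same.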
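The effect of setSortComparatorClass can be approximated with a java.util.Comparator: the comparator decides the order in which keys reach a reduce task, and values of keys it considers equal are grouped together. This is a simplified, Hadoop-free sketch under that assumption; in a real job the class passed to setSortComparatorClass is a RawComparator (often a WritableComparator subclass) that may compare serialized bytes, and SortComparatorSketch is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical simulation of the sort-and-group step between the map and reduce phases.
final class SortComparatorSketch {

    // Sorts Map output by key using the given comparator and groups the values of equal keys,
    // i.e. the key order and grouping a reduce task would observe.
    static TreeMap<String, List<Integer>> sortAndGroup(List<Map.Entry<String, Integer>> mapOutput,
                                                       Comparator<String> sortComparator) {
        TreeMap<String, List<Integer>> grouped = new TreeMap<>(sortComparator);
        for (Map.Entry<String, Integer> kv : mapOutput) {
            grouped.computeIfAbsent(kv.getKey(), k -> new ArrayList<>()).add(kv.getValue());
        }
        return grouped;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Integer>> out = List.of(
                Map.entry("b", 1), Map.entry("a", 2), Map.entry("c", 3), Map.entry("a", 4));
        System.out.println(sortAndGroup(out, Comparator.naturalOrder()).keySet()); // ascending keys
        System.out.println(sortAndGroup(out, Comparator.reverseOrder()).keySet()); // descending keys
    }
}
```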
Common methods of org.apache.hadoop.mapred.JobConf:
Method | Description |
---|---|
setNumMapTasks(int n) | A core API used to specify the number of Map tasks in a MapReduce job. You can also configure mapreduce.job.maps in mapred-site.xml. Note: This value is only a hint. The actual number of Map tasks equals the number of input splits produced by the InputFormat class, so ensure that the InputFormat class in use allows the number of Map tasks to be set on the client. |
setNumReduceTasks(int n) | A core API used to specify the number of Reduce tasks in a MapReduce job. One Reduce task is started by default. You can also configure mapreduce.job.reduces in mapred-site.xml. Unlike the number of Map tasks, the number of Reduce tasks is fully controlled by the user. In most cases, the number of Reduce tasks is set to about one-fourth the number of Map tasks. |
setQueueName(String queueName) | Specifies the queue to which a MapReduce job is submitted. If no queue is specified, the job is submitted to the default queue. You can also configure mapreduce.job.queuename in mapred-site.xml. |
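Because the number of Map tasks follows from the input splits rather than from setNumMapTasks, it can help to estimate task counts up front. The sketch below assumes a single non-empty, splittable file and FileInputFormat-style splitting: the split size is the block size bounded by the minimum and maximum split sizes (mapreduce.input.fileinputformat.split.minsize and mapreduce.input.fileinputformat.split.maxsize), and the last split may be up to 10% larger than the split size. The suggested Reduce task count applies the one-fourth rule of thumb from the setNumReduceTasks row. TaskCountSketch and its methods are hypothetical names for illustration only.

```java
// Hypothetical estimator for Map and Reduce task counts; it does not call Hadoop.
final class TaskCountSketch {

    // Slack factor: the final split may be up to 10% larger than splitSize
    // instead of producing a tiny extra split (FileInputFormat uses 1.1).
    private static final double SPLIT_SLOP = 1.1;

    // Split size: the block size, bounded by the configured minimum and maximum split sizes.
    static long computeSplitSize(long blockSize, long minSize, long maxSize) {
        return Math.max(minSize, Math.min(maxSize, blockSize));
    }

    // Number of splits (and therefore Map tasks) for one non-empty, splittable file.
    static int countSplits(long fileLength, long splitSize) {
        int splits = 0;
        long remaining = fileLength;
        while ((double) remaining / splitSize > SPLIT_SLOP) {
            splits++;
            remaining -= splitSize;
        }
        if (remaining != 0) {
            splits++; // the remainder becomes the last split
        }
        return splits;
    }

    // Rule of thumb: about one Reduce task per four Map tasks, and at least one.
    static int suggestedReduceTasks(int mapTasks) {
        return Math.max(1, mapTasks / 4);
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        long splitSize = computeSplitSize(128 * mb, 1, Long.MAX_VALUE);
        int maps = countSplits(1000 * mb, splitSize);
        System.out.println("maps=" + maps + ", reduces=" + suggestedReduceTasks(maps));
    }
}
```

For a 1,000 MB file with a 128 MB block size and unrestricted split bounds, the sketch yields 8 splits and suggests 2 Reduce tasks; a 135 MB file stays a single split because of the 10% slack.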