MapReduce Java APIs
Common MapReduce APIs
Common classes in MapReduce are as follows:
- org.apache.hadoop.mapreduce.Job: API for users to submit MapReduce jobs. It is used to set job parameters, submit jobs, control job execution, and query job status.
- org.apache.hadoop.mapred.JobConf: configuration class for MapReduce jobs and the main API that users use to describe a job to Hadoop.

Common methods of org.apache.hadoop.mapreduce.Job:

Function | Description |
---|---|
`Job(Configuration conf, String jobName)`, `Job(Configuration conf)` | Creates a MapReduce client for configuring job attributes and submitting a job. In Hadoop 2.x and later, these constructors are deprecated in favor of Job.getInstance. |
`setMapperClass(Class<? extends Mapper> cls)` | A core API used to specify the Mapper class of a MapReduce job. No Mapper class is set by default. You can also configure mapreduce.job.map.class in mapred-site.xml. |
`setReducerClass(Class<? extends Reducer> cls)` | A core API used to specify the Reducer class of a MapReduce job. No Reducer class is set by default. You can also configure mapreduce.job.reduce.class in mapred-site.xml. |
`setCombinerClass(Class<? extends Reducer> cls)` | Specifies the Combiner class of a MapReduce job. No Combiner class is set by default. You can also configure mapreduce.job.combine.class in mapred-site.xml. A Combiner can be used only when the input and output key and value types of the reduce task are the same. |
`setInputFormatClass(Class<? extends InputFormat> cls)` | A core API used to specify the InputFormat class of a MapReduce job. The default InputFormat class is TextInputFormat. You can also configure mapreduce.job.inputformat.class in mapred-site.xml. The InputFormat class determines how input data of a given format is read and divided into input splits. |
`setJarByClass(Class<?> cls)` | A core API used to specify the job JAR file by naming a class that it contains. Hadoop locates the local JAR file that contains the class, and that JAR file is uploaded to HDFS. |
`setJar(String jar)` | Specifies the local location of the job JAR file directly. The JAR file is uploaded to HDFS. Use either setJar or setJarByClass, not both. You can also configure mapreduce.job.jar in mapred-site.xml. |
`setOutputFormatClass(Class<? extends OutputFormat> theClass)` | A core API used to specify the OutputFormat class of a MapReduce job, which determines the format of the output data. The default OutputFormat class is TextOutputFormat, which writes each key and value as text. You can also configure mapreduce.job.outputformat.class in mapred-site.xml (mapred.output.format.class is the key used by the older org.apache.hadoop.mapred API). The OutputFormat class usually does not need to be specified. |
`setOutputKeyClass(Class<?> theClass)` | A core API used to specify the output key type of a MapReduce job. You can also configure mapreduce.job.output.key.class in mapred-site.xml. |
`setOutputValueClass(Class<?> theClass)` | A core API used to specify the output value type of a MapReduce job. You can also configure mapreduce.job.output.value.class in mapred-site.xml. |
`setPartitionerClass(Class<? extends Partitioner> theClass)` | Specifies the Partitioner class of a MapReduce job, which decides which Reduce task receives each Map output record. You can also configure mapreduce.job.partitioner.class in mapred-site.xml (mapred.partitioner.class is the key used by the older org.apache.hadoop.mapred API). HashPartitioner is used by default and distributes the key-value pairs of a Map task across Reduce tasks by the hash of the key. For example, in HBase applications, different key-value pairs belong to different regions. In this case, you must specify a Partitioner class to allocate Map output results. |
`setSortComparatorClass(Class<? extends RawComparator> cls)` | Specifies the comparator that controls how keys are sorted before they are passed to the Reducer. You can also configure mapreduce.job.output.key.comparator.class in mapred-site.xml. NOTE: Compression of Map task output is configured separately. It is disabled by default and is enabled with mapreduce.map.output.compress and mapreduce.map.output.compress.codec in mapred-site.xml. Compressing intermediate data reduces network load when Map tasks output a large amount of data. |
`setPriority(JobPriority priority)` | Specifies the priority of a MapReduce job. Five priorities can be set: VERY_HIGH, HIGH, NORMAL, LOW, and VERY_LOW. The default priority is NORMAL. You can also configure mapreduce.job.priority in mapred-site.xml. |
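
The Job methods listed above are typically combined in a single driver class. The following is a minimal word-count sketch rather than a definitive implementation. It assumes that the Hadoop client libraries are on the classpath. The class names WordCountDriver, TokenMapper, and SumReducer, the job name, and the input and output paths are hypothetical. The sketch calls Job.getInstance because the Job constructors are deprecated in Hadoop 2.x and later.

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;
import org.apache.hadoop.mapreduce.lib.partition.HashPartitioner;

// Hypothetical driver class; all names here are illustrative only.
public class WordCountDriver {

    /** Emits (word, 1) for every whitespace-separated token of an input line. */
    public static class TokenMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        @Override
        protected void map(LongWritable offset, Text line, Context context)
                throws IOException, InterruptedException {
            for (String token : line.toString().split("\\s+")) {
                if (!token.isEmpty()) {
                    word.set(token);
                    context.write(word, ONE);
                }
            }
        }
    }

    /** Sums the counts of one word. Input and output types match, so it can also act as the Combiner. */
    public static class SumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable value : values) {
                sum += value.get();
            }
            context.write(key, new IntWritable(sum));
        }
    }

    /** Configures the job without submitting it. */
    public static Job buildJob(Configuration conf, String in, String out) throws IOException {
        Job job = Job.getInstance(conf, "word count sketch");
        job.setJarByClass(WordCountDriver.class);         // locate the JAR through a class it contains
        job.setMapperClass(TokenMapper.class);
        job.setCombinerClass(SumReducer.class);
        job.setReducerClass(SumReducer.class);
        job.setInputFormatClass(TextInputFormat.class);   // the default, set explicitly for illustration
        job.setOutputFormatClass(TextOutputFormat.class); // the default, set explicitly for illustration
        job.setPartitionerClass(HashPartitioner.class);   // the default, set explicitly for illustration
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(in));
        FileOutputFormat.setOutputPath(job, new Path(out));
        return job;
    }

    public static void main(String[] args) throws Exception {
        Job job = buildJob(new Configuration(), args[0], args[1]);
        // Submit the job, print progress, and block until it finishes.
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```

Keeping the configuration in buildJob, separate from submission in main, makes it possible to inspect the configured job (for example with job.getConfiguration()) without access to a cluster.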

Common methods of org.apache.hadoop.mapred.JobConf:

Method | Description |
---|---|
`setNumMapTasks(int n)` | A core API used to specify the number of Map tasks in a MapReduce job. You can also configure mapreduce.job.maps in mapred-site.xml. NOTE: This value is only a hint, because the InputFormat class controls the number of Map tasks. Ensure that the InputFormat class allows the number of Map tasks to be set on the client. |
`setNumReduceTasks(int n)` | A core API used to specify the number of Reduce tasks in a MapReduce job. Only one Reduce task is started by default. You can also configure mapreduce.job.reduces in mapred-site.xml. The number of Reduce tasks is set by the user. In most cases, it is set to about one-fourth the number of Map tasks. |
`setQueueName(String queueName)` | Specifies the queue to which a MapReduce job is submitted. If no queue is specified, the job is submitted to the default queue. You can also configure mapreduce.job.queuename in mapred-site.xml. |
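
The JobConf methods above can be exercised as in the following minimal sketch, which assumes that the Hadoop client libraries are on the classpath. The class name JobConfSketch, the job name, and the task counts are hypothetical values chosen for illustration. The sketch only builds the configuration and reads it back. It does not submit a job.

```java
import org.apache.hadoop.mapred.JobConf;

// Hypothetical class; the values below are illustrative only.
public class JobConfSketch {

    /** Builds a JobConf with explicit task counts and a target queue. */
    public static JobConf configure() {
        JobConf conf = new JobConf();
        conf.setJobName("jobconf sketch");
        conf.setNumMapTasks(8);       // a hint only: the InputFormat decides the actual number of Map tasks
        conf.setNumReduceTasks(2);    // about one-fourth of the Map task hint
        conf.setQueueName("default"); // the queue used when none is specified
        return conf;
    }

    public static void main(String[] args) {
        JobConf conf = configure();
        // Each setter writes the corresponding mapred-site.xml key into the configuration.
        System.out.println("mapreduce.job.maps      = " + conf.get("mapreduce.job.maps"));
        System.out.println("mapreduce.job.reduces   = " + conf.get("mapreduce.job.reduces"));
        System.out.println("mapreduce.job.queuename = " + conf.get("mapreduce.job.queuename"));
    }
}
```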