Optimizing the Commit Phase of MapReduce Tasks
Scenario
By default, if a MapReduce job generates a large number of output files, the job takes a long time in the final commit phase to move the temporary results of its tasks to the final output directory. In large clusters, this time-consuming commit process significantly degrades job performance.
In this case, set mapreduce.fileoutputcommitter.algorithm.version to 2 to speed up the commit phase of MapReduce jobs.
Procedure
Navigation path for setting parameters:
On the All Configurations page of the Yarn service, enter a parameter name in the search box. For details, see Modifying Cluster Service Configuration Parameters.
Parameter | Description | Default Value |
---|---|---|
mapreduce.fileoutputcommitter.algorithm.version | Specifies the version of the algorithm used to commit job output. The value is 1 or 2. NOTE: 2 is the recommended algorithm version. With this version, each task commits its output directly to the final output directory, which reduces the time needed to commit the results of large jobs. | 2 |
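The difference between the two algorithm versions can be illustrated with a small simulation. The following is a minimal sketch on a local file system, not Hadoop's actual committer code: the function names (run_job, move_files, write_task_output) and the directory layout under _temporary are hypothetical and simplified. It assumes that with version 1 each task only renames its directory into a staging area and a single job commit step later moves every file, whereas with version 2 each task moves its own files to the final directory when the task commits.

```python
import os
import shutil
import tempfile


def write_task_output(attempt_dir, task_id, n_files):
    """Write n_files small files into a task attempt's temporary directory."""
    os.makedirs(attempt_dir)
    for i in range(n_files):
        with open(os.path.join(attempt_dir, f"part-{task_id}-{i}"), "w") as f:
            f.write("data")


def move_files(src_dir, dst_dir):
    """Move every file from src_dir into dst_dir; return how many were moved."""
    os.makedirs(dst_dir, exist_ok=True)
    moved = 0
    for name in sorted(os.listdir(src_dir)):
        shutil.move(os.path.join(src_dir, name), os.path.join(dst_dir, name))
        moved += 1
    os.rmdir(src_dir)
    return moved


def run_job(out_dir, version, n_tasks=4, files_per_task=3):
    """Simulate a job; return the number of file moves left for job commit."""
    tmp = os.path.join(out_dir, "_temporary")      # hypothetical layout
    committed = os.path.join(tmp, "committed")
    os.makedirs(committed, exist_ok=True)
    for t in range(n_tasks):
        attempt = os.path.join(tmp, f"attempt_{t}")
        write_task_output(attempt, t, files_per_task)
        if version == 1:
            # v1 task commit: one directory rename into a job-level staging area.
            os.rename(attempt, os.path.join(committed, f"task_{t}"))
        else:
            # v2 task commit: the task moves its own files to the final directory.
            move_files(attempt, out_dir)

    # Job commit: runs once, in a single process, after all tasks have finished.
    job_commit_moves = 0
    if version == 1:
        for task_dir in sorted(os.listdir(committed)):
            job_commit_moves += move_files(os.path.join(committed, task_dir), out_dir)
    shutil.rmtree(tmp)
    return job_commit_moves


if __name__ == "__main__":
    for version in (1, 2):
        with tempfile.TemporaryDirectory() as out:
            moves = run_job(out, version)
            print(f"v{version}: {moves} file moves during job commit, "
                  f"{len(os.listdir(out))} files in output")
```

In this model, both versions leave the same 12 files in the output directory, but version 1 performs all 12 moves in the single job commit step, while version 2 leaves none for it because the work is spread across the tasks.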