Updated on 2023-05-06 GMT+08:00

Why Does a MapReduce Task Show No Progress for a Long Time?

Question

A MapReduce job makes no progress for a long time.

Answer

The job does not have enough memory. When memory is insufficient, copying the map output to the reducers (the shuffle phase) takes significantly longer.

To reduce the waiting time, increase the heap memory.

You can tune the task configuration based on the number of mappers and the amount of data each mapper processes. Adjust the following parameters in the client installation path/Yarn/config/mapred-site.xml file according to the size of the input data:

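  • mapreduce.reduce.memory.mb: the memory, in MB, requested for each reduce task container
  • mapreduce.reduce.java.opts: the JVM options for reduce tasks; the heap size is set here with -Xmx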
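Both parameters are ordinary <property> entries in the Hadoop-style <configuration> element of mapred-site.xml. The Python sketch below, which assumes only the standard library, shows one way to set or add them programmatically. The helper name set_property and the example values (a 2048 MB container with a 1536 MB heap) are hypothetical illustrations, not settings prescribed by this guide.

```python
import xml.etree.ElementTree as ET


def set_property(conf_root: ET.Element, name: str, value: str) -> None:
    """Set a <property> in a Hadoop-style <configuration> element, adding it if absent.

    Hypothetical helper for illustration only.
    """
    for prop in conf_root.findall("property"):
        if prop.findtext("name") == name:
            value_el = prop.find("value")
            if value_el is None:
                value_el = ET.SubElement(prop, "value")
            value_el.text = value
            return
    # Property not present yet: append a new <property><name/><value/></property>.
    prop = ET.SubElement(conf_root, "property")
    ET.SubElement(prop, "name").text = name
    ET.SubElement(prop, "value").text = value


if __name__ == "__main__":
    # Hypothetical starting content; a real mapred-site.xml contains many more properties.
    original = """<configuration>
  <property><name>mapreduce.reduce.memory.mb</name><value>1024</value></property>
</configuration>"""
    root = ET.fromstring(original)
    # Example values only: a 1.5 GB heap inside a 2 GB container (assumed sizing).
    set_property(root, "mapreduce.reduce.memory.mb", "2048")
    set_property(root, "mapreduce.reduce.java.opts", "-Xmx1536m")
    print(ET.tostring(root, encoding="unicode"))
```

To edit the real file, load it with ET.parse(path), call set_property on tree.getroot(), and save it with tree.write(path). Back up the file first: ElementTree does not keep XML comments by default. If the existing mapreduce.reduce.java.opts value carries other JVM flags, keep them and change only the -Xmx part.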

Example: If the data size is 5 GB and there are 10 mappers, the ideal heap memory is 1.5 GB. Increase the heap memory size as the data size increases.
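The sketch below turns a target heap size, such as the 1.5 GB from this example, into values for the two parameters. It assumes, as a common rule of thumb that this guide does not state, that the heap takes about 75% of the container so that non-heap JVM memory still fits; the function name reduce_memory_settings and the ratio constant are hypothetical. It does not derive the heap size from the data size; choose that figure as described above.

```python
import math

# Assumption (not from this guide): the heap uses about 75% of the container,
# leaving headroom for non-heap JVM memory. Adjust to your cluster's practice.
HEAP_TO_CONTAINER_RATIO = 0.75


def reduce_memory_settings(heap_gb: float, ratio: float = HEAP_TO_CONTAINER_RATIO) -> dict[str, str]:
    """Return values for mapreduce.reduce.java.opts and mapreduce.reduce.memory.mb.

    heap_gb is the desired reduce task heap in GB; ratio is heap / container size.
    """
    if heap_gb <= 0:
        raise ValueError("heap_gb must be positive")
    if not 0 < ratio <= 1:
        raise ValueError("ratio must be in (0, 1]")
    heap_mb = math.ceil(heap_gb * 1024)          # -Xmx value in MB
    container_mb = math.ceil(heap_mb / ratio)    # container must be larger than the heap
    return {
        "mapreduce.reduce.java.opts": f"-Xmx{heap_mb}m",
        "mapreduce.reduce.memory.mb": str(container_mb),
    }


if __name__ == "__main__":
    # 1.5 GB heap: the figure given above for 5 GB of input with 10 mappers.
    for name, value in reduce_memory_settings(1.5).items():
        print(f"{name} = {value}")
```

For a 1.5 GB heap this yields -Xmx1536m and a 2048 MB container. Keeping the container larger than the heap matters because YARN may kill a container whose process exceeds its memory limit.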