MapReduce任务运行失败,ApplicationMaster出现物理内存溢出异常
问题
HBase bulkload任务有210000个map和10000个reduce,MapReduce任务运行失败,ApplicationMaster出现物理内存溢出异常。
For more detailed output, check the application tracking page:https://bigdata-55:8090/cluster/app/application_1449841777199_0003 Then click on links to logs of each attempt. Diagnostics: Container [pid=21557,containerID=container_1449841777199_0003_02_000001] is running beyond physical memory limits Current usage: 1.0 GB of 1 GB physical memory used; 3.6 GB of 5 GB virtual memory used. Killing container. Dump of the process-tree for container_1449841777199_0003_02_000001 : |- PID PPID PGRPID SESSID CMD_NAME USER_MODE_TIME(MILLIS) SYSTEM_TIME(MILLIS) VMEM_USAGE(BYTES) RSSMEM_USAGE(PAGES) FULL_CMD_LINE |- 21584 21557 21557 21557 (java) 12342 1627 3871748096 271331 ${BIGDATA_HOME}/jdk1.8.0_51//bin/java -Djava.io.tmpdir=/srv/BigData/hadoop/data1/nm/localdir/usercache/hbase/appcache/application_1449841777199_0003/container_1449841777199_0003_02_000001/tmp -Dlog4j.configuration=container-log4j.properties -Dyarn.app.container.log.dir=/srv/BigData/hadoop/data1/nm/containerlogs/application_1449841777199_0003/container_1449841777199_0003_02_000001 -Dyarn.app.container.log.filesize=0 -Dhadoop.root.logger=INFO,CLA -Dhadoop.root.logfile=syslog -Xmx784m org.apache.hadoop.mapreduce.v2.app.MRAppMaster |- 21557 21547 21557 21557 (bash) 0 0 13074432 368 /bin/bash -c ${BIGDATA_HOME}/jdk1.8.0_51//bin/java -Djava.io.tmpdir=/srv/BigData/hadoop/data1/nm/localdir/usercache/hbase/appcache/application_1449841777199_0003/container_1449841777199_0003_02_000001/tmp -Dlog4j.configuration=container-log4j.properties -Dyarn.app.container.log.dir=/srv/BigData/hadoop/data1/nm/containerlogs/application_1449841777199_0003/container_1449841777199_0003_02_000001 -Dyarn.app.container.log.filesize=0 -Dhadoop.root.logger=INFO,CLA -Dhadoop.root.logfile=syslog -Xmx784m org.apache.hadoop.mapreduce.v2.app.MRAppMaster 1>/srv/BigData/hadoop/data1/nm/containerlogs/application_1449841777199_0003/container_1449841777199_0003_02_000001/stdout 2>/srv/BigData/hadoop/data1/nm/containerlogs/application_1449841777199_0003/container_1449841777199_0003_02_000001/stderr Container killed on request. Exit code is 143 Container exited with a non-zero exit code 143 Failing this attempt. Failing the application.
回答
这是性能规格的问题,MapReduce任务运行失败的根本原因是由于ApplicationMaster的内存溢出导致的,即物理内存溢出导致被NodeManager kill。
解决方案:
将ApplicationMaster的内存配置调大,在客户端“客户端安装路径/Yarn/config/mapred-site.xml”配置文件中优化如下参数:
- “yarn.app.mapreduce.am.resource.mb”
- “yarn.app.mapreduce.am.command-opts”,该参数中-Xmx值建议为0.8*“yarn.app.mapreduce.am.resource.mb”
参考规格:
ApplicationMaster配置如下时,可以同时支持并发Container数为2.4万个。
- “yarn.app.mapreduce.am.resource.mb”=2048
- “yarn.app.mapreduce.am.command-opts”该参数中-Xmx=1638m