NodeManager OOM Occurs During Spark Application Execution
Question
When YARN's External Shuffle Service is enabled, too many simultaneous shuffle connections during Spark application execution can cause the NodeManager to report "java.lang.OutOfMemoryError: Direct buffer memory". This indicates that the direct buffer memory is insufficient. The error log is as follows:
2016-12-06 02:01:00,768 | WARN | shuffle-server-38 | Exception in connection from /192.168.101.95:53680 | TransportChannelHandler.java:79
io.netty.handler.codec.DecoderException: java.lang.OutOfMemoryError: Direct buffer memory
    at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:153)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:333)
    at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:319)
    at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:787)
    at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:130)
    at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:511)
    at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)
    at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382)
    at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354)
    at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:116)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.OutOfMemoryError: Direct buffer memory
    at java.nio.Bits.reserveMemory(Bits.java:693)
    at java.nio.DirectByteBuffer.<init>(DirectByteBuffer.java:123)
    at java.nio.ByteBuffer.allocateDirect(ByteBuffer.java:311)
    at io.netty.buffer.PoolArena$DirectArena.newChunk(PoolArena.java:434)
    at io.netty.buffer.PoolArena.allocateNormal(PoolArena.java:179)
    at io.netty.buffer.PoolArena.allocate(PoolArena.java:168)
    at io.netty.buffer.PoolArena.reallocate(PoolArena.java:277)
    at io.netty.buffer.PooledByteBuf.capacity(PooledByteBuf.java:108)
    at io.netty.buffer.AbstractByteBuf.ensureWritable(AbstractByteBuf.java:251)
    at io.netty.buffer.AbstractByteBuf.writeBytes(AbstractByteBuf.java:849)
    at io.netty.buffer.AbstractByteBuf.writeBytes(AbstractByteBuf.java:841)
    at io.netty.buffer.AbstractByteBuf.writeBytes(AbstractByteBuf.java:831)
    at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:146)
    ... 10 more
Answer
YARN's External Shuffle Service starts twice as many threads as there are available vCPUs, but the default direct buffer memory is only 128 MB. When a large number of shuffle connections are established at the same time, the share of direct buffer memory available to each thread is therefore small. For example, on a node with 40 vCPUs, the External Shuffle Service starts 80 threads, and all 80 threads share the direct buffer memory of the NodeManager process. Averaged across the threads, that is 128 MB / 80 = 1.6 MB each, which is less than 2 MB.
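The arithmetic above can be checked with a short sketch. It assumes only the figures given in this answer (two threads per vCPU, 128 MB default, 512 MB suggested for 40 vCPUs) and treats the per-thread value as an average share, not as a hard per-thread limit. The function names are hypothetical and not part of YARN or Spark.

```python
# Minimal sketch of the sizing arithmetic described in this FAQ.
# Function names are illustrative only; they are not YARN or Spark APIs.

DEFAULT_DIRECT_MEMORY_MB = 128  # default direct buffer memory stated in this FAQ


def shuffle_threads(vcpus: int) -> int:
    """Threads started by the External Shuffle Service: twice the vCPU count."""
    return 2 * vcpus


def direct_memory_per_thread_mb(vcpus: int,
                                direct_memory_mb: int = DEFAULT_DIRECT_MEMORY_MB) -> float:
    """Average direct buffer memory share per shuffle thread, in MB."""
    return direct_memory_mb / shuffle_threads(vcpus)


if __name__ == "__main__":
    # 40 vCPUs with the default 128 MB
    print(f"{direct_memory_per_thread_mb(40):.1f} MB")       # prints "1.6 MB"
    # 40 vCPUs with direct buffer memory raised to 512 MB
    print(f"{direct_memory_per_thread_mb(40, 512):.1f} MB")  # prints "6.4 MB"
```

With 512 MB, the average share on a 40-vCPU node rises from 1.6 MB to 6.4 MB per thread.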
Therefore, you are advised to size the direct buffer memory according to the number of vCPUs on the NodeManager nodes in the cluster. For example, if a node has 40 vCPUs, set the direct buffer memory to 512 MB by adding the following option to the GC_OPTS parameter of NodeManager:
-XX:MaxDirectMemorySize=512M
-XX:MaxDirectMemorySize is not included in GC_OPTS by default. Add it to the GC_OPTS parameter as needed.
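If the GC_OPTS string is maintained in a script, the following sketch shows one way to add the option without duplicating it. The helper name and the sample GC_OPTS values are hypothetical placeholders, not FusionInsight defaults. Only the -XX:MaxDirectMemorySize option itself comes from this answer.

```python
# Minimal sketch: add or replace -XX:MaxDirectMemorySize in a GC_OPTS string.
# The helper and the sample option strings are hypothetical, for illustration only.

FLAG_PREFIX = "-XX:MaxDirectMemorySize="


def with_max_direct_memory(gc_opts: str, size: str = "512M") -> str:
    """Return gc_opts with exactly one -XX:MaxDirectMemorySize=<size> option."""
    # Drop any existing occurrence so the option is not specified twice.
    tokens = [t for t in gc_opts.split() if not t.startswith(FLAG_PREFIX)]
    tokens.append(FLAG_PREFIX + size)
    return " ".join(tokens)


if __name__ == "__main__":
    current = "-Xms2G -Xmx4G"  # placeholder value, not the real default
    print(with_max_direct_memory(current))
```

The resulting string is what would be entered in the GC_OPTS field. The existing options on a real cluster will differ from the placeholder used here.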
Perform the following operations to configure the parameter:
Log in to FusionInsight Manager and choose Cluster > Services > Yarn. Click Configurations and then All Configurations, click NodeManager, and select System. Then modify the GC_OPTS parameter in the right pane.
| Parameter | Description | Default Value |
|---|---|---|
| GC_OPTS | GC parameter of YARN NodeManager | 128M |