
NodeManager OOM Occurs During Spark Application Execution

Question

When YARN's External Shuffle Service is enabled and a Spark application opens too many shuffle connections during execution, the error "java.lang.OutOfMemoryError: Direct buffer memory" is reported, indicating that direct buffer memory is insufficient. The error log is as follows:

2016-12-06 02:01:00,768 | WARN  | shuffle-server-38 | Exception in connection from /192.168.101.95:53680 | TransportChannelHandler.java:79
io.netty.handler.codec.DecoderException: java.lang.OutOfMemoryError: Direct buffer memory
        at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:153)
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:333)
        at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:319)
        at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:787)
        at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:130)
        at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:511)
        at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)
        at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382)
        at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354)
        at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:116)
        at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.OutOfMemoryError: Direct buffer memory
        at java.nio.Bits.reserveMemory(Bits.java:693)
        at java.nio.DirectByteBuffer.<init>(DirectByteBuffer.java:123)
        at java.nio.ByteBuffer.allocateDirect(ByteBuffer.java:311)
        at io.netty.buffer.PoolArena$DirectArena.newChunk(PoolArena.java:434)
        at io.netty.buffer.PoolArena.allocateNormal(PoolArena.java:179)
        at io.netty.buffer.PoolArena.allocate(PoolArena.java:168)
        at io.netty.buffer.PoolArena.reallocate(PoolArena.java:277)
        at io.netty.buffer.PooledByteBuf.capacity(PooledByteBuf.java:108)
        at io.netty.buffer.AbstractByteBuf.ensureWritable(AbstractByteBuf.java:251)
        at io.netty.buffer.AbstractByteBuf.writeBytes(AbstractByteBuf.java:849)
        at io.netty.buffer.AbstractByteBuf.writeBytes(AbstractByteBuf.java:841)
        at io.netty.buffer.AbstractByteBuf.writeBytes(AbstractByteBuf.java:831)
        at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:146)
        ... 10 more

Answer

YARN's External Shuffle Service starts twice as many threads as there are available vCPUs, while the default direct buffer memory is only 128 MB. When a large number of shuffle connections are established at the same time, the direct buffer memory available to each thread is therefore small. For example, on a node with 40 vCPUs, the External Shuffle Service starts 80 threads, and these 80 threads share the direct buffer memory of the process, leaving less than 2 MB per thread.
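As an illustration of this arithmetic only (not part of the shuffle service code), the following small Java program estimates the per-thread share of direct buffer memory from the vCPU count. The factor of two and the 128 MB default are taken from the explanation above; in reality the buffer pool is shared dynamically rather than split evenly.

import java.util.Locale;

public class DirectBufferShare {
    public static void main(String[] args) {
        // Number of vCPUs visible to the JVM (40 in the example above).
        int vcpus = Runtime.getRuntime().availableProcessors();

        // The External Shuffle Service starts roughly twice as many threads as vCPUs.
        int shuffleThreads = 2 * vcpus;

        // Default direct buffer memory is 128 MB, shared by all shuffle threads.
        double directMemoryMb = 128.0;
        double perThreadMb = directMemoryMb / shuffleThreads;

        System.out.printf(Locale.ROOT,
                "vCPUs=%d, shuffle threads=%d, ~%.2f MB of direct memory per thread%n",
                vcpus, shuffleThreads, perThreadMb);
    }
}

With 40 vCPUs this prints roughly 1.6 MB per thread, which matches the "less than 2 MB" figure above.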

You are therefore advised to increase the direct buffer memory based on the number of vCPUs on the NodeManager nodes in the cluster. For example, if a node has 40 vCPUs, set the direct buffer memory to 512 MB by adding the following option to the GC_OPTS parameter of the NodeManager:

-XX:MaxDirectMemorySize=512M

-XX:MaxDirectMemorySize is not set by default. Add it to the GC_OPTS parameter as needed.
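For example, if the existing GC_OPTS value is represented here by the placeholder <current GC_OPTS options> (the actual default depends on your cluster), keep the existing options unchanged and append the new option at the end:

<current GC_OPTS options> -XX:MaxDirectMemorySize=512M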

Perform the following operations to configure the parameter:

Log in to FusionInsight Manager and choose Cluster > Services > Yarn. Click Configurations and then All Configurations. Choose NodeManager > System, and modify the GC_OPTS parameter in the right pane.

Table 1 Parameters

Parameter: GC_OPTS
Description: GC parameter of YARN NodeManager
Default Value: 128M