Updated on 2024-10-24 GMT+08:00

What Should I Do If the HDFS Client Is Unresponsive When the NameNode Is Overloaded for a Long Time?

Question

When the NameNode is overloaded (the CPU usage of its node stays at 100%), the NameNode becomes unresponsive. HDFS clients that are already connected to the overloaded NameNode stop responding and cannot perform further operations. HDFS clients that connect afterwards, however, are switched to a backup NameNode and run properly.

Answer

This problem occurs when the default configuration is used (see Table 1). By default, the keepalive mechanism is enabled for the RPC connection between the HDFS client and the NameNode. With keepalive enabled, the HDFS client keeps waiting for the server's response and the connection never times out, so the client remains unresponsive for as long as the NameNode is overloaded.

In this case, you can do either of the following:

  • Leave the HDFS client waiting. Once the CPU usage of the node where the NameNode is located drops, the NameNode obtains CPU resources again and the HDFS client receives a response.
  • If you do not want to leave the HDFS client waiting, restart the application in which the HDFS client runs so that the client reconnects to another, idle NameNode.

Solution:

To avoid this problem, add the following parameters to the Client installation path/HDFS/hadoop/etc/hadoop/core-site.xml file.

Table 1 Parameters

| Parameter | Description | Default Value |
| --- | --- | --- |
| ipc.client.ping | If this parameter is set to true, the HDFS client waits for the response from the server and periodically sends ping messages to prevent disconnection caused by a TCP timeout. If this parameter is set to false, the HDFS client uses the value of ipc.ping.interval as the timeout; if no response is received within that time, the request times out. To keep the HDFS client from becoming unresponsive when the NameNode is overloaded for a long time, set this parameter to false. | true |
| ipc.ping.interval | If ipc.client.ping is true, this parameter specifies the interval between ping messages. If ipc.client.ping is false, this parameter specifies the connection timeout. Unit: ms. To keep the HDFS client from becoming unresponsive when the NameNode is overloaded for a long time, you are advised to set this parameter to a large value, for example 900000, so that requests do not time out while the server is merely busy. | 60000 |
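
As a rough illustration, the settings recommended in Table 1 could be written in core-site.xml as shown below. This is a minimal sketch, not an official template: it assumes the standard Hadoop property syntax, that the file already contains a configuration root element, and that the example value 900000 ms suits your workload.

```xml
<!-- Sketch only: place these entries inside the existing <configuration> element of core-site.xml. -->
<property>
  <!-- Disable client keepalive pings so that a hung RPC can time out. -->
  <name>ipc.client.ping</name>
  <value>false</value>
</property>
<property>
  <!-- With ipc.client.ping set to false, this value acts as the timeout in ms. -->
  <!-- 900000 ms (15 minutes) is the example value from Table 1; adjust as needed. -->
  <name>ipc.ping.interval</name>
  <value>900000</value>
</property>
```

Client configuration is typically read when the application starts, so an application that is already running would probably need to be restarted for these values to take effect.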