Updated on 2024-11-29 GMT+08:00

How Do I Handle an Unresponsive HDFS Client When the NameNode Is Overloaded for a Long Time?

Symptom

When the node running the NameNode is overloaded (its CPU usage stays at 100%), the NameNode stops responding. HDFS clients that are already connected to the overloaded NameNode become unresponsive. However, HDFS clients that connect afterwards are switched to a backup NameNode and run properly.

Answer

This problem occurs with the default configuration (described in Table 1), in which the keep-alive mechanism is enabled for the RPC connection between the HDFS client and the NameNode. With keep-alive enabled, the HDFS client keeps waiting for the server's response and the connection never times out, so the HDFS client remains unresponsive for as long as the NameNode is overloaded.

Perform one of the following operations on the unresponsive HDFS client:

  • Leave the HDFS client waiting. Once the CPU usage of the node where the NameNode is located drops, the NameNode obtains CPU resources again and the HDFS client receives a response.
  • If you do not want to leave the HDFS client waiting, restart the application in which the HDFS client runs so that the HDFS client reconnects to another, idle NameNode.

Solution:

To avoid this problem, add the parameters described in Table 1 to the following file on the client: Client installation path/HDFS/hadoop/etc/hadoop/core-site.xml

Table 1 Description

| Parameter | Description | Default Value |
|---|---|---|
| ipc.client.ping | If this parameter is set to true, the HDFS client waits for the server's response and periodically sends ping messages so that the connection is not dropped because of a TCP timeout. If it is set to false, the HDFS client uses the value of ipc.ping.interval as the timeout period; if no response is received within that period, the request times out. To prevent the HDFS client from becoming unresponsive when the NameNode is overloaded for a long time, set this parameter to false. | true |
| ipc.ping.interval | If ipc.client.ping is true, this parameter specifies the interval between ping messages. If ipc.client.ping is false, it specifies the connection timeout period. To prevent the HDFS client from becoming unresponsive when the NameNode is overloaded for a long time, you are advised to set this parameter to a large value, for example 900000 (ms), so that requests do not time out merely because the server is busy. | 60000 (ms) |
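
As an illustration, the two settings recommended in Table 1 could be written in core-site.xml roughly as follows. This is a minimal sketch, not a definitive configuration: it assumes the standard Hadoop property layout, and the value 900000 is only the example value given in Table 1, so adjust it to how long your NameNode may stay busy.

```xml
<!-- Sketch: place these entries inside the existing <configuration> element of core-site.xml. -->

<!-- Disable keep-alive pings so that a request to an overloaded NameNode can time out. -->
<property>
  <name>ipc.client.ping</name>
  <value>false</value>
</property>

<!-- With ipc.client.ping set to false, this value acts as the timeout period in milliseconds. -->
<!-- 900000 ms (15 minutes) is the example value from Table 1, not a tuned recommendation. -->
<property>
  <name>ipc.ping.interval</name>
  <value>900000</value>
</property>
```

An application that is already running will most likely need to be restarted before its HDFS client picks up the changed values.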