What Should I Do If the HDFS Client Is Unresponsive When the NameNode Is Overloaded for a Long Time?
Question
When the node hosting the NameNode is overloaded (CPU usage at 100%), the NameNode stops responding. HDFS clients already connected to the overloaded NameNode hang and cannot perform further operations. However, HDFS clients that connect afterwards are switched to a backup NameNode and run properly.
Answer
This problem occurs when the default configuration is used (see Table 1): the keepalive mechanism is enabled for the RPC connection between the HDFS client and the NameNode. With keepalive enabled, the HDFS client keeps waiting for the server's response and the connection never times out, so the client appears unresponsive.
In this case, you can:
- Leave the HDFS client waiting. Once the CPU usage of the node where the NameNode is located drops, the NameNode obtains CPU resources again and the HDFS client receives a response.
- If you do not want the HDFS client to keep waiting, restart the application that hosts the HDFS client so that it reconnects to another, idle NameNode.
Solution:
To avoid this problem, add the parameters described in Table 1 to Client installation path/HDFS/hadoop/etc/hadoop/core-site.xml.
Table 1 Parameter description

| Parameter | Description | Default Value |
|---|---|---|
| ipc.client.ping | If this parameter is set to true, the HDFS client waits for the server's response and periodically sends ping messages to prevent the connection from being closed by a TCP timeout. If it is set to false, the HDFS client uses the value of ipc.ping.interval as the timeout: if no response is received within that time, the call times out. To prevent the HDFS client from becoming unresponsive when the NameNode is overloaded for a long time, set this parameter to false. | true |
| ipc.ping.interval | If ipc.client.ping is true, this parameter specifies the interval between ping messages. If ipc.client.ping is false, it specifies the connection timeout. To prevent the HDFS client from becoming unresponsive when the NameNode is overloaded for a long time, set this parameter to a large value, for example 900000 (ms, that is, 15 minutes), so that calls do not time out while the server is merely busy. | 60000 (ms) |
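The two settings from Table 1 can be expressed as core-site.xml entries roughly as follows. This is a minimal sketch assuming the standard Hadoop `<property>`/`<name>`/`<value>` layout; the value 900000 is the example from the table, not a tuned recommendation for every cluster. Merge the two `<property>` elements into the existing `<configuration>` element instead of replacing the file.

```xml
<!-- Sketch: entries to merge into the existing <configuration> element of core-site.xml -->
<configuration>
  <!-- Disable keepalive pings so that ipc.ping.interval acts as the RPC timeout -->
  <property>
    <name>ipc.client.ping</name>
    <value>false</value>
  </property>
  <!-- Timeout in milliseconds (900000 ms = 15 minutes) when ipc.client.ping is false -->
  <property>
    <name>ipc.ping.interval</name>
    <value>900000</value>
  </property>
</configuration>
```

The client reads core-site.xml when it loads its configuration, so the change generally applies only to client applications started (or restarted) after the file is updated.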