Failed to Connect to ResourceManager When a Spark Task Is Submitted
Symptom
Spark tasks cannot be submitted because the client fails to connect to ResourceManager.
Cause Analysis
- The following error message is displayed on the driver, indicating that connections to port 26004 on the active and standby ResourceManager nodes are refused:
15/08/19 18:36:16 INFO RetryInvocationHandler: Exception while invoking getClusterMetrics of class ApplicationClientProtocolPBClientImpl over 33 after 1 fail over attempts. Trying to fail over after sleeping for 17448ms.
java.net.ConnectException: Call From ip0 to ip1:26004 failed on connection exception: java.net.ConnectException: Connection refused.
INFO RetryInvocationHandler: Exception while invoking getClusterMetrics of class ApplicationClientProtocolPBClientImpl over 32 after 2 fail over attempts. Trying to fail over after sleeping for 16233ms.
java.net.ConnectException: Call From ip0 to ip2:26004 failed on connection exception: java.net.ConnectException: Connection refused;
- On MRS Manager, check whether ResourceManager is running properly, as shown in Figure 1. If Yarn is faulty or an unknown exception occurs on a Yarn service instance, the ResourceManager of the cluster may be abnormal.
- Check whether the client in the cluster is of the latest version.
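- Check whether a ResourceManager instance has been migrated in the cluster, that is, uninstalled and then added back on a different node. A client downloaded before such a migration may still point to the old ResourceManager address.
- On MRS Manager, click Audit to view audit logs and check whether operations such as uninstalling or adding a ResourceManager instance are recorded.
- Run the ping command on the node where the task is submitted to check whether the ResourceManager IP addresses can be pinged.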
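A successful ping only shows that the host is reachable, whereas "Connection refused" in the log above means that nothing accepted the TCP connection on port 26004. As a complement to the ping check, the following is a minimal Python sketch that tries to open a TCP connection to each ResourceManager node. The function names are hypothetical, and the host values ip1 and ip2 are the placeholders from the log, to be replaced with real addresses.

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and name resolution failures.
        return False


def check_resourcemanagers(hosts, port=26004, timeout=3.0):
    """Map each host to True/False depending on whether the port accepts connections."""
    return {host: can_connect(host, port, timeout) for host in hosts}


def report(hosts, port=26004):
    """Print one line per ResourceManager host (hypothetical helper)."""
    for host, ok in check_resourcemanagers(hosts, port).items():
        state = "reachable" if ok else "NOT reachable"
        print(f"{host}:{port} is {state}")
```

For example, calling report(["ip1", "ip2"]) on the node where the Spark task is submitted prints one line per ResourceManager node. If ping succeeds but this check fails, the ResourceManager process or its port is the more likely problem than the network route.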
Solution
- If ResourceManager is abnormal, see the Yarn-related sections to rectify the fault.
- If the client is not the latest version, download the client again.
- If the IP address cannot be pinged, contact the network administrator to check the network.
- If HA is enabled for the cluster, set the Yarn parameter yarn.client.failover-sleep-base-ms to a smaller value so that the client sleeps for a shorter time between failover attempts.
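Before downloading the client again or changing the failover parameter, it can help to see what the client configuration currently contains. The following is a minimal Python sketch that extracts the ResourceManager addresses and the yarn.client.failover-sleep-base-ms value from the text of a yarn-site.xml file. The function name, the ResourceManager IDs rm1 and rm2, and the value 5000 in the sample are assumptions for illustration only. The location of yarn-site.xml depends on where the client is installed.

```python
import xml.etree.ElementTree as ET

# Property name prefixes of interest in the client-side yarn-site.xml.
WANTED_PREFIXES = (
    "yarn.resourcemanager.address",
    "yarn.client.failover-sleep-base-ms",
)


def read_yarn_properties(xml_text, prefixes=WANTED_PREFIXES):
    """Return {name: value} for properties whose name starts with a prefix."""
    root = ET.fromstring(xml_text)
    found = {}
    for prop in root.iter("property"):
        name = (prop.findtext("name") or "").strip()
        value = (prop.findtext("value") or "").strip()
        if name.startswith(prefixes):
            found[name] = value
    return found


# Illustrative sample only: rm1/rm2 and 5000 are assumed values.
SAMPLE_XML = """
<configuration>
  <property>
    <name>yarn.resourcemanager.address.rm1</name>
    <value>ip1:26004</value>
  </property>
  <property>
    <name>yarn.resourcemanager.address.rm2</name>
    <value>ip2:26004</value>
  </property>
  <property>
    <name>yarn.client.failover-sleep-base-ms</name>
    <value>5000</value>
  </property>
</configuration>
"""

for key, val in read_yarn_properties(SAMPLE_XML).items():
    print(f"{key} = {val}")
```

If the addresses reported for the client differ from the nodes where ResourceManager currently runs, the client configuration is out of date. If yarn.client.failover-sleep-base-ms is missing from the output, the client is not setting it explicitly in this file.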