Updated on 2024-12-18 GMT+08:00

Error Message "Could Not Connect to the Leading JobManager" Is Displayed When a Command Is Executed on the Flink Client

Symptom

When a Flink cluster is being created, the yarn-session.sh command hangs for a while and then displays the following error message:

2018-09-20 22:51:16,842 | WARN  | [main] | Unable to get ClusterClient status from Application Client | org.apache.flink.yarn.YarnClusterClient (YarnClusterClient.java:253) 
org.apache.flink.util.FlinkException: Could not connect to the leading JobManager. Please check that the JobManager is running.
	at org.apache.flink.client.program.ClusterClient.getJobManagerGateway(ClusterClient.java:861)
	at org.apache.flink.yarn.YarnClusterClient.getClusterStatus(YarnClusterClient.java:248)
	at org.apache.flink.yarn.YarnClusterClient.waitForClusterToBeReady(YarnClusterClient.java:516)
	at org.apache.flink.yarn.cli.FlinkYarnSessionCli.run(FlinkYarnSessionCli.java:717)
	at org.apache.flink.yarn.cli.FlinkYarnSessionCli$1.call(FlinkYarnSessionCli.java:514)
	at org.apache.flink.yarn.cli.FlinkYarnSessionCli$1.call(FlinkYarnSessionCli.java:511)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:422)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1729)
	at org.apache.flink.runtime.security.HadoopSecurityContext.runSecured(HadoopSecurityContext.java:41)
	at org.apache.flink.yarn.cli.FlinkYarnSessionCli.main(FlinkYarnSessionCli.java:511)
Caused by: org.apache.flink.runtime.leaderretrieval.LeaderRetrievalException: Could not retrieve the leader gateway.
	at org.apache.flink.runtime.util.LeaderRetrievalUtils.retrieveLeaderGateway(LeaderRetrievalUtils.java:79)
	at org.apache.flink.client.program.ClusterClient.getJobManagerGateway(ClusterClient.java:856)
	... 10 common frames omitted
Caused by: java.util.concurrent.TimeoutException: Futures timed out after [10000 milliseconds]

Possible Causes

SSL communication encryption is enabled for Flink, but the SSL certificate is not configured correctly.

Solution

For MRS 2.x or earlier, perform the following operations:

Method 1:

Disable Flink SSL communication encryption by setting the following parameter in the client configuration file conf/flink-conf.yaml:
security.ssl.internal.enabled: false
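If you prefer to script this change instead of editing the file by hand, the following is a minimal sketch, assuming flink-conf.yaml holds flat, unquoted "key: value" lines. The function name set_flink_conf_key is hypothetical and is not part of the MRS client. It replaces the first existing line for the key (including a commented-out one), drops any later duplicates, and appends the entry if no line matches.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the MRS client): set a flat "key: value"
# entry in flink-conf.yaml. The first line for KEY, including a commented-out
# one such as "# key: value", is replaced; later duplicates are dropped;
# if no line matches, the entry is appended at the end of the file.
set_flink_conf_key() {
  local file="$1" key="$2" value="$3" tmp
  tmp="$(mktemp)" || return 1
  awk -v k="$key" -v v="$value" '
    {
      line = $0
      # Ignore leading whitespace and a single leading "#" when matching.
      sub(/^[ \t]*#?[ \t]*/, "", line)
      # Literal (non-regex) match of "key:" or "key :" at the line start.
      if (index(line, k ":") == 1 || index(line, k " :") == 1) {
        if (!done) { print k ": " v; done = 1 }
        next
      }
      print
    }
    END { if (!done) print k ": " v }
  ' "$file" > "$tmp" || { rm -f "$tmp"; return 1; }
  # Write back through the original file to keep its owner and permissions.
  cat "$tmp" > "$file"
  rm -f "$tmp"
}
```

For example, run set_flink_conf_key conf/flink-conf.yaml security.ssl.internal.enabled false from the Flink client directory. Writing the result back through the original file, rather than moving a temporary file over it, leaves the file's owner and permissions unchanged.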

Method 2:

Keep Flink SSL communication encryption enabled, retaining the default value of security.ssl.internal.enabled.

Configure SSL as follows:
  • If the keystore or truststore file path is a relative path, make sure that the path can be resolved directly from the Flink client directory in which the command is executed.
    security.ssl.internal.keystore: ssl/flink.keystore
    security.ssl.internal.truststore: ssl/flink.truststore

    Add the -t option to the Flink yarn-session.sh command to ship the keystore and truststore files to each execution node.

    yarn-session.sh -t ssl/ 2

  • If the keystore or truststore file path is an absolute path, the keystore and truststore files must exist at that absolute path on the Flink client and on all nodes.
    security.ssl.internal.keystore: /opt/client/Flink/flink/conf/flink.keystore
    security.ssl.internal.truststore: /opt/client/Flink/flink/conf/flink.truststore
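Before running yarn-session.sh, you can check that the configured keystore and truststore paths actually resolve. The following is a minimal sketch, assuming flat, unquoted "key: value" lines in flink-conf.yaml; the function names get_flink_conf_key and check_flink_ssl_files are hypothetical. Relative paths are resolved against the current directory, so run the check from the Flink client directory; for absolute paths, run it on the client and on every node.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the value of a flat "key: value" entry in
# flink-conf.yaml. Commented lines are ignored; if the key appears more than
# once, the last occurrence is printed. Prints nothing if the key is absent.
get_flink_conf_key() {
  awk -v k="$2" '
    {
      line = $0
      sub(/^[ \t]+/, "", line)
      if (index(line, k ":") == 1) {
        val = substr(line, length(k) + 2)
        sub(/^[ \t]+/, "", val)
        sub(/[ \t\r]+$/, "", val)   # also strips a trailing CR (CRLF files)
        found = val
      }
    }
    END { if (found != "") print found }
  ' "$1"
}

# Hypothetical helper: report whether the files named by the given keys are
# readable from the current directory. Returns 1 if any check fails.
check_flink_ssl_files() {
  local conf="$1" key path rc=0
  shift
  for key in "$@"; do
    path="$(get_flink_conf_key "$conf" "$key")"
    if [ -z "$path" ]; then
      echo "MISSING KEY: $key"
      rc=1
    elif [ ! -r "$path" ]; then
      echo "NOT FOUND: $key -> $path"
      rc=1
    else
      echo "OK: $key -> $path"
    fi
  done
  return "$rc"
}
```

For example, from the Flink client directory: check_flink_ssl_files conf/flink-conf.yaml security.ssl.internal.keystore security.ssl.internal.truststore. Each key is reported as OK, NOT FOUND, or MISSING KEY.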

For MRS 3.x or later, perform the following operations:

Method 1:

Disable Flink SSL communication encryption by setting the following parameter in the client configuration file conf/flink-conf.yaml:
security.ssl.enabled: false
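The MRS 2.x and MRS 3.x procedures in this section use the same three parameters and differ only in the prefix: security.ssl.internal.* for MRS 2.x or earlier and security.ssl.* for MRS 3.x or later. The following minimal sketch encodes that mapping so that a script can pick the right parameter names; the function name flink_ssl_keys_for_mrs is hypothetical, and it assumes the version string starts with the major version number (for example, 3.1.0).

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the SSL parameter names used in flink-conf.yaml
# for a given MRS version, following this section:
#   MRS 2.x or earlier -> security.ssl.internal.*
#   MRS 3.x or later   -> security.ssl.*
flink_ssl_keys_for_mrs() {
  local major="${1%%.*}"   # "3.1.0" -> "3"
  local prefix
  case "$major" in
    ''|*[!0-9]*) echo "unsupported version: $1" >&2; return 1 ;;
  esac
  if [ "$major" -ge 3 ]; then
    prefix="security.ssl"
  else
    prefix="security.ssl.internal"
  fi
  # Order: enable switch, keystore path, truststore path.
  printf '%s\n' "$prefix.enabled" "$prefix.keystore" "$prefix.truststore"
}
```

The first line printed is the parameter to set to false for Method 1; the other two are the keystore and truststore parameters used in Method 2.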

Method 2:

Keep Flink SSL communication encryption enabled, retaining the default value of security.ssl.enabled.

Configure SSL as follows:
  • If the keystore or truststore file path is a relative path, make sure that the path can be resolved directly from the Flink client directory in which the command is executed.
    security.ssl.keystore: ssl/flink.keystore
    security.ssl.truststore: ssl/flink.truststore

    Add the -t option to the Flink yarn-session.sh command to ship the keystore and truststore files to each execution node.

    yarn-session.sh -t ssl/ 2

  • If the keystore or truststore file path is an absolute path, the keystore and truststore files must exist at that absolute path on the Flink client and on all nodes.
    security.ssl.keystore: /opt/client/Flink/flink/conf/flink.keystore
    security.ssl.truststore: /opt/client/Flink/flink/conf/flink.truststore
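For the relative-path case, the ssl/ directory has to exist under the Flink client directory before yarn-session.sh -t ssl/ can ship it. The following is a minimal sketch that copies existing keystore and truststore files there under the names used in this section; the function name stage_flink_ssl_dir and the source file locations are hypothetical, and the sketch does not create the certificates themselves.

```shell
#!/usr/bin/env bash
# Hypothetical helper: copy an existing keystore and truststore into ssl/
# under the Flink client directory, so that the relative paths
# ssl/flink.keystore and ssl/flink.truststore resolve from that directory.
stage_flink_ssl_dir() {
  local client_dir="$1" keystore="$2" truststore="$3"
  mkdir -p "$client_dir/ssl" || return 1
  cp "$keystore" "$client_dir/ssl/flink.keystore" || return 1
  cp "$truststore" "$client_dir/ssl/flink.truststore" || return 1
  # The keystore holds the private key, so keep it readable by the owner only.
  chmod 600 "$client_dir/ssl/flink.keystore"
}
```

For example, stage_flink_ssl_dir /opt/client/Flink/flink /path/to/flink.keystore /path/to/flink.truststore, where /opt/client/Flink/flink is assumed to be the client directory (matching the absolute-path example) and both source paths are placeholders. Then run yarn-session.sh -t ssl/ 2 from that directory.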