Updated on 2024-11-29 GMT+08:00

The yarn-session.sh Command Fails to Be Executed When the Flink Cluster Is Created

Symptom

When a Flink cluster is being created, the yarn-session.sh command hangs and then the following error message is displayed:

2018-09-20 22:51:16,842 | WARN  | [main] | Unable to get ClusterClient status from Application Client | org.apache.flink.yarn.YarnClusterClient (YarnClusterClient.java:253) 
org.apache.flink.util.FlinkException: Could not connect to the leading JobManager. Please check that the JobManager is running.
	at org.apache.flink.client.program.ClusterClient.getJobManagerGateway(ClusterClient.java:861)
	at org.apache.flink.yarn.YarnClusterClient.getClusterStatus(YarnClusterClient.java:248)
	at org.apache.flink.yarn.YarnClusterClient.waitForClusterToBeReady(YarnClusterClient.java:516)
	at org.apache.flink.yarn.cli.FlinkYarnSessionCli.run(FlinkYarnSessionCli.java:717)
	at org.apache.flink.yarn.cli.FlinkYarnSessionCli$1.call(FlinkYarnSessionCli.java:514)
	at org.apache.flink.yarn.cli.FlinkYarnSessionCli$1.call(FlinkYarnSessionCli.java:511)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:422)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1729)
	at org.apache.flink.runtime.security.HadoopSecurityContext.runSecured(HadoopSecurityContext.java:41)
	at org.apache.flink.yarn.cli.FlinkYarnSessionCli.main(FlinkYarnSessionCli.java:511)
Caused by: org.apache.flink.runtime.leaderretrieval.LeaderRetrievalException: Could not retrieve the leader gateway.
	at org.apache.flink.runtime.util.LeaderRetrievalUtils.retrieveLeaderGateway(LeaderRetrievalUtils.java:79)
	at org.apache.flink.client.program.ClusterClient.getJobManagerGateway(ClusterClient.java:856)
	... 10 common frames omitted
Caused by: java.util.concurrent.TimeoutException: Futures timed out after [10000 milliseconds]

Possible Causes

SSL communication encryption is enabled for Flink, but the SSL certificate is not configured correctly.
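
To check whether this cause applies, it can help to print the SSL-related settings that the client reads. The following is a minimal sketch, not an official tool. It assumes the client configuration file is conf/flink-conf.yaml relative to the current directory unless another path is passed, and the function name show_ssl_settings is hypothetical. Password entries are filtered out so that they are not echoed to the terminal.

```shell
# Print the uncommented security.ssl.* settings from a Flink client
# configuration file, excluding any password entries.
# Usage: show_ssl_settings [path/to/flink-conf.yaml]
show_ssl_settings() {
    conf_file="${1:-conf/flink-conf.yaml}"
    if [ ! -f "$conf_file" ]; then
        echo "Configuration file not found: $conf_file" >&2
        return 1
    fi
    # Commented lines start with '#', so they do not match this pattern.
    grep -E '^[[:space:]]*security\.ssl\.' "$conf_file" | grep -v -i 'password'
}
```

If the output shows security.ssl.enabled set to true but no usable keystore or truststore entries, the cause described above is a likely candidate.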

Solution

Method 1:

Disable Flink SSL communication encryption by setting the following parameter in the client configuration file conf/flink-conf.yaml:
security.ssl.enabled: false

Method 2:

Keep Flink SSL communication encryption enabled by retaining the default value of security.ssl.enabled, and configure SSL as follows:
  • If the keystore or truststore file path is a relative path, the Flink client directory where the command is executed must be able to access this relative path directly.
    security.ssl.keystore: ssl/flink.keystore
    security.ssl.truststore: ssl/flink.truststore

    Add the -t option to the Flink CLI command yarn-session.sh to transfer the keystore and truststore files to each execution node. Example:

    yarn-session.sh -t ssl/ 2
  • If the keystore or truststore file path is an absolute path, the keystore and truststore files must exist at that absolute path on the Flink client and on all nodes.
    security.ssl.keystore: /opt/Bigdata/client/Flink/flink/conf/flink.keystore
    security.ssl.truststore: /opt/Bigdata/client/Flink/flink/conf/flink.truststore
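
Before rerunning yarn-session.sh, the configured paths can be verified on the local node. The following is a minimal sketch under stated assumptions: it reads security.ssl.keystore and security.ssl.truststore from a flat "key: value" configuration file (conf/flink-conf.yaml by default) and tests whether each file exists. The function names get_conf_value and check_ssl_files are hypothetical. Relative paths are resolved against the current directory, so run it from the Flink client directory where yarn-session.sh is executed. It checks only the node it runs on and does not validate the certificate contents.

```shell
# Return the value of one key from a flat "key: value" file (first match wins).
# $1 = configuration file, $2 = key
get_conf_value() {
    awk -v k="$2" '
        index($0, k ":") == 1 {
            sub(/^[^:]*:[ \t]*/, "")   # drop "key:" and the spaces after it
            sub(/[ \t\r]+$/, "")       # drop trailing whitespace
            print
            exit
        }' "$1"
}

# Report whether the configured keystore and truststore files exist locally.
# Returns 0 only if both keys are set and both files are present.
check_ssl_files() {
    conf_file="${1:-conf/flink-conf.yaml}"
    if [ ! -f "$conf_file" ]; then
        echo "Configuration file not found: $conf_file" >&2
        return 1
    fi
    rc=0
    for key in security.ssl.keystore security.ssl.truststore; do
        file_path=$(get_conf_value "$conf_file" "$key")
        if [ -z "$file_path" ]; then
            echo "$key is not set in $conf_file"
            rc=1
        elif [ -f "$file_path" ]; then
            echo "$key: found $file_path"
        else
            echo "$key: missing $file_path"
            rc=1
        fi
    done
    return "$rc"
}
```

For example, running check_ssl_files conf/flink-conf.yaml from the Flink client directory prints one line per key and returns a non-zero status if either file is missing. For the absolute-path case, the same check would need to be repeated on every node.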