更新时间:2023-12-22 GMT+08:00
Flink客户端执行命令报错“Could not connect to the leading JobManager”
问题背景与现象
创建Fllink集群,执行yarn-session.sh命令卡住一段时间后报错:
2018-09-20 22:51:16,842 | WARN | [main] | Unable to get ClusterClient status from Application Client | org.apache.flink.yarn.YarnClusterClient (YarnClusterClient.java:253) org.apache.flink.util.FlinkException: Could not connect to the leading JobManager. Please check that the JobManager is running. at org.apache.flink.client.program.ClusterClient.getJobManagerGateway(ClusterClient.java:861) at org.apache.flink.yarn.YarnClusterClient.getClusterStatus(YarnClusterClient.java:248) at org.apache.flink.yarn.YarnClusterClient.waitForClusterToBeReady(YarnClusterClient.java:516) at org.apache.flink.yarn.cli.FlinkYarnSessionCli.run(FlinkYarnSessionCli.java:717) at org.apache.flink.yarn.cli.FlinkYarnSessionCli$1.call(FlinkYarnSessionCli.java:514) at org.apache.flink.yarn.cli.FlinkYarnSessionCli$1.call(FlinkYarnSessionCli.java:511) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:422) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1729) at org.apache.flink.runtime.security.HadoopSecurityContext.runSecured(HadoopSecurityContext.java:41) at org.apache.flink.yarn.cli.FlinkYarnSessionCli.main(FlinkYarnSessionCli.java:511) Caused by: org.apache.flink.runtime.leaderretrieval.LeaderRetrievalException: Could not retrieve the leader gateway. at org.apache.flink.runtime.util.LeaderRetrievalUtils.retrieveLeaderGateway(LeaderRetrievalUtils.java:79) at org.apache.flink.client.program.ClusterClient.getJobManagerGateway(ClusterClient.java:856) ... 10 common frames omitted Caused by: java.util.concurrent.TimeoutException: Futures timed out after [10000 milliseconds]
可能原因
Flink开启了SSL通信加密,却没有正确的配置SSL证书。
解决办法
针对MRS 2.x及之前版本,操作如下:
方法1:
关闭Flink SSL通信加密,修改客户端配置文件“conf/flink-conf.yaml”。
security.ssl.internal.enabled: false
方法2:
开启Flink SSL通信加密,security.ssl.internal.enabled保持默认。
正确配置SSL:
- 配置keystore或truststore文件路径为相对路径时,Flink Client执行命令的目录需要可以直接访问该相对路径。
security.ssl.internal.keystore: ssl/flink.keystore security.ssl.internal.truststore: ssl/flink.truststore
在Flink的CLI yarn-session.sh命令中增加“-t”选项来传输keystore和truststore文件到各个执行节点。
yarn-session.sh -t ssl/ 2
- 配置keystore或truststore文件路径为绝对路径时,需要在Flink Client以及各个节点的该绝对路径上放置keystore或truststore文件。
security.ssl.internal.keystore: /opt/client/Flink/flink/conf/flink.keystore security.ssl.internal.truststore: /opt/client/Flink/flink/conf/flink.truststore
针对MRS3.x及之后版本,操作如下:
方法1:
关闭Flink SSL通信加密,修改客户端配置文件“conf/flink-conf.yaml”。
security.ssl.enabled: false
方法2:
开启Flink SSL通信加密,security.ssl.enabled 保持默认。
正确配置SSL:
- 配置keystore或truststore文件路径为相对路径时,Flink Client执行命令的目录需要可以直接访问该相对路径。
security.ssl.keystore: ssl/flink.keystore security.ssl.truststore: ssl/flink.truststore
在Flink的CLI yarn-session.sh命令中增加“-t”选项来传输keystore和truststore文件到各个执行节点。
yarn-session.sh -t ssl/ 2
- 配置keystore或truststore文件路径为绝对路径时,需要在Flink Client以及各个节点的该绝对路径上放置keystore或truststore文件。
security.ssl.keystore: /opt/client/Flink/flink/conf/flink.keystore security.ssl.truststore: /opt/client/Flink/flink/conf/flink.truststore
父主题: 使用Flink