Updated on 2024-01-17 GMT+08:00

Interconnecting Flink with OBS

Before performing the following operations, ensure that you have configured a storage-compute decoupled cluster by referring to Configuring a Storage-Compute Decoupled Cluster (Agency).

  1. Log in to the Flink client installation node as the client installation user.
  2. Run the following command to initialize environment variables:

    source Client installation directory/bigdata_env

  3. Configure the Flink client. For details, see Using Flink from Scratch.
  4. Start a session.

    • Normal cluster (Kerberos authentication disabled)

      yarn-session.sh -nm "session-name" -d

    • Security cluster (Kerberos authentication enabled)
      • If the flink.keystore and flink.truststore file paths are relative paths:

        Run the following commands from the directory that contains the ssl directory to start the session. The path ssl/ is relative to that directory.

        cd /opt/hadoopclient/Flink/flink/conf/

        yarn-session.sh -t ssl/ -nm "session-name" -d

        ...
        Cluster started: Yarn cluster with application id application_1624937999496_0017
        JobManager Web Interface: http://192.168.1.150:32261
      • If the flink.keystore and flink.truststore file paths are absolute paths:

        Run the following command to start a session:

        cd /opt/hadoopclient/Flink/flink/conf/

        yarn-session.sh -nm "session-name" -d

  5. For a security cluster, run the following command to perform user authentication. If Kerberos authentication is not enabled for the current cluster, you do not need to run this command.

    kinit Username

  6. Specify the OBS file system explicitly in the Flink command line by using obs:// paths. For example, create a test file, upload it to OBS, and run the WordCount example on it:

    echo -e 'test' >/tmp/test

    hdfs dfs -mkdir -p obs://Parallel file system name/tmp/flinkjob

    hdfs dfs -put /tmp/test obs://Parallel file system name/tmp/flinkjob/

    flink run Client installation directory/Flink/flink/examples/batch/WordCount.jar -input obs://Parallel file system name/tmp/flinkjob/test -output obs://Parallel file system name/tmp/flinkjob/output
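The commands in step 6 repeat the same two placeholders several times. The following is a minimal sketch of how they might be wrapped in a script, assuming the environment variables from step 2 are already loaded and a session is running. The variable names PFS_NAME and CLIENT_DIR and the helper functions obs_uri and run_wordcount are hypothetical and are used only for illustration. The final hdfs dfs -ls command is an added check that lists whatever the job wrote.

```shell
#!/usr/bin/env bash
# Sketch only. PFS_NAME and CLIENT_DIR are hypothetical variables that stand in
# for "Parallel file system name" and "Client installation directory".

obs_uri() {
  # Build an obs:// URI from a parallel file system name and a path inside it.
  # A single leading slash on the path is dropped to avoid "obs://name//path".
  local pfs="$1" path="${2#/}"
  printf 'obs://%s/%s\n' "$pfs" "$path"
}

# The body runs in a subshell so that "set -e" does not leak into the caller.
run_wordcount() (
  set -euo pipefail
  pfs="$1"
  client_dir="$2"
  job_dir="$(obs_uri "$pfs" tmp/flinkjob)"

  # Create a small input file and upload it (-f overwrites an existing copy).
  echo 'test' > /tmp/test
  hdfs dfs -mkdir -p "$job_dir"
  hdfs dfs -put -f /tmp/test "$job_dir/"

  # Run the bundled WordCount example against the OBS paths.
  flink run "$client_dir/Flink/flink/examples/batch/WordCount.jar" \
    -input "$job_dir/test" -output "$job_dir/output"

  # List what the job wrote to the output path.
  hdfs dfs -ls "$job_dir/output"
)

# Run only when both values are supplied, for example:
#   PFS_NAME=... CLIENT_DIR=... bash this_script.sh
if [[ -n "${PFS_NAME:-}" && -n "${CLIENT_DIR:-}" ]]; then
  run_wordcount "$PFS_NAME" "$CLIENT_DIR"
fi
```

If the output path already exists from an earlier run, the job may fail; in that case, remove the path or choose a different output location before rerunning.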

Flink jobs run on Yarn. Before configuring Flink to interconnect with OBS, ensure that Yarn can access the OBS file system properly.
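One quick way to catch access problems before submitting a Flink job is to check that the Hadoop client on the node can reach the parallel file system. The following is a minimal sketch under that assumption. The helper check_obs_access and the variable PFS_NAME are hypothetical, and the check covers only client-side access, not the complete Yarn configuration.

```shell
#!/usr/bin/env bash
# Sketch only: check_obs_access is a hypothetical helper, not an MRS tool.
# It confirms only that the Hadoop client on this node can reach the
# parallel file system; it does not validate the full Yarn setup.

check_obs_access() {
  local pfs="$1"
  # "hdfs dfs -test -d" exits with 0 when the path exists and is a directory.
  if hdfs dfs -test -d "obs://${pfs}/"; then
    echo "OK: obs://${pfs}/ is reachable"
  else
    echo "FAILED: cannot access obs://${pfs}/" >&2
    return 1
  fi
}

# PFS_NAME is a hypothetical variable holding the parallel file system name.
if [[ -n "${PFS_NAME:-}" ]]; then
  check_obs_access "$PFS_NAME"
fi
```

On a security cluster, run the check after kinit so that the client has a valid Kerberos ticket.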