Updated on 2024-09-23 GMT+08:00

Interconnecting Flink with OBS Using an IAM Agency

After you configure storage-compute decoupling for a cluster as described in Interconnecting an MRS Cluster with OBS Using an IAM Agency, you can use the Flink client to access the OBS parallel file system and run jobs.

Interconnecting Flink with OBS

  1. Log in to the Flink client installation node as the client installation user.
  2. Initialize environment variables.

    source Client installation directory/bigdata_env

  3. Configure the Flink client. For details, see Using Flink from Scratch.
  4. Start a session.

    • Normal cluster (Kerberos authentication disabled)

      yarn-session.sh -nm "session-name" -d

    • Security cluster (Kerberos authentication enabled)
      • If the flink.keystore and flink.truststore files are specified with relative paths:

        Run the following commands from the directory in which the ssl directory is located (the conf directory below) to start the session. Here, ssl/ is a relative path.

        cd Client installation directory/Flink/flink/conf

        yarn-session.sh -t ssl/ -nm "session-name" -d

        Information similar to the following is displayed:

        ...
        Cluster started: Yarn cluster with application id application_1624937999496_0017
        JobManager Web Interface: http://192.168.1.150:32261
      • If the flink.keystore and flink.truststore files are specified with absolute paths:

        Run the following commands to start the session:

        cd Client installation directory/Flink/flink/conf

        yarn-session.sh -nm "session-name" -d

  5. On a security cluster (Kerberos authentication enabled) only, run the following command to authenticate the user:

    kinit Username

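  6. Specify the OBS file system to be accessed explicitly in the Flink command line, using obs:// paths. For example, create a test file, upload it to OBS, and run the WordCount example against it: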

    echo -e 'test' >/tmp/test

    hdfs dfs -mkdir -p obs://Parallel file system name/tmp/flinkjob

    hdfs dfs -put /tmp/test obs://Parallel file system name/tmp/flinkjob/

    flink run Client installation directory/Flink/flink/examples/batch/WordCount.jar -input obs://Parallel file system name/tmp/flinkjob/test -output obs://Parallel file system name/tmp/flinkjob/output
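The commands in step 6 can be wrapped in a small shell function so that the parallel file system name and the Flink client directory are given only once. This is a sketch, not part of the MRS client: the names run_obs_wordcount and run, the DRY_RUN switch (which prints each command instead of executing it), and the final hdfs dfs -ls check are illustrative additions. It assumes bigdata_env has already been sourced and, on a security cluster, kinit has been run.

```shell
#!/usr/bin/env bash
# Sketch only: run_obs_wordcount, run, and DRY_RUN are hypothetical names,
# not commands shipped with the MRS client.

# Run a command, or only print it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# $1: parallel file system name
# $2: Flink client directory (Client installation directory/Flink/flink)
# The ( ... ) body runs in a subshell, so set -euo pipefail stays local to it.
run_obs_wordcount() (
  set -euo pipefail
  fs="$1"
  flink_home="$2"
  job_dir="obs://${fs}/tmp/flinkjob"

  # Create a one-line local input file (written even in dry-run mode).
  echo -e 'test' > /tmp/test

  # Create the job directory in OBS and upload the input file.
  run hdfs dfs -mkdir -p "${job_dir}"
  run hdfs dfs -put /tmp/test "${job_dir}/"

  # Submit the WordCount example, reading from and writing to OBS.
  run flink run "${flink_home}/examples/batch/WordCount.jar" \
    -input "${job_dir}/test" -output "${job_dir}/output"

  # List the job directory to check that the output was written.
  run hdfs dfs -ls "${job_dir}"
)
```

You could call it once with DRY_RUN=1 exported to review the four generated commands, then again without DRY_RUN to execute them, passing your own file system name and client directory as the two arguments.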

Flink jobs run on YARN, so before interconnecting Flink with OBS, ensure that YARN has been interconnected with OBS.