Updated on 2025-08-11 GMT+08:00

Interconnecting Flume with OBS Using an IAM Agency

After configuring decoupled storage and compute for a cluster by referring to Interconnecting an MRS Cluster with OBS Using an IAM Agency, you can run OBS jobs using Flume. This section describes how to configure a Flume job that reads .txt files from a directory on the client node and exports them to an OBS directory.

Notes and Constraints

This section applies to MRS 3.x or later.

Interconnecting Flume with OBS

  1. Create an OBS folder for storing data.

    1. Log in to the OBS console.
    2. In the navigation pane on the left, choose Resources > Parallel File Systems.
    3. On the displayed page, click the name of the parallel file system you created to access its details page.
    4. In the navigation pane on the left, choose Files. On the displayed page, click Create Folder to create the testFlumeOutput folder.
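    Alternatively, if the MRS cluster client has already been interconnected with OBS, the folder can be created from the command line. This is a sketch only; the parallel file system name esdk-c-test-pfs1 matches the example used in the sink configuration later in this section:
      hadoop fs -mkdir -p obs://esdk-c-test-pfs1/testFlumeOutput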

  2. Log in to the node where the Flume client is installed as user root.
  3. Create the /opt/flumeInput directory and create a .txt file with custom content in it.
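    For example, a minimal sketch (the file name and content below are placeholders only):
      mkdir -p /opt/flumeInput
      echo "flume obs test line 1" > /opt/flumeInput/test.txt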
  4. Add the following content to the Flume client installation directory/fusionInsight-flume-*/conf/properties.properties file:

    # source
    server.sources = r1
    # channels
    server.channels = c1
    # sink
    server.sinks = obs_sink
    # ----- define spooldir source -----
    server.sources.r1.type = spooldir
    server.sources.r1.spoolDir = /opt/flumeInput
    # ---- define OBS sink ----
    server.sinks.obs_sink.type = hdfs
    server.sinks.obs_sink.hdfs.path = obs://esdk-c-test-pfs1/testFlumeOutput
    server.sinks.obs_sink.hdfs.filePrefix = %[localhost]
    server.sinks.obs_sink.hdfs.useLocalTimeStamp = true
    # roll files by time interval only (size- and count-based rolling are disabled)
    server.sinks.obs_sink.hdfs.rollSize = 0
    server.sinks.obs_sink.hdfs.rollCount = 0
    server.sinks.obs_sink.hdfs.rollInterval = 5
    #server.sinks.obs_sink.hdfs.threadsPoolSize = 30
    server.sinks.obs_sink.hdfs.fileType = DataStream
    server.sinks.obs_sink.hdfs.writeFormat = Text
    server.sinks.obs_sink.hdfs.fileCloseByEndEvent = false
    
    # define channel
    server.channels.c1.type = memory
    server.channels.c1.capacity = 1000
    # transaction size
    server.channels.c1.transactionCapacity = 1000
    server.channels.c1.byteCapacity = 800000
    server.channels.c1.byteCapacityBufferPercentage = 20
    server.channels.c1.keep-alive = 60
    server.sources.r1.channels = c1
    server.sinks.obs_sink.channel = c1
    • Set server.sources.r1.spoolDir to the directory containing the .txt file created in Step 3 (/opt/flumeInput in this example).
    • Set server.sinks.obs_sink.hdfs.path to the folder created in the OBS parallel file system in Step 1.

  5. Copy the hadoop-huaweicloud-*.jar and mrs-obs-provider-*.jar files from Client installation directory/Hive/Beeline/lib to Flume client installation directory/fusionInsight-flume-*/lib (a sample copy command is shown after these sub-steps). Then run the following commands to modify the file permissions:

    1. Go to the lib directory.
      cd Flume client installation directory/fusionInsight-flume-*/lib
    2. Modify the permission on hadoop-huaweicloud-*.jar.
      chmod 755 hadoop-huaweicloud-*.jar
    3. Modify the permission on mrs-obs-provider-*.jar.
      chmod 755 mrs-obs-provider-*.jar
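    The copy described at the beginning of this step can be performed, for example, as follows (replace the placeholder directories with the actual installation paths):
      cp Client installation directory/Hive/Beeline/lib/hadoop-huaweicloud-*.jar \
         Client installation directory/Hive/Beeline/lib/mrs-obs-provider-*.jar \
         Flume client installation directory/fusionInsight-flume-*/lib/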

  6. Run the following commands to restart the Flume client:

    Go to the bin directory.

    cd Flume client installation directory/fusionInsight-flume-*/bin

    Restart the Flume client.

    ./flume-manager.sh restart
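    To confirm that the Flume agent is running after the restart, you can, for example, check for the process (a generic check, not a command provided by the Flume client itself):
      ps -ef | grep flume | grep -v grep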

  7. View the result in OBS.

    1. Log in to the OBS console.
    2. In the navigation pane on the left, choose Resources > Parallel File Systems, and click the name of the parallel file system you created.
    3. In the navigation pane on the left, choose Files. On the displayed page, click the folder created in Step 1 to view the exported files.
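    If the MRS cluster client is available, the output files can also be listed from the command line. This is a sketch, assuming the client has been interconnected with OBS; the path matches the example sink configuration above:
      hadoop fs -ls obs://esdk-c-test-pfs1/testFlumeOutput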