Updated on 2024-09-23 GMT+08:00

Interconnecting Flume with OBS Using an IAM Agency

After you configure decoupled storage and compute for a cluster as described in Interconnecting an MRS Cluster with OBS Using an IAM Agency, you can run Flume jobs that write data to OBS.

This section applies to MRS 3.x or later.

Interconnecting Flume with OBS

  1. Create an OBS folder for storing data.

    1. Log in to the OBS console.
    2. In the navigation pane on the left, choose Resources > Parallel File Systems.
    3. On the displayed page, click the name of the parallel file system you created to access its details page.
    4. In the navigation pane on the left, choose Files. On the displayed page, click Create Folder to create the testFlumeOutput folder.

  2. Log in to the node where the Flume client is installed as user root.
  3. Create the /opt/flumeInput directory and add a .txt file with custom content to it.
  4. Add the following content to the Flume client installation directory/fusionInsight-flume-*/properties.properties file:

    # source
    server.sources = r1
    # channels
    server.channels = c1
    # sink
    server.sinks = obs_sink
    # ----- define spooling directory source -----
    server.sources.r1.type = spooldir
    server.sources.r1.spoolDir = /opt/flumeInput
    # ---- define OBS sink ----
    server.sinks.obs_sink.type = hdfs
    server.sinks.obs_sink.hdfs.path = obs://esdk-c-test-pfs1/testFlumeOutput
    server.sinks.obs_sink.hdfs.filePrefix = %[localhost]
    server.sinks.obs_sink.hdfs.useLocalTimeStamp = true
    # roll to a new file every 5 seconds; size- and count-based rolling are disabled (0)
    server.sinks.obs_sink.hdfs.rollSize = 0
    server.sinks.obs_sink.hdfs.rollCount = 0
    server.sinks.obs_sink.hdfs.rollInterval = 5
    #server.sinks.obs_sink.hdfs.threadsPoolSize = 30
    server.sinks.obs_sink.hdfs.fileType = DataStream
    server.sinks.obs_sink.hdfs.writeFormat = Text
    server.sinks.obs_sink.hdfs.fileCloseByEndEvent = false
    
    # define channel
    server.channels.c1.type = memory
    server.channels.c1.capacity = 1000
    # transaction size
    server.channels.c1.transactionCapacity = 1000
    server.channels.c1.byteCapacity = 800000
    server.channels.c1.byteCapacityBufferPercentage = 20
    server.channels.c1.keep-alive = 60
    server.sources.r1.channels = c1
    server.sinks.obs_sink.channel = c1
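    • Set the spooling directory configured for source r1 (/opt/flumeInput in this example) to the directory that holds the .txt file created in step 3.
    • Set server.sinks.obs_sink.hdfs.path to the OBS folder created in step 1, in the form obs://parallel-file-system-name/folder-name.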
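
A typo in the sink path is easy to make at this step. As a minimal sketch, the following shell function checks that a properties file sets server.sinks.obs_sink.hdfs.path to a value of the form obs://name/folder. The function name check_obs_sink_path is hypothetical and is not part of the Flume client, and the expected path format is inferred from the example configuration above.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with the Flume client).
# Succeeds (exit status 0) only if the given properties file sets
# server.sinks.obs_sink.hdfs.path to an obs://<name>/<folder> value.
check_obs_sink_path() {
    conf_file="$1"
    grep -Eq '^[[:space:]]*server\.sinks\.obs_sink\.hdfs\.path[[:space:]]*=[[:space:]]*obs://[^/[:space:]]+/[^[:space:]]+' "$conf_file"
}
```

For example, `check_obs_sink_path /path/to/properties.properties && echo "sink path looks valid"`, where the file path is a placeholder for your own properties file.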

  5. Copy the hadoop-huaweicloud-*.jar and mrs-obs-provider-*.jar files from Client installation directory/Hive/Beeline/lib to Flume client installation directory/fusionInsight-flume-*/lib. Then run the following commands to modify permissions:

    cd Flume client installation directory/fusionInsight-flume-*/lib

    chmod 755 hadoop-huaweicloud-*.jar

    chmod 755 mrs-obs-provider-*.jar
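
The copy itself is described in this step but not shown as a command. A minimal sketch follows, assuming the source and destination lib directories are passed as arguments. The function name copy_obs_jars is hypothetical, and both directory paths are placeholders that you must replace with your own installation paths.

```shell
#!/bin/sh
# Hypothetical helper: copy the two OBS-related JAR files from a source lib
# directory into the Flume client lib directory and set mode 755 on each copy.
# Returns non-zero if either JAR pattern matches nothing or a copy fails.
copy_obs_jars() {
    src_lib="$1"      # for example, the Hive/Beeline/lib directory of the client
    flume_lib="$2"    # the lib directory of the Flume client
    for pattern in 'hadoop-huaweicloud-*.jar' 'mrs-obs-provider-*.jar'; do
        for jar in "$src_lib"/$pattern; do
            # An unmatched glob stays literal, so test that the file exists.
            [ -e "$jar" ] || { echo "missing: $pattern in $src_lib" >&2; return 1; }
            cp "$jar" "$flume_lib"/ && chmod 755 "$flume_lib/$(basename "$jar")" || return 1
        done
    done
}
```

A call might look like `copy_obs_jars "$HIVE_BEELINE_LIB" "$FLUME_LIB"`, where both variables are assumed to hold the directories named in this step.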

  6. Run the following command to restart the Flume client:

    cd Flume client installation directory/fusionInsight-flume-*/bin

    ./flume-manager.sh restart

  7. View the result in the OBS system.

    1. Log in to the OBS console.
    2. In the navigation pane on the left, choose Resources > Parallel File Systems.
    3. Click the name of the parallel file system you created.
    4. In the navigation pane on the left, choose Files. On the displayed page, click the folder created in step 1 to view the result.