Updated on 2023-06-21 GMT+08:00

Developing an MRS Flink Job

This section describes how to develop an MRS Flink job on DataArts Factory, using a word count job as an example.

Prerequisites

  • You have the permission to access OBS paths.
  • MRS has been enabled and an MRS cluster has been created.

Data Preparation

  • Download the Flink job resource package wordcount.jar from https://github.com/huaweicloudDocs/dgc/blob/master/WordCount.jar.

    You must verify the integrity of the downloaded Flink job resource package. In Windows, open the CLI and run the following command to generate the SHA-256 value of the downloaded JAR package. In the command, D:\wordcount.jar is an example local path and name of the JAR package. Replace it with the actual value.

    certutil -hashfile D:\wordcount.jar SHA256

    The following is an example command output:

    SHA-256 hash value of D:\wordcount.jar:
    0859965cb007c51f0d9ddaf7c964604eb27c39e2f1f56e082acb20c8eb05ccc4
    CertUtil: -hashfile command executed.

    Compare the SHA-256 value of the downloaded JAR package with the following value. If they are the same, the package was not tampered with and no data was lost during the download.

    SHA-256 value: 0859965cb007c51f0d9ddaf7c964604eb27c39e2f1f56e082acb20c8eb05ccc4

  • Prepare the data file word.txt, which contains some English words.
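    The integrity check described above can also be scripted, which helps on systems where certutil is not available. The following is a minimal Python sketch, not part of the product: the function names sha256_of_file and matches_expected are hypothetical, and the expected value is the SHA-256 value published in this section.

```python
import hashlib

# SHA-256 value published in this section for the resource package.
EXPECTED_SHA256 = "0859965cb007c51f0d9ddaf7c964604eb27c39e2f1f56e082acb20c8eb05ccc4"


def sha256_of_file(path, chunk_size=1024 * 1024):
    """Return the hexadecimal SHA-256 digest of the file at `path`.

    The file is read in chunks so that a large JAR package does not have
    to fit in memory at once.
    """
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path, expected=EXPECTED_SHA256):
    """Return True if the file's digest equals `expected`, ignoring case."""
    return sha256_of_file(path).lower() == expected.strip().lower()
```

    For example, matches_expected(r"D:\wordcount.jar") returns True only if the downloaded package has the published SHA-256 value. Replace the path with the actual location of the JAR package.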

Procedure

  1. Upload the job resource package and data file to the OBS bucket.

    In this example, upload WordCount.jar to lkj_test/WordCount.jar and word.txt to lkj_test/input/word.txt.

  2. Create an empty job named job_MRS_Flink.

    Figure 1 Creating a job

  3. Go to the job development page, drag the MRS Flink node to the canvas, and click the node to configure its properties.

    Figure 2 Configuring properties for an MRS Flink node

    Parameter descriptions:

      • Flink job name: wordcount
      • MRS cluster name: Select an MRS cluster.
      • Program parameter: -c org.apache.flink.streaming.examples.wordcount.WordCount
      • Flink job resource package: wordcount
      • Input data path: obs://dlf-test/lkj_test/input/word.txt
      • Output data path: obs://dlf-test/lkj_test/output.txt

    Specifically:

    obs://dlf-test/lkj_test/input/word.txt is the path of the input file that is passed to wordcount.jar. The file contains the words to count.

    obs://dlf-test/lkj_test/output.txt is the path where the output file is stored. If output.txt already exists, an error is reported.

  4. Click Test to execute the MRS Flink job.
  5. After the test is complete, click Submit.
  6. Choose Monitor Job in the navigation pane and view the job execution result.
  7. View the returned records in the OBS bucket. (Skip this step if the return function is not configured.)
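
To sanity-check the job result, it can help to compute the expected counts for a small input file locally. The following Python sketch illustrates the kind of result a word count job produces. It is not the code inside the resource package: the tokenization rule (lowercase, split on non-letter characters) and the "word count" output format are assumptions, so the actual output of the job may differ in format and ordering. The function names count_words and format_counts are hypothetical.

```python
import re
from collections import Counter


def count_words(text):
    """Count how often each word occurs in `text`.

    Assumption: words are runs of ASCII letters and are compared
    case-insensitively. The real job may tokenize differently.
    """
    words = re.findall(r"[a-z]+", text.lower())
    return Counter(words)


def format_counts(counts):
    """Return one "word count" line per word, sorted alphabetically."""
    return [f"{word} {n}" for word, n in sorted(counts.items())]


if __name__ == "__main__":
    sample = "To be, or not to be"
    for line in format_counts(count_words(sample)):
        print(line)
```

Running the sketch on the sample sentence prints one line per distinct word, for example "be 2" and "to 2". Comparing such a local result with the records in the output path is a quick way to confirm that the job read the intended input file.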