Updated on 2024-03-05 GMT+08:00

Customizing a Flink Streaming Job

Obtaining a DIS Flink Connector Demo

  1. Download the dis-flink-connector-X.X.X.zip package from https://dis-publish.obs-website.cn-north-1.myhuaweicloud.com/ and decompress it. The package contains the following directory:

    • huaweicloud-dis-flink-connector-demo: a sample Maven project

Importing the Demo Project into IntelliJ IDEA

The following uses IntelliJ IDEA Community Edition as an example to describe how to compile a Flink job. Make sure that the following components have been configured in IDEA:

  • JDK 1.8+
  • Scala-sdk-2.11
  • Maven 3.3.*
  1. Open IntelliJ IDEA and choose File > New > Project from Existing Sources. Select the huaweicloud-dis-flink-connector-demo directory and click OK.

  2. Select Maven as the external model, retain the default settings, and click Next until the New Project dialog box is displayed.

  3. Click New Window to open the project in a new window.

  4. Right-click pom.xml and choose Maven > Reimport from the shortcut menu to reimport the Maven dependencies.
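
The JDK 1.8+ prerequisite can also be confirmed from code. The following is a minimal sketch, not part of the demo project: the class name JdkVersionCheck is hypothetical, and it only reads the standard java.specification.version system property, which reports "1.8" for JDK 8 and "11", "17", and so on for later releases. It checks the JVM it runs in, so run it with the SDK configured for the project in IDEA.

```java
// Hypothetical helper (not part of the DIS demo): checks the "JDK 1.8+" prerequisite.
public class JdkVersionCheck {

    // Converts a java.specification.version value to a major version number.
    // JDK 8 and earlier use the "1.x" scheme ("1.8" -> 8); JDK 9+ report "9", "11", "17", ...
    static int majorVersion(String specVersion) {
        String[] parts = specVersion.split("\\.");
        if (parts[0].equals("1") && parts.length > 1) {
            return Integer.parseInt(parts[1]);
        }
        return Integer.parseInt(parts[0]);
    }

    public static void main(String[] args) {
        String spec = System.getProperty("java.specification.version");
        int major = majorVersion(spec);
        if (major < 8) {
            System.err.println("JDK " + spec + " is too old; JDK 1.8+ is required.");
            System.exit(1);
        }
        System.out.println("JDK " + spec + " meets the JDK 1.8+ requirement.");
    }
}
```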

Verifying the Flink Streaming Source Job

This section describes how to test a Flink Streaming source job in the local IDE to learn the basic usage of the SDK. In production, the job runs on a Flink cluster: after local testing, you can create a cluster (for example, an MRS cluster) and submit the job to it for verification.

  1. Log in to the DIS console with your account.
  2. In the upper left corner of the page, select a region and project.
  3. Create a DIS stream by referring to Step 1: Creating a DIS Stream, and continuously upload data to the new stream. In this example, the uploaded content is hello world.
  4. Open the pom.xml file, press Ctrl+/ to comment out the <scope>provided</scope> line, and save the file.

  5. Right-click pom.xml and choose Maven > Reimport from the shortcut menu to reimport the dependencies.

  6. Right-click in the DISFlinkStreamingSourceJavaExample file and choose Create 'DISFlinkStreamingSourceJavaExample' from the shortcut menu.

  7. On the configuration page that is displayed, set Program arguments in the following format:

    Separate the arguments with spaces, in this order: DIS gateway address, region name, AK, SK, project ID, stream name, starting position, and consumer group ID. For example:

    https://dis.${region}.myhuaweicloud.com ${region} YOU_AK YOU_SK YOU_PROJECTID YOU_STREAM_NAME latest GROUP_ID

    The sample code defines the order and meaning of these parameters as follows:

    // DIS endpoint.
    String endpoint;
    // ID of the region where DIS resides.
    String region;
    // AK of the user.
    String ak;
    // SK of the user.
    String sk;
    // Project ID of the user.
    String projectId;
    // DIS stream name.
    String streamName;
    // Consumption policy. It is used only when a partition has no checkpoint or its checkpoint has expired.
    // If a valid checkpoint exists, consumption resumes from that checkpoint.
    // LATEST: consumption starts from the latest data, ignoring data already in the stream.
    // EARLIEST: consumption starts from the earliest data, reading all valid data in the stream.
    String startingOffsets;
    // Consumer group ID. Different clients in the same consumer group can consume the same stream at the same time.
    String groupId;
    • Either enable checkpointing, or have the consumer commit its consumption offset automatically, as follows:

    // Commit consumed offsets automatically.
    disConfig.put(DisConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, "true");
    // Interval between automatic commits, in milliseconds.
    disConfig.put(DisConsumerConfig.AUTO_COMMIT_INTERVAL_MS_CONFIG, "1000");
    // LATEST: consumption starts from the latest data.
    disConfig.put(DisConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "LATEST");

    • If none of these is set, consumption starts from the latest data by default.

    The following figure shows the final IDEA configuration. After confirming that all settings are correct, click OK to close the window.

  8. Right-click in the DISFlinkStreamingSourceJavaExample file and choose Run 'DISFlinkStreamingSourceJavaExample' from the shortcut menu to start the job.

  9. If no error occurs, data is read from DIS and printed to the console, as in the following example. The 2> prefix indicates the parallel subtask that printed the line.

    2> hello world
    2> hello world
    2> hello world

  10. After verifying that the job runs locally without errors, uncomment the <scope>provided</scope> line in pom.xml so that the Flink dependencies are not packaged into the job JAR; the Flink cluster provides them at runtime. Then stop the data upload program.
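
The argument order and the starting-position rule described in step 7 can be summarized in code. The following is a minimal sketch under stated assumptions, not the demo's actual implementation: the class SourceJobArgs and its methods are hypothetical, the positional order mirrors the documented format, accepting lowercase values such as latest (normalized to upper case) is an assumption based on the example arguments, and startPosition only models the documented precedence (a valid checkpoint wins, then the configured policy, then LATEST by default).

```java
import java.util.Locale;

// Hypothetical holder for the eight positional Program arguments of the source demo.
// Not part of the DIS Flink connector; the field order follows the documented format.
public class SourceJobArgs {
    final String endpoint;
    final String region;
    final String ak;
    final String sk;
    final String projectId;
    final String streamName;
    final String startingOffsets; // normalized to LATEST or EARLIEST
    final String groupId;

    private SourceJobArgs(String[] a) {
        endpoint = a[0];
        region = a[1];
        ak = a[2];
        sk = a[3];
        projectId = a[4];
        streamName = a[5];
        startingOffsets = a[6].toUpperCase(Locale.ROOT);
        groupId = a[7];
    }

    // Validates the argument count and the starting-position value.
    static SourceJobArgs parse(String[] args) {
        if (args.length != 8) {
            throw new IllegalArgumentException("Expected 8 arguments "
                + "(endpoint region ak sk projectId streamName startingOffsets groupId), got "
                + args.length);
        }
        SourceJobArgs parsed = new SourceJobArgs(args);
        if (!parsed.startingOffsets.equals("LATEST") && !parsed.startingOffsets.equals("EARLIEST")) {
            throw new IllegalArgumentException("startingOffsets must be LATEST or EARLIEST: " + args[6]);
        }
        return parsed;
    }

    // Models the documented rule: a valid checkpoint always takes precedence; the
    // starting policy applies only when a partition has no checkpoint or it has expired;
    // with no policy set, consumption starts from the latest data.
    static String startPosition(boolean hasValidCheckpoint, String startingOffsets) {
        if (hasValidCheckpoint) {
            return "CHECKPOINT";
        }
        return startingOffsets == null ? "LATEST" : startingOffsets;
    }

    public static void main(String[] args) {
        SourceJobArgs a = parse(args);
        System.out.println("Consuming " + a.streamName + " in " + a.region
            + " as group " + a.groupId + ", start policy " + a.startingOffsets);
    }
}
```

Failing fast on a wrong argument count makes a misordered or missing value in the IDEA Program arguments field visible before the job tries to connect to DIS.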

Verifying the Flink Streaming Sink Job

This section describes how to test a Flink Streaming sink job in the local IDE to learn the basic usage of the SDK. In production, the job runs on a Flink cluster: after local testing, you can create a cluster (for example, an MRS cluster) and submit the job to it for verification.

  1. Log in to the DIS console with your account.
  2. In the upper left corner of the page, select a region and project.
  3. Create a DIS stream by referring to Step 1: Creating a DIS Stream.
  4. Open the pom.xml file, press Ctrl+/ to comment out the <scope>provided</scope> line, and save the file.

  5. Right-click pom.xml and choose Maven > Reimport from the shortcut menu to reimport the dependencies.

  6. Right-click in the DISFlinkStreamingSinkJavaExample file and choose Create 'DISFlinkStreamingSinkJavaExample' from the shortcut menu.

  7. On the configuration page that is displayed, set Program arguments in the following format:

    Separate the arguments with spaces, in this order: DIS gateway address, region name, AK, SK, project ID, and stream name. For example:

    https://dis.${region}.myhuaweicloud.com ${region} YOU_AK YOU_SK YOU_PROJECTID YOU_STREAM_NAME

    The sample code defines the order and meaning of these parameters as follows:

    // DIS endpoint.
    String endpoint;
    // ID of the region where DIS resides.
    String region;
    // AK of the user.
    String ak;
    // SK of the user.
    String sk;
    // Project ID of the user.
    String projectId;
    // DIS stream name.
    String streamName;

    The following figure shows the final IDEA configuration. After confirming that all settings are correct, click OK to close the window.

  8. Right-click in the DISFlinkStreamingSinkJavaExample file and choose Run 'DISFlinkStreamingSinkJavaExample' from the shortcut menu to start the job.

  9. On the stream monitoring page of the DIS console, check whether the data has been uploaded successfully.
  10. After verifying that the job runs locally without errors, uncomment the <scope>provided</scope> line in pom.xml so that the Flink dependencies are not packaged into the job JAR; the Flink cluster provides them at runtime. Then stop the job, which is the program uploading data to the stream.
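
Because IDEA splits Program arguments on spaces, each value must be non-empty and contain no spaces. The following minimal sketch assembles the sink job's argument line from its parts. The class SinkArgsBuilder is hypothetical and not part of the demo, and the endpoint pattern https://dis.${region}.myhuaweicloud.com is copied from the documented example; confirm the actual endpoint for your region before relying on it.

```java
// Hypothetical helper (not part of the DIS demo) that builds the sink job's Program arguments.
public class SinkArgsBuilder {

    // Endpoint pattern taken from the documented example argument line.
    static String endpointFor(String region) {
        return "https://dis." + region + ".myhuaweicloud.com";
    }

    // Documented order: endpoint region ak sk projectId streamName, separated by spaces.
    static String programArguments(String region, String ak, String sk,
                                   String projectId, String streamName) {
        String[] parts = {endpointFor(region), region, ak, sk, projectId, streamName};
        for (String p : parts) {
            // A space inside a value would shift every later positional argument.
            if (p == null || p.isEmpty() || p.contains(" ")) {
                throw new IllegalArgumentException("Argument must be non-empty and contain no spaces: " + p);
            }
        }
        return String.join(" ", parts);
    }

    public static void main(String[] args) {
        // With the documented placeholders, this prints the same line as the example in step 7.
        System.out.println(programArguments("${region}", "YOU_AK", "YOU_SK",
                "YOU_PROJECTID", "YOU_STREAM_NAME"));
    }
}
```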