
Obtaining the MRS Application Development Sample Project

Process for Building an MRS Sample Project

The MRS sample project construction process consists of the following steps:

  1. Download the Maven project source code and configuration files of the sample project. For details, see Obtaining the MRS Sample Project.
  2. Configure the Maven mirror repository of the SDK on the Huawei mirror site. For details, see Configuring Huawei Open Source Image Repository.
  3. Build the complete Maven project and compile and develop it based on your requirements.

Obtaining the MRS Sample Project

You can download the MRS sample project from https://github.com/huaweicloud/huaweicloud-mrs-example.

Switch to the branch that matches your MRS cluster version, for example, mrs-3.2.0.1. Then download the package to the local host and decompress it to obtain the sample code project of each component.

Figure 1 Downloading MRS Sample Project Code

You can download the sample project corresponding to the MRS LTS version from the following website:

You can download the sample project of the common MRS version from the following website:

Configuring Huawei Open Source Image Repository

Huawei provides the Huawei Mirrors site for downloading the dependency JAR files of the sample projects. However, the remaining third-party open-source JAR files must be downloaded from the Maven central repository or another custom repository.

Before using the development tool to download the dependency JAR files to the local environment, ensure that the following conditions are met:

  • The local network is working.

    Use a browser to visit Huawei Mirrors and check whether the website can be accessed. If access fails, fix the local network connection first.

  • The proxy is disabled for the development tool.

    Take IntelliJ IDEA 2020.2 as an example. Choose File > Settings > Appearance & Behavior > System Settings > HTTP Proxy, select No proxy, and click OK to save the configuration.

Configure the open source mirror repository as follows:

  1. Ensure that JDK 1.8 or later and Maven 3.0 or later have been installed.
  2. Download the settings.xml file provided by Huawei Mirrors, and overwrite the <Maven installation directory>/conf/settings.xml file with the downloaded file.

    If the file cannot be downloaded, search for HuaweiCloud SDK at Huawei Mirrors, click HuaweiCloud SDK, and perform operations as prompted.

  3. If you do not want to overwrite the Maven configuration file, you can manually modify the settings.xml configuration file or the pom.xml file of the component sample project to configure the mirror repository address. The configuration methods are as follows:

    • Configuration method 1

      Add the following open source mirror repository address to the mirrors node in the settings.xml configuration file:

      <mirror>
          <id>repo2</id>
          <mirrorOf>central</mirrorOf>
          <url>https://repo1.maven.org/maven2/</url>
      </mirror>
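
      The mirror entry sits inside the top-level mirrors element of settings.xml; a minimal placement sketch (assuming no other mirrors are configured):

      <!-- settings.xml (fragment) -->
      <mirrors>
          <mirror>
              <id>repo2</id>
              <mirrorOf>central</mirrorOf>
              <url>https://repo1.maven.org/maven2/</url>
          </mirror>
      </mirrors>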

      Add the following mirror repository address to the profiles node in the settings.xml configuration file:

      <profile>
          <id>huaweicloudsdk</id>
          <repositories>
              <repository>
                  <id>huaweicloudsdk</id>
                  <url>https://repo.huaweicloud.com/repository/maven/huaweicloudsdk/</url>
                  <releases><enabled>true</enabled></releases>
                  <snapshots><enabled>true</enabled></snapshots>
              </repository>
          </repositories>
      </profile>

      Add the following information to the activeProfiles node in the settings.xml file:

      <activeProfile>huaweicloudsdk</activeProfile>
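
      If the file does not already contain an activeProfiles node, the entry is wrapped as follows (a minimal sketch):

      <activeProfiles>
          <activeProfile>huaweicloudsdk</activeProfile>
      </activeProfiles>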
      • Huawei Mirrors does not provide third-party open source JAR files. After configuring the Huawei open source mirror, you need to separately configure a third-party Maven mirror repository address.
      • When using the IntelliJ IDEA development tool, you can choose File > Settings > Build, Execution, Deployment > Build Tools > Maven to view the directory where the settings.xml file is stored.

    • Configuration method 2

      Add the following mirror repository addresses directly to the pom.xml file of the secondary development sample project:

      <repositories>
          <repository>
              <id>huaweicloudsdk</id>
              <url>https://mirrors.huaweicloud.com/repository/maven/huaweicloudsdk/</url>
              <releases><enabled>true</enabled></releases>
              <snapshots><enabled>true</enabled></snapshots>
          </repository>
          <repository>
              <id>central</id>
              <name>Maven Central</name>
              <url>https://repo1.maven.org/maven2/</url>
          </repository>
      </repositories>
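
      Note that Maven resolves build plugins from pluginRepositories rather than repositories. If plugin downloads should also go through these repositories, a parallel block can be added to the pom.xml file; this is a sketch, not part of the original sample projects:

      <!-- pom.xml (fragment): same repositories, used for plugin resolution -->
      <pluginRepositories>
          <pluginRepository>
              <id>huaweicloudsdk</id>
              <url>https://mirrors.huaweicloud.com/repository/maven/huaweicloudsdk/</url>
          </pluginRepository>
          <pluginRepository>
              <id>central</id>
              <url>https://repo1.maven.org/maven2/</url>
          </pluginRepository>
      </pluginRepositories>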

  4. Configure the default Maven encoding and JDK. Add the following information to the profiles node in the settings.xml configuration file:

    <profile>
        <id>JDK1.8</id>
        <activation>
            <activeByDefault>true</activeByDefault>
            <jdk>1.8</jdk>
        </activation>
        <properties>
            <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
            <project.reporting.outputEncoding>UTF-8</project.reporting.outputEncoding>
            <maven.compiler.encoding>UTF-8</maven.compiler.encoding>
            <maven.compiler.source>1.8</maven.compiler.source>
            <maven.compiler.target>1.8</maven.compiler.target>
            <maven.compiler.compilerVersion>1.8</maven.compiler.compilerVersion>
        </properties>
    </profile>
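
    To check that the configuration took effect, you can run mvn help:active-profiles in any project directory (this goal belongs to the standard Maven Help plugin); the JDK1.8 and huaweicloudsdk profiles should appear in the list of active profiles if the configuration above was applied.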

Sample Projects for MRS Components

The MRS sample code library provides sample projects for basic functions of each component. For details about the sample projects provided by each component in the current version, see Table 1.

Table 1 Sample projects of each component (columns: Component, Sample Project Location, Description)

ClickHouse

clickhouse-examples

Java program that creates and deletes ClickHouse data tables, and inserts and queries data in MRS clusters

This program establishes server connections, creates databases and data tables, inserts data, queries data, and deletes data tables.

ClickHouseJDBC-Transaction-JavaExample

Example code for ClickHouse transactions, which is available for MRS 3.3.0 and later versions.

Doris

doris-examples/doris-jdbc-example

Application development example for Doris data reads/writes, which is available for MRS 3.3.0 and later versions

This example calls Doris APIs to create user tables, insert data, query data, and delete tables.

Flink

  • If Kerberos authentication is enabled for the cluster, the sample project directory is flink-examples/flink-examples-security.
  • If Kerberos authentication is disabled for the cluster, the sample project directory is flink-examples/flink-examples-normal.

FlinkCheckpointJavaExample

Java/Scala program for Flink asynchronous checkpointing

In this project, the program uses custom operators to continuously generate data. Each generated record is a quadruple of long, string, string, and integer values. The program collects the statistical results and displays them on the terminal. A checkpoint is triggered every 6 seconds, and the checkpoint result is stored in HDFS.

FlinkCheckpointScalaExample

FlinkHBaseJavaExample

Java sample program that calls Flink APIs in a job to read and write HBase data.

This is only supported by MRS 3.2.0 and later versions.

FlinkKafkaJavaExample

Java/Scala program that uses a Flink job to produce and consume data from Kafka.

In this project, assume that a Flink service receives one message per second. The Producer application sends data to Kafka, the Consumer application receives data from Kafka, and the program processes and prints the data.

FlinkKafkaScalaExample

FlinkPipelineJavaExample

Java/Scala program for Flink job pipeline

In this example, a publisher job generates 10,000 data records per second, and two other jobs each subscribe to the data. After receiving the data, the subscriber jobs convert data formats, sample the data, and output the samples.

FlinkPipelineScalaExample

FlinkSqlJavaExample

SQL job submission through Jar jobs on the client

FlinkStreamJavaExample

Java/Scala program for constructing a DataStream with Flink

This program analyzes user log data based on service requirements, reads text data, generates DataStreams, filters data that meets specified conditions, and obtains results.

FlinkStreamScalaExample

FlinkStreamSqlJoinExample

Flink SQL Join program

This program calls APIs of the flink-connector-kafka module to produce and consume data. It generates Table1 and Table2, uses Flink SQL to perform joint query on the tables, and displays results.

FlinkRESTAPIJavaExample

Java program that calls FlinkServer REST APIs to create tenants

flink-examples/flink-sql

Sample program that uses Flink Jar to submit a SQL job

flink-examples/pyflink-example

pyflink-kafka

Python program that submits a regular job to read and write Kafka data

pyflink-sql

Python program that submits a SQL job

HBase

hbase-examples

hbase-example

Application development example for HBase reads/writes and global secondary indexes. HBase APIs can be called to:

  • Create user tables, import user data, add and query user information, and create secondary indexes for user tables.
  • In MRS 3.3.0 and later versions, create and delete global secondary indexes, modify the status of global secondary indexes, and query global secondary indexes.

hbase-rest-example

A development example for using HBase REST interfaces.

This program uses REST APIs to query HBase cluster information, obtain tables, operate namespaces, and manipulate tables.

hbase-thrift-example

A development example for accessing HBase ThriftServer.

This program accesses ThriftServer to manipulate tables, and write data to and read data from tables.

hbase-zk-example

A development example for HBase to access ZooKeeper.

You can use the same client process to access MRS ZooKeeper and third-party ZooKeeper at the same time. The HBase client accesses MRS ZooKeeper, and the customer application accesses third-party ZooKeeper.

HDFS

  • If Kerberos authentication is enabled for the cluster, the sample project directory is hdfs-example-security.
  • If Kerberos authentication is disabled for the cluster, the sample project directory is hdfs-example-normal.

Java program for HDFS file operations.

This program creates HDFS folders, writes files, appends file content, reads files, and deletes files or folders.

hdfs-c-example

A C language development example for using HDFS.

This program connects to the HDFS file system and implements file operations such as creating, reading, writing, appending, and deleting files.

HetuEngine

  • If Kerberos authentication is enabled for the cluster, the sample project directory is hetu-examples/hetu-examples-security.
  • If Kerberos authentication is disabled for the cluster, the sample project directory is hetu-examples/hetu-examples-normal.

Java/Python program for connecting to HetuEngine in different ways

In this example project, you can use a username and password to connect to HetuEngine through ZooKeeper or HSBroker, or use a keytab authentication file to connect to HetuEngine, and then send SQL statements to HetuEngine to add, delete, modify, and query Hive data.

Hive

hive-examples

hive-jdbc-example

Java program for Hive JDBC to process data

In this project, JDBC APIs are used to connect to Hive and perform data operations such as creating tables, loading data, and querying data. You can also access FusionInsight ZooKeeper and a third-party ZooKeeper in the same client process at the same time.

hive-jdbc-example-multizk

hcatalog-example

Java program for Hive HCatalog to process data

HCatalog APIs are used to define and query MRS Hive metadata with Hive CLI.

python-examples

Python program that connects to Hive and executes SQL statements.

This program uses Python to connect to Hive and submit data analysis tasks.

python3-examples

Python 3 program that connects to Hive and executes SQL statements.

This program uses Python 3 to connect to Hive and submit data analysis tasks.

IoTDB

iotdb-examples

iotdb-flink-example

Program for using Flink to access IoTDB data, including FlinkIoTDBSink and FlinkIoTDBSource.

FlinkIoTDBSink can use Flink jobs to write time series data to IoTDB. FlinkIoTDBSource reads time series data from IoTDB through Flink jobs and prints the data.

iotdb-jdbc-example

Java sample program for IoTDB JDBC to process data.

This program demonstrates how to use JDBC APIs to connect to IoTDB and execute IoTDB SQL statements.

iotdb-kafka-example

Sample program for accessing IoTDB data through Kafka.

This program demonstrates how to send time series data to Kafka and then use multiple threads to write the data to IoTDB.

iotdb-session-example

Java sample program for IoTDB Session to process data.

This program demonstrates how to use a Session to connect to IoTDB and execute IoTDB SQL statements.

iotdb-udf-example

This program demonstrates how to implement a simple IoTDB user-defined function (UDF).

Kafka

kafka-examples

Java program for processing Kafka streaming data

The program is developed based on Kafka Streams. It counts the words in each message by reading messages from the input topic, and outputs the results as key-value pairs by consuming data from the output topic.

Manager

manager-examples

Program for calling FusionInsight Manager APIs

This program calls Manager APIs to create, modify, and delete cluster users.

MapReduce

  • If Kerberos authentication is enabled for the cluster, the sample project directory is mapreduce-example-security.
  • If Kerberos authentication is disabled for the cluster, the sample project directory is mapreduce-example-normal.

Java program for submitting MapReduce jobs

This program runs a MapReduce statistics data job to analyze and process data and output data required by users.

It illustrates how to write MapReduce jobs that access multiple service components (HDFS, HBase, and Hive), helping you develop key operations such as authentication and configuration loading.

Oozie

  • If Kerberos authentication is enabled for the cluster, the sample project directory is oozie-examples/ooziesecurity-examples.
  • If Kerberos authentication is disabled for the cluster, the sample project directory is oozie-examples/oozienormal-examples.

OozieMapReduceExample

Program for submitting MapReduce jobs with Oozie.

This program demonstrates how to use Java APIs to submit MapReduce jobs, query job status, and perform offline analysis on website log files.

OozieSparkHBaseExample

Program for using Oozie to schedule Spark jobs to access HBase.

OozieSparkHiveExample

Program for using Oozie to schedule Spark jobs to access Hive.

Spark

  • If Kerberos authentication is enabled for the cluster, the sample project directory is spark-examples/sparksecurity-examples.
  • If Kerberos authentication is disabled for the cluster, the sample project directory is spark-examples/sparknormal-examples.

SparkHbasetoCarbonJavaExample

Java program for Spark to synchronize HBase data to CarbonData.

In this project, the program writes data to HBase in real time for point queries. Data is synchronized to CarbonData tables in batches at a specified interval for analytical queries.

SparkHbasetoHbaseJavaExample

Java/Scala/Python program that uses Spark to read data from and then write data to HBase

The program uses Spark jobs to analyze and summarize data of two HBase tables.

SparkHbasetoHbasePythonExample

SparkHbasetoHbaseScalaExample

SparkHivetoHbaseJavaExample

Java/Scala/Python program that uses Spark to read data from Hive and then write data to HBase

The program uses Spark jobs to analyze and summarize data of a Hive table and write result to an HBase table.

SparkHivetoHbasePythonExample

SparkHivetoHbaseScalaExample

SparkJavaExample

Java/Python/Scala/R program of Spark Core tasks

The program reads text data from HDFS and then calculates and analyzes the data.

SparkRExample is only available for clusters with Kerberos authentication enabled.

SparkPythonExample

SparkScalaExample

SparkRExample

SparkLauncherJavaExample

Java/Scala program that uses Spark Launcher to submit jobs

The program uses the org.apache.spark.launcher.SparkLauncher class through Java/Scala commands to submit Spark jobs.

SparkLauncherScalaExample

SparkOnHbaseJavaExample

Java/Scala/Python program in the Spark on HBase scenario

The program uses HBase as the data source. In this project, data is stored in HBase in Avro format; the program reads the data from HBase and filters it.

SparkOnHbasePythonExample

SparkOnHbaseScalaExample

SparkOnHudiJavaExample

Java/Scala/Python program in the Spark on Hudi scenario

The program uses Spark jobs to perform operations such as insertion, query, update, incremental query, query at a specific time, and data deletion on Hudi.

SparkOnHudiPythonExample

SparkOnHudiScalaExample

SparkOnMultiHbaseScalaExample

Scala program that uses Spark to access HBase in two clusters at the same time

This program is only available for clusters with Kerberos authentication enabled.

SparkSQLJavaExample

Java/Python/Scala program of Spark SQL tasks

The program reads text data from HDFS and then calculates and analyzes the data.

SparkSQLPythonExample

SparkSQLScalaExample

SparkStreamingKafka010JavaExample

Java/Scala program used by Spark Streaming to receive data from Kafka and perform statistical analysis

The program accumulates and calculates the stream data in Kafka in real time and calculates the total number of records of each word.

SparkStreamingKafka010ScalaExample

SparkStreamingtoHbaseJavaExample010

Java/Scala/Python sample project used by Spark Streaming to read Kafka data and write the data into HBase

The program starts a task every 5 seconds to read data from Kafka and updates the data to a specified HBase table.

SparkStreamingtoHbasePythonExample010

SparkStreamingtoHbaseScalaExample010

SparkStructuredStreamingJavaExample

The program uses Structured Streaming in Spark jobs to call Kafka APIs to obtain word records, and then classifies the records to count the occurrences of each word.

SparkStructuredStreamingPythonExample

SparkStructuredStreamingScalaExample

SparkThriftServerJavaExample

Java/Scala program for Spark SQL access through JDBC.

In this sample, a custom JDBCServer client and JDBC connections are used to create, load data to, query, and delete tables.

SparkThriftServerScalaExample

StructuredStreamingADScalaExample

Structured Streaming is used to read advertisement request data, display data, and click data from Kafka, obtain effective display statistics and click statistics in real time, and write the statistics to Kafka.

StructuredStreamingStateScalaExample

This Spark Structured Streaming program collects statistics on the number of events in each session, together with the session start and end timestamps, across different batches, and outputs the sessions whose state is updated in the current batch.

SpringBoot (This component is available only in MRS 3.3.0 or later.)

clickhouse-examples

clickhouse-rest-client-example

An application development example for connecting SpringBoot to ClickHouse.

This program establishes server connections, creates databases and data tables, inserts data, queries data, and deletes data tables.

doris-examples

doris-rest-client-example

SpringBoot development example for Doris data read and write

This example shows you how to connect SpringBoot to Doris.

flink-examples

flink-dws-read-example

Application development example for connecting Flink to GaussDB(DWS) using SpringBoot.

flink-dws-sink-example

hbase-examples

Application development example for connecting SpringBoot to Phoenix.

This example shows you how to connect SpringBoot to HBase and Phoenix.

hive-examples

hive-rest-client-example

Application development example for connecting SpringBoot to Hive.

This example uses SpringBoot to connect Hive to create tables, load data, query data, and delete tables in Hive.

kafka-examples

Application development example for connecting SpringBoot to Kafka for topic production and consumption.