Updated on 2023-09-14 GMT+08:00

Sample Projects of MRS Components

Obtain the sample projects as described in Method of Building an MRS Sample Project: switch to the branch that matches the version of your MRS cluster, download the package to a local directory, and decompress it to obtain the sample code project of each component.

The MRS sample code library provides sample projects of basic functions of each component. Table 1 lists the projects of the current version.

Table 1 Sample projects of each component (2.x)

Component

Sample Project Location

Description

Alluxio

alluxio-examples

Sample program that uses Alluxio to connect to a storage system through a public interface. The example writes and reads files.

Flink

flink-examples

The following sample programs are provided:

  • DataStream program

    Java/Scala program for Flink to construct DataStream. This project analyzes user log data based on service requirements, reads text data, generates DataStreams, filters data that meets specified conditions, and obtains results.

  • Program that produces and consumes data in Kafka

    Java/Scala program that uses a Flink job to produce data to and consume data from Kafka. In this project, assume that a Flink service receives one message per second. The Producer application sends data to Kafka, the Consumer application receives data from Kafka, and the program processes and prints the data.

  • Asynchronous checkpointing

    Java/Scala program for Flink asynchronous checkpointing. In this project, the program uses a custom operator to continuously generate data. Each generated record is a quadruple of long, string, string, and integer values. The program collects statistical results and displays them on the terminal. A checkpoint is triggered every 6 seconds and the checkpoint result is stored in HDFS.

  • Stream SQL join

    Flink streaming SQL join program. This program calls APIs of the flink-connector-kafka module to produce and consume data. It generates Table1 and Table2, uses Flink SQL to perform a join query on the tables, and displays the results.
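
The join step of the Stream SQL join program can be illustrated without a cluster. The following is a minimal sketch only, not the sample project's code: it uses Python's built-in sqlite3 module in place of Flink SQL and static rows in place of Kafka streams. The function name join_tables and the column names user_id, name, and city are hypothetical; only the table names Table1 and Table2 come from the description above.

```python
import sqlite3


def join_tables(table1_rows, table2_rows):
    """Join two in-memory tables on a shared key (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    try:
        # Hypothetical columns; the real sample defines its own schema.
        conn.execute("CREATE TABLE Table1 (user_id INTEGER, name TEXT)")
        conn.execute("CREATE TABLE Table2 (user_id INTEGER, city TEXT)")
        conn.executemany("INSERT INTO Table1 VALUES (?, ?)", table1_rows)
        conn.executemany("INSERT INTO Table2 VALUES (?, ?)", table2_rows)
        # Inner join: only keys present in both tables appear in the result.
        cursor = conn.execute(
            "SELECT t1.user_id, t1.name, t2.city "
            "FROM Table1 AS t1 JOIN Table2 AS t2 ON t1.user_id = t2.user_id "
            "ORDER BY t1.user_id"
        )
        return cursor.fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    rows = join_tables([(1, "alice"), (2, "bob")], [(1, "Shenzhen"), (3, "Xian")])
    for row in rows:
        print(row)
```

In a streaming job the two inputs are unbounded, so the join is evaluated continuously as records arrive; this sketch shows only the relational result for a fixed snapshot of both tables.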

HBase

hbase-examples

HBase data read and write

This program calls HBase APIs to create user tables, import user data, add and query user information, and create secondary indexes for user tables.

HDFS

hdfs-examples

Java program for HDFS file operations.

This program creates HDFS folders, writes files, appends file content, reads files, and deletes files or folders.
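
The order of operations listed above can be sketched as follows. This is an illustration under stated assumptions, not the sample project's code: it runs against a local directory with Python's standard library as a stand-in for HDFS, and the function name run_file_operations, the folder name examples, and the file name test.txt are hypothetical.

```python
import shutil
import tempfile
from pathlib import Path


def run_file_operations(base_dir):
    """Create a folder, write, append, read, then delete; return the text read."""
    folder = Path(base_dir) / "examples"        # hypothetical folder name
    folder.mkdir(parents=True, exist_ok=True)   # create the folder
    target = folder / "test.txt"                # hypothetical file name
    target.write_text("hello\n", encoding="utf-8")       # write a file
    with target.open("a", encoding="utf-8") as handle:   # append content
        handle.write("world\n")
    content = target.read_text(encoding="utf-8")         # read the file
    target.unlink()                                      # delete the file
    shutil.rmtree(folder)                                # delete the folder
    return content


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(run_file_operations(tmp), end="")
```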

Hive

hive-examples

The following JDBC/HCatalog sample programs are provided:

  • Java program for Hive JDBC to process data

    In this project, JDBC APIs are used to connect to Hive and perform data operations. JDBC APIs are called to create tables, load data, and query data.

  • Java program for Hive HCatalog to process data

    HCatalog APIs are used to define and query MRS Hive metadata with Hive CLI.

Impala

impala-examples

Java program for Impala JDBC to process data

In this project, the JDBC APIs are called to connect to Impala and perform data operations in Impala. JDBC APIs are called to create tables, load data, and query data.

Kafka

kafka-examples

Java program for processing Kafka streaming data

The program is based on Kafka Streams. It counts the words in each message read from the input topic, and outputs the results as key-value pairs by consuming data from the output topic.
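
The counting logic described above can be sketched in a few lines. This is a framework-free illustration, not the sample project's code: it does not use Kafka or Kafka Streams, a plain list stands in for the input topic, and the yielded (word, count) pairs stand in for the key-value records of the output topic. The function name word_count_updates is hypothetical, and emitting one update per word occurrence is an assumption of this sketch.

```python
from collections import Counter


def word_count_updates(messages):
    """Yield a (word, running_count) pair for every word in every message."""
    counts = Counter()
    for message in messages:
        # Split on whitespace and normalize case before counting.
        for word in message.lower().split():
            counts[word] += 1
            yield word, counts[word]


if __name__ == "__main__":
    for key, value in word_count_updates(["hello kafka", "hello streams"]):
        print(key, value)
```

Each pair carries the latest count for its key, so a consumer that keeps only the most recent value per key ends up with the total for every word.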

MapReduce

mapreduce-examples

Java program for submitting MapReduce jobs

This program runs a MapReduce statistics job that analyzes and processes data and outputs the data required by users.

It illustrates how to write MapReduce jobs that access multiple service components (HDFS, HBase, and Hive), helping you implement key operations such as authentication and configuration loading.

Presto

presto-examples

The following JDBC/HCatalog sample programs are provided:

  • Java program for Presto JDBC to process data

    In this project, the JDBC APIs are called to connect to Presto and perform data operations in Presto. JDBC APIs are called to create tables, load data, and query data.

  • Java program for Presto HCatalog to process data

OpenTSDB

opentsdb-examples

OpenTSDB APIs are called to collect monitoring information in a large-scale cluster and query data in seconds. This program can write, query, and delete data.

Spark

spark-examples

SparkHbasetoHbaseJavaExample

Java/Scala program that uses Spark to read data from and then write data to HBase

The program uses Spark jobs to analyze and summarize data of two HBase tables.

SparkHbasetoHbaseScalaExample

SparkHivetoHbaseJavaExample

Java/Scala program that uses Spark to read data from Hive and then write data to HBase

The program uses Spark jobs to analyze and summarize data of a Hive table and write the result to an HBase table.

SparkHivetoHbaseScalaExample

SparkJavaExample

Java/Python/Scala program of Spark Core tasks

The program reads text data from HDFS and then calculates and analyzes the data.

SparkPythonExample

SparkScalaExample

SparkLauncherJavaExample

Java/Scala program that uses Spark Launcher to submit jobs

The program uses the org.apache.spark.launcher.SparkLauncher class through Java/Scala commands to submit Spark jobs.

SparkLauncherScalaExample

SparkOnHbaseJavaExample

Java/Scala program in the Spark on HBase scenario

The program uses HBase as the data source. In this project, data is stored in HBase in Avro format. The program reads the data from HBase and filters it.

SparkOnHbaseScalaExample

SparkSQLJavaExample

Java/Scala program of Spark SQL tasks

The program reads text data from HDFS and then calculates and analyzes the data.

SparkSQLScalaExample

SparkStreamingJavaExample

Java/Scala program used by Spark Streaming to receive data from Kafka and perform statistical analysis

This program analyzes user log data based on service requirements, reads text data, generates DataStreams, filters data that meets specified conditions, and obtains results.

SparkStreamingScalaExample

SparkStreamingKafka010JavaExample

Java/Scala program used by Spark Streaming to receive data from Kafka and perform statistical analysis

The program accumulates and calculates the stream data in Kafka in real time and calculates the total number of records of each word.

SparkStreamingKafka010ScalaExample
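
The running totals described for the SparkStreamingKafka010 examples can be pictured as a state that is updated once per micro-batch. The following is a minimal sketch under stated assumptions, not the sample project's code: it uses no Spark or Kafka APIs, each inner list stands in for the words received in one batch interval, and the function name update_totals is hypothetical.

```python
def update_totals(state, batch):
    """Merge one micro-batch of words into the running totals; return new state."""
    new_state = dict(state)  # copy so the previous state is left unchanged
    for word in batch:
        new_state[word] = new_state.get(word, 0) + 1
    return new_state


if __name__ == "__main__":
    totals = {}
    # Each inner list stands in for the records of one batch interval.
    for batch in [["spark", "kafka"], ["spark"], []]:
        totals = update_totals(totals, batch)
        print(totals)
```

Returning a new dictionary instead of mutating the old one keeps each batch update a pure function of the previous state and the new records, which is the shape a stateful streaming operator expects.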

SparkStreamingtoHbaseJavaExample

Java/Scala sample project used by Spark Streaming to read Kafka data and write the data into HBase

The program starts a task every 5 seconds to read data from Kafka and updates the data to a specified HBase table.

SparkStreamingtoHbaseScalaExample

SparkStructuredStreamingJavaExample

The program uses Structured Streaming in Spark jobs to call Kafka APIs to obtain word records. Word records are classified to obtain the number of records of each word.

SparkStructuredStreamingScalaExample

SparkThriftServerJavaExample

Java/Scala program for Spark SQL access through JDBC.

In this sample, a custom JDBCServer client and JDBC connections are used to create, load data to, query, and delete tables.

SparkThriftServerScalaExample

Storm

storm-examples

storm-common-examples

Constructor of Storm topologies and Spout/Bolt

The program can create Spout, Bolt, and Topology.

storm-hbase-examples

Interaction between Storm and HBase of MRS

The program submits the Storm topology and stores the data to the WordCount table of HBase.

storm-hdfs-examples

Interaction between Storm and HDFS of MRS

The program submits the Storm topology and stores the data to HDFS.

storm-jdbc-examples

Accessing MRS Storm with JDBC

The program uses a Storm topology to insert data into a table.

storm-kafka-examples

Interaction between Storm and Kafka of MRS

The program uses the Storm topology to send data to Kafka and display the data.

storm-obs-examples

Interaction between Storm and OBS of MRS

The program submits the Storm topology and stores the data to OBS.