Updated on 2023-08-31 GMT+08:00

Spark2x Sample Project

To obtain an MRS sample project, visit https://github.com/huaweicloud/huaweicloud-mrs-example and switch to the branch that matches your MRS cluster version. Download the package to your local PC and decompress it to obtain the sample projects of each component.
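As an alternative to downloading and decompressing the package, the matching branch can be cloned directly with git. The following is a minimal sketch, assuming git is installed locally; the helper names `clone_command` and `clone_samples` and the default directory name are hypothetical, and the branch name must be looked up in the repository because it depends on your cluster version.

```python
import subprocess

# Repository URL taken from the paragraph above.
REPO_URL = "https://github.com/huaweicloud/huaweicloud-mrs-example"


def clone_command(branch, dest="huaweicloud-mrs-example"):
    """Return the git command that clones only the given branch.

    `branch` is the branch matching your MRS cluster version. Its exact
    name is not assumed here; check the repository's branch list.
    """
    return ["git", "clone", "--branch", branch, "--single-branch", REPO_URL, dest]


def clone_samples(branch, dest="huaweicloud-mrs-example"):
    """Run the clone. Raises subprocess.CalledProcessError if git fails."""
    subprocess.run(clone_command(branch, dest), check=True)
```

Calling `clone_samples` with the branch name for your cluster version leaves the sample projects of each component under the destination directory.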

MRS provides the following Spark2x sample projects:
Table 1 Spark2x-related sample projects

Sample Project Location

Description

sparknormal-examples/SparkHbasetoCarbonJavaExample

Application development sample code for Spark to synchronize HBase data to CarbonData.

In this sample project, the application writes data to HBase in real time for point query services. Data is synchronized to CarbonData tables in batches at a specified interval for analytical query services.

sparknormal-examples/SparkHbasetoHbaseJavaExample
sparknormal-examples/SparkHbasetoHbasePythonExample
sparknormal-examples/SparkHbasetoHbaseScalaExample

Java/Scala/Python sample project in which Spark reads data from HBase and then writes the data to HBase.

In this sample project, the Spark applications analyze and summarize data in two HBase tables.

sparknormal-examples/SparkHivetoHbaseJavaExample
sparknormal-examples/SparkHivetoHbasePythonExample
sparknormal-examples/SparkHivetoHbaseScalaExample

Application development sample code for Spark to read data from Hive and write the data to HBase.

sparknormal-examples/SparkJavaExample
sparknormal-examples/SparkPythonExample
sparknormal-examples/SparkScalaExample

Java/Python/Scala sample project of Spark Core tasks.

The applications of this project read text data from HDFS and then calculate and analyze the data.

sparknormal-examples/SparkLauncherJavaExample
sparknormal-examples/SparkLauncherScalaExample

Java/Scala sample project that uses Spark Launcher to submit jobs.

This project uses the org.apache.spark.launcher.SparkLauncher class from Java or Scala code to submit Spark applications.

sparknormal-examples/SparkOnClickHouseJavaExample
sparknormal-examples/SparkOnClickHousePythonExample
sparknormal-examples/SparkOnClickHouseScalaExample

Spark uses the native ClickHouse JDBC APIs and the Spark JDBC driver to create ClickHouse databases and tables, insert data into them, and query them.

sparknormal-examples/SparkOnHbaseJavaExample
sparknormal-examples/SparkOnHbasePythonExample
sparknormal-examples/SparkOnHbaseScalaExample

Java/Scala/Python sample project in the Spark on HBase scenario.

You can use HBase as a data source in applications. In this project, data is stored in HBase in Avro format, then read from HBase and filtered.

sparknormal-examples/SparkOnHudiJavaExample
sparknormal-examples/SparkOnHudiPythonExample
sparknormal-examples/SparkOnHudiScalaExample

Java/Scala/Python sample project in the Spark on Hudi scenario.

The applications of this project use Spark to perform operations such as data insertion, query, update, incremental query, query at a specific time point, and data deletion on Hudi.

sparknormal-examples/SparkSQLJavaExample
sparknormal-examples/SparkSQLPythonExample
sparknormal-examples/SparkSQLScalaExample

Java/Python/Scala sample project of Spark SQL tasks.

The applications of this project read text data from HDFS and then calculate and analyze the data.

sparknormal-examples/SparkStreamingKafka010JavaExample
sparknormal-examples/SparkStreamingKafka010PythonExample

Java/Scala sample project used by Spark Streaming to receive data from Kafka and perform statistical analysis.

The applications of this project process stream data from Kafka in real time and keep a running total of the number of records for each word.

sparknormal-examples/SparkStreamingtoHbaseJavaExample010
sparknormal-examples/SparkStreamingtoHbasePythonExample010
sparknormal-examples/SparkStreamingtoHbaseScalaExample010

Java/Scala/Python sample project used by Spark Streaming to read Kafka data and write the data into HBase.

The applications of this project start a task every 5 seconds to read data from Kafka and update the data to a specified HBase table.

sparknormal-examples/SparkStructuredStreamingJavaExample
sparknormal-examples/SparkStructuredStreamingPythonExample
sparknormal-examples/SparkStructuredStreamingScalaExample

In Spark applications, Structured Streaming is used to call Kafka APIs to obtain word records. The records are grouped by word to obtain the number of records of each word.

sparknormal-examples/SparkThriftServerJavaExample
sparknormal-examples/SparkThriftServerScalaExample

Java/Scala sample project for Spark SQL access through JDBC.

In this sample, you can customize JDBCServer clients and use JDBC connections to create, load data to, query, and delete data tables.

sparknormal-examples/StructuredStreamingADScalaExample

Structured Streaming is used to read advertisement request data, display data, and click data from Kafka, obtain effective display statistics and click statistics in real time, and write the statistics to Kafka.

sparknormal-examples/StructuredStreamingStateScalaExample

In a Spark Structured Streaming application, the number of events in each session and the start and end timestamps of the sessions are collected across batches. In addition, the system outputs the sessions whose state was updated in the current batch.
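The per-session bookkeeping described in the last entry can be illustrated without Spark. The following is a minimal plain-Python sketch of the idea, not the code of the sample project: the names `SessionInfo` and `update_sessions`, and the representation of an event as a `(session_id, timestamp)` pair, are assumptions made for illustration. In the actual Scala project this state would be managed by Structured Streaming rather than by a Python dictionary.

```python
from dataclasses import dataclass, replace


@dataclass
class SessionInfo:
    num_events: int  # events seen so far in this session
    start_ts: int    # earliest event timestamp
    end_ts: int      # latest event timestamp


def update_sessions(state, batch):
    """Fold one batch of (session_id, timestamp) events into `state`.

    `state` maps session_id -> SessionInfo and persists across batches.
    Returns copies of only those sessions that were updated by this batch.
    """
    touched = set()
    for session_id, ts in batch:
        info = state.get(session_id)
        if info is None:
            # First event of a new session.
            state[session_id] = SessionInfo(num_events=1, start_ts=ts, end_ts=ts)
        else:
            info.num_events += 1
            info.start_ts = min(info.start_ts, ts)
            info.end_ts = max(info.end_ts, ts)
        touched.add(session_id)
    return {sid: replace(state[sid]) for sid in sorted(touched)}


# Two consecutive batches sharing the same state.
state = {}
first = update_sessions(state, [("s1", 10), ("s2", 12), ("s1", 15)])
second = update_sessions(state, [("s1", 20)])
# `second` contains only s1, because s2 received no events in that batch.
```

Returning copies keeps each batch's output stable even though the shared state continues to change in later batches.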