Importing and Configuring Spark Sample Projects

Scenario

Spark provides sample projects for multiple scenarios, including Java and Scala projects, to help you learn Spark development quickly.

Java and Scala projects are imported in the same way. Sample projects developed in Python do not need to be imported; you only need to open the Python file (*.py).

The following procedure uses the Java sample code as an example. Figure 1 shows the procedure for importing sample projects.

Figure 1 Procedure of importing sample projects

Prerequisites

Ensure that the difference between the local environment time and the cluster time is less than 5 minutes. If the time difference cannot be determined, contact the system administrator. You can view the cluster time in the lower-right corner of the FusionInsight Manager page.
You have prepared the development environment and MRS cluster configuration files. For details, see Preparing the Configuration File for Connecting Spark to the Cluster.
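
The time requirement above can be checked with a few lines of code. The following is a minimal sketch, not part of the sample projects: the function name clock_skew_ok and the two timestamps are assumptions for illustration, and the cluster time is assumed to have been read manually from the FusionInsight Manager page.

```python
from datetime import datetime, timedelta, timezone

# Threshold taken from the prerequisite: the difference must be less than 5 minutes.
MAX_SKEW = timedelta(minutes=5)


def clock_skew_ok(local_time: datetime, cluster_time: datetime,
                  max_skew: timedelta = MAX_SKEW) -> bool:
    """Return True if the two timestamps differ by less than max_skew."""
    return abs(local_time - cluster_time) < max_skew


# Hypothetical readings: the cluster time would be copied by hand from
# the FusionInsight Manager page.
local = datetime(2024, 1, 1, 12, 0, 0, tzinfo=timezone.utc)
cluster = datetime(2024, 1, 1, 12, 3, 30, tzinfo=timezone.utc)
print(clock_skew_ok(local, cluster))  # True: the readings are 3 min 30 s apart
```

Both timestamps must use the same time zone (UTC in this sketch), otherwise the comparison is meaningless.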

Procedure

Obtain the sample projects, such as the Scala and Spark Streaming projects, from the sparksecurity-examples folder in the spark-examples directory of the decompressed sample code. For details, see Obtaining the MRS Application Development Sample Project.
If you need to commission the Spark sample code in the local Windows environment, obtain the configuration and authentication files required by each sample project by referring to Preparing the Configuration File for Connecting Spark to the Cluster, and import them to the configuration file directory of the Spark sample project.
After IntelliJ IDEA and the JDK are installed, configure the JDK in IntelliJ IDEA.
1. Start IntelliJ IDEA and select Configure.
  Figure 2 Quick Start
2. Select Project Defaults from the Configure drop-down list.
  Figure 3 Configure
3. Select Project Structure from Project Defaults.
  Figure 4 Project Defaults
4. On the displayed Project Structure page, select SDKs and click the plus sign to add the JDK.
  Figure 5 Adding the JDK
5. In the displayed Select Home Directory for JDK window, select the JDK home directory and click OK.
  Figure 6 Selecting a home directory for the JDK
6. After selecting the JDK, click OK to complete the configuration.
  Figure 7 Completing the configuration
(Optional) If the Scala development environment is used, install the Scala plugin in IntelliJ IDEA.
1. Select Plugins from the Configure drop-down list.
  Figure 8 Plugins
2. On the Plugins page, select Install plugin from disk.
  Figure 9 Install plugin from disk
3. On the Choose Plugin File page, select the Scala plugin file of the corresponding version and click OK.
4. On the Plugins page, click Apply to install the Scala plugin.
5. On the displayed Plugins Changed page, click Restart for the configuration to take effect.
  Figure 10 Plugins Changed
Import the Java sample projects to IDEA.
1. Start IntelliJ IDEA. On the Quick Start page, select Import Project.
  Alternatively, if IDEA has been used before, import projects directly from the IDEA homepage by choosing File > Import project....
  Figure 11 Import Project (on the Quick Start page)
2. Select the directory that stores the projects to import and the pom file, and click OK.
  Figure 12 Select File or Directory to Import
3. Confirm the import directory and project name, and click Next.
  Figure 13 Import Project from Maven
4. Select the projects to import and click Next.
5. Confirm the project JDK and click Next.
  Figure 14 Select project SDK
6. Confirm the project name and project file location, and click Finish to complete the import.
  Figure 15 Confirm the project name and file location
7. After the import, the imported projects are displayed on the IDEA homepage.
  Figure 16 Imported projects
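
Before selecting a directory in the import step above, it can help to list which folders actually contain a pom file. The following is a minimal sketch under stated assumptions: the helper name find_maven_projects is hypothetical, the pom file is assumed to use the standard Maven name pom.xml, and the demo builds a throwaway directory tree instead of reading the real sample code.

```python
import tempfile
from pathlib import Path
from typing import List


def find_maven_projects(root: Path) -> List[Path]:
    """Return every directory under root (root included) that contains a pom.xml."""
    return sorted(pom.parent for pom in root.rglob("pom.xml"))


# Demo on a temporary tree. The folder layout is an assumption for
# illustration and does not reproduce the real sample package.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp) / "sparksecurity-examples"
    for name in ("SparkJavaExample", "SparkScalaExample"):
        (root / name).mkdir(parents=True)
        (root / name / "pom.xml").write_text("<project/>")
    (root / "NoPomHere").mkdir()  # a folder without a pom file is skipped
    for project in find_maven_projects(root):
        print(project.name)
```

To use it on the real sample code, pass the path of the decompressed sparksecurity-examples folder as root.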
(Optional) If a sample project developed in Scala is imported, configure the language for the project.
1. On the IDEA main page, choose File > Project Structure... to access the Project Structure page.
2. Choose Modules, right-click the project name, and choose Add > Scala.
  Figure 17 Selecting the Scala language
3. Wait until IDEA identifies the Scala SDK, select the dependency JAR packages in the Add Scala Support dialog box, and then click OK.
  Figure 18 Add Scala Support
4. If IDEA fails to identify the Scala SDK, create a Scala SDK as follows:
  1. Click Create...
    Figure 19 Create...
  2. On the Select JAR's for the new Scala SDK page, click Browse...
    Figure 20 Select JAR's for the new Scala SDK
  3. On the Scala SDK files page, select the Scala SDK directory, and then click OK.
    Figure 21 Scala SDK files
5. Click OK.
  Figure 22 Successful configuration
Set the file encoding in IDEA to prevent garbled characters from being displayed.
1. On the IDEA homepage, choose File > Settings....
  Figure 23 Choosing Settings
2. Configure the encoding.
  1. On the Settings page, choose Editor > File Encodings.
  2. In both the Global Encoding and Project Encoding drop-down lists, select UTF-8.
  3. Click Apply.
  4. Click OK to complete the encoding configuration.
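
If garbled characters still appear after the encoding is set to UTF-8, some source files may have been saved in a different encoding. The following is a minimal sketch for finding such files. The helper names is_utf8 and non_utf8_files and the list of file suffixes are assumptions for illustration, and the demo writes two temporary files rather than scanning a real project.

```python
import tempfile
from pathlib import Path
from typing import List, Tuple


def is_utf8(data: bytes) -> bool:
    """Return True if the bytes decode cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def non_utf8_files(root: Path,
                   suffixes: Tuple[str, ...] = (".java", ".scala", ".py")) -> List[Path]:
    """List source files under root whose content is not valid UTF-8."""
    return sorted(
        p for p in root.rglob("*")
        if p.is_file() and p.suffix in suffixes and not is_utf8(p.read_bytes())
    )


# Demo: the same comment saved once as UTF-8 and once as GBK.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    text = "// \u4e2d\u6587"  # a comment containing two Chinese characters
    (root / "Good.java").write_bytes(text.encode("utf-8"))
    (root / "Bad.java").write_bytes(text.encode("gbk"))
    print([p.name for p in non_utf8_files(root)])  # ['Bad.java']
```

A file that passes this check can still display incorrectly if IDEA is configured to read it in another encoding, so the check complements the settings above rather than replacing them.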

Sample Code Path Description

**Table 1** Sample code path description
Sample Code Project	Sample Name	Sample Development Language
SparkJavaExample	Spark Core Project	Java
SparkScalaExample	Spark Core Project	Scala
SparkPythonExample	Spark Core Project	Python
SparkSQLJavaExample	Spark SQL Project	Java
SparkSQLScalaExample	Spark SQL Project	Scala
SparkSQLPythonExample	Spark SQL Project	Python
SparkThriftServerJavaExample	Accessing Spark SQL Through JDBC	Java
SparkThriftServerScalaExample	Accessing Spark SQL Through JDBC	Scala
SparkOnHbaseJavaExample-AvroSource	Spark on HBase-Performing Operations on Data in Avro Format	Java
SparkOnHbaseScalaExample-AvroSource	Spark on HBase-Performing Operations on Data in Avro Format	Scala
SparkOnHbasePythonExample-AvroSource	Spark on HBase-Performing Operations on Data in Avro Format	Python
SparkOnHbaseJavaExample-HbaseSource	Spark on HBase-Performing Operations on the HBase Data Source	Java
SparkOnHbaseScalaExample-HbaseSource	Spark on HBase-Performing Operations on the HBase Data Source	Scala
SparkOnHbasePythonExample-HbaseSource	Spark on HBase-Performing Operations on the HBase Data Source	Python
SparkOnHbaseJavaExample-JavaHBaseBulkPutExample	Spark on HBase-Using the BulkPut Interface	Java
SparkOnHbaseScalaExample-HBaseBulkPutExample	Spark on HBase-Using the BulkPut Interface	Scala
SparkOnHbasePythonExample-HBaseBulkPutExample	Spark on HBase-Using the BulkPut Interface	Python
SparkOnHbaseJavaExample-JavaHBaseBulkGetExample	Spark on HBase-Using the BulkGet Interface	Java
SparkOnHbaseScalaExample-HBaseBulkGetExample	Spark on HBase-Using the BulkGet Interface	Scala
SparkOnHbasePythonExample-HBaseBulkGetExample	Spark on HBase-Using the BulkGet Interface	Python
SparkOnHbaseJavaExample-JavaHBaseBulkDeleteExample	Spark on HBase-Using the BulkDelete Interface	Java
SparkOnHbaseScalaExample-HBaseBulkDeleteExample	Spark on HBase-Using the BulkDelete Interface	Scala
SparkOnHbasePythonExample-HBaseBulkDeleteExample	Spark on HBase-Using the BulkDelete Interface	Python
SparkOnHbaseJavaExample-JavaHBaseBulkLoadExample	Spark on HBase-Using the BulkLoad Interface	Java
SparkOnHbaseScalaExample-HBaseBulkLoadExample	Spark on HBase-Using the BulkLoad Interface	Scala
SparkOnHbasePythonExample-HBaseBulkLoadExample	Spark on HBase-Using the BulkLoad Interface	Python
SparkOnHbaseJavaExample-JavaHBaseForEachPartitionExample	Spark on HBase-Using the foreachPartition Interface	Java
SparkOnHbaseScalaExample-HBaseForEachPartitionExample	Spark on HBase-Using the foreachPartition Interface	Scala
SparkOnHbasePythonExample-HBaseForEachPartitionExample	Spark on HBase-Using the foreachPartition Interface	Python
SparkOnHbaseJavaExample-JavaHBaseDistributedScanExample	Spark on HBase-Performing Distributed Scans on HBase Tables	Java
SparkOnHbaseScalaExample-HBaseDistributedScanExample	Spark on HBase-Performing Distributed Scans on HBase Tables	Scala
SparkOnHbasePythonExample-HBaseDistributedScanExample	Spark on HBase-Performing Distributed Scans on HBase Tables	Python
SparkOnHbaseJavaExample-JavaHBaseMapPartitionExample	Spark on HBase-Using the mapPartition Interface	Java
SparkOnHbaseScalaExample-HBaseMapPartitionExample	Spark on HBase-Using the mapPartition Interface	Scala
SparkOnHbasePythonExample-HBaseMapPartitionExample	Spark on HBase-Using the mapPartition Interface	Python
SparkOnHbaseJavaExample-JavaHBaseStreamingBulkPutExample	Spark on HBase-Writing Data to HBase Tables in Batches Using Spark Streaming	Java
SparkOnHbaseScalaExample-HBaseStreamingBulkPutExample	Spark on HBase-Writing Data to HBase Tables in Batches Using Spark Streaming	Scala
SparkOnHbasePythonExample-HBaseStreamingBulkPutExample	Spark on HBase-Writing Data to HBase Tables in Batches Using Spark Streaming	Python
SparkHbasetoHbaseJavaExample	Reading Data from HBase and Writing It Back to HBase	Java
SparkHbasetoHbaseScalaExample	Reading Data from HBase and Writing It Back to HBase	Scala
SparkHbasetoHbasePythonExample	Reading Data from HBase and Writing It Back to HBase	Python
SparkHivetoHbaseJavaExample	Reading Data from Hive and Writing It to HBase	Java
SparkHivetoHbaseScalaExample	Reading Data from Hive and Writing It to HBase	Scala
SparkHivetoHbasePythonExample	Reading Data from Hive and Writing It to HBase	Python
SparkStreamingKafka010JavaExample	Streaming Connecting to Kafka0-10	Java
SparkStreamingKafka010ScalaExample	Streaming Connecting to Kafka0-10	Scala
SparkStructuredStreamingJavaExample	Structured Streaming Project	Java
SparkStructuredStreamingScalaExample	Structured Streaming Project	Scala
SparkStructuredStreamingPythonExample	Structured Streaming Project	Python
StructuredStreamingADScalaExample	Structured Streaming Stream-Stream Join	Scala
StructuredStreamingStateScalaExample	Structured Streaming State Operation	Scala
SparkOnMultiHbaseScalaExample	Concurrent Access from Spark to HBase in Two Clusters	Scala
SparkOnHudiJavaExample	Using Spark to Perform Basic Hudi Operations	Java
SparkOnHudiPythonExample	Using Spark to Perform Basic Hudi Operations	Python
SparkOnHudiScalaExample	Using Spark to Perform Basic Hudi Operations	Scala
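
The project names in Table 1 follow a visible pattern: the development language appears directly before the word Example. The following is a minimal sketch that uses this pattern to tell whether a project must be imported (Java or Scala) or can be opened directly (Python). The function names sample_language and needs_import are assumptions for illustration, and the pattern is inferred from the table rather than documented as a naming rule.

```python
import re
from typing import Optional

# Pattern inferred from Table 1: "<Language>Example" appears in every project name.
_LANGUAGE_PATTERN = re.compile(r"(Java|Scala|Python)Example")


def sample_language(project_name: str) -> Optional[str]:
    """Guess the development language from a sample project name."""
    match = _LANGUAGE_PATTERN.search(project_name)
    return match.group(1) if match else None


def needs_import(project_name: str) -> bool:
    """Java and Scala projects are imported; Python files are opened directly."""
    return sample_language(project_name) in ("Java", "Scala")


for name in ("SparkJavaExample",
             "SparkOnHbaseScalaExample-HBaseBulkPutExample",
             "SparkOnHudiPythonExample"):
    print(name, sample_language(name), needs_import(name))
```

The first match is used, so a name such as SparkOnHbaseJavaExample-JavaHBaseBulkPutExample resolves to Java from its leading part.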