Updated on 2024-10-23 GMT+08:00

Spark Application Development Process

Development Process of a Spark Application

Spark includes Spark Core, Spark SQL, and Spark Streaming, all of which share the same development process.

Figure 1 and Table 1 describe the development process.

Figure 1 Spark development process
Table 1 Description of Spark development process


Stage: Preparing the development environment

Description: Spark applications can be developed in Scala, Java, or Python. The IDEA tool is recommended for preparing the development environment in each language based on the reference. The running environment of Spark is the Spark client; install and configure the client based on the reference.

Reference: Preparing a Local Application Development Environment

Stage: Preparing the configuration files for connecting to the cluster

Description: During development or a test run of the project, you need the cluster configuration files to connect to an MRS cluster. These files usually include the cluster component information file and the user files used for security authentication. You can obtain the required files from the created MRS cluster.

Reference: Preparing the Configuration File for Connecting Spark to the Cluster

Stage: Configuring and importing sample projects

Description: MRS provides a range of sample projects for different scenarios. You can obtain a sample project and import it into the local development environment, or create a Spark project by following the guide.

Reference: Importing and Configuring Spark Sample Projects

Reference: (Optional) Creating Spark Sample Projects

Stage: Configuring security authentication

Description: If you use an MRS cluster with Kerberos authentication enabled, security authentication is required.

Reference: Configuring Security Authentication for Spark Applications
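As a hedged illustration of what this stage configures, the snippet below sketches keytab-based security settings for a Spark application. The property names are Spark 3.x configuration keys (earlier releases use spark.yarn.keytab and spark.yarn.principal); the keytab path and principal are placeholder assumptions, not values from this document.

```python
# Sketch of keytab-based security settings for a Spark application.
# The keytab path and principal below are hypothetical placeholders;
# obtain the real user keytab file from the MRS cluster.
kerberos_conf = {
    "spark.kerberos.keytab": "/opt/client/user.keytab",   # assumed keytab path
    "spark.kerberos.principal": "sparkuser@HADOOP.COM",   # assumed principal
}

def as_submit_args(conf):
    """Render the settings as spark-submit --conf arguments."""
    return [arg for key, value in sorted(conf.items())
            for arg in ("--conf", f"{key}={value}")]
```

With settings like these, the application authenticates via the keytab when submitted, instead of relying on an interactive login on the client node.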

Stage: Writing program code for a service scenario

Description: Sample projects are provided in Scala, Java, and Python, covering scenarios such as Streaming, SQL, the JDBC client program, and Spark on HBase. These projects help users quickly understand the programming interfaces of all Spark components.

Reference: Developing a Spark Application

Stage: Compiling and running the project

Description: Compile and run the project. You can debug and run the project in the local Windows development environment, or compile it into a JAR package and submit it to a Linux node for execution.

NOTE: You can optimize the project based on its running status to meet the performance requirements of the current service scenario. After the optimization, compile and run the project again. For details, see Spark2x Performance Tuning.

Reference: Writing and Running the Spark Program in the Linux Environment