Updated on 2023-08-31 GMT+08:00

Development Process

Development Process of a Spark Application

Spark consists of Spark Core, Spark SQL, and Spark Streaming. Applications for all three components follow the same development process.

Figure 1 and Table 1 describe the development process.

Figure 1 Spark development process
Table 1 Description of Spark development process

Stage

Description

Reference

Preparing the development environment

Spark applications can be developed in Scala, Java, or Python. IntelliJ IDEA is recommended; prepare the development environment for your language as described in the reference. Spark applications run on the Spark client, so install and configure the client as described in the reference.

Preparing the Development Environment

Preparing the configuration files for connecting to the cluster

During development or a test run, the project uses the cluster configuration files to connect to an MRS cluster. These files usually include the configuration files of the cluster components and the user files used for security authentication. You can obtain them from the MRS cluster you have created.

Preparing the Configuration Files for Connecting to the Cluster

Configuring and importing sample projects

A range of sample projects is provided for different scenarios. You can import a sample project into your local development environment, or create a new Spark project by following the guide.

Configuring and Importing Sample Projects

Creating a New Project (Optional)

Configuring security authentication

If Kerberos authentication is enabled for your MRS cluster, the application must pass security authentication before it can access the cluster.

Preparing for Security Authentication

Writing program code for a service scenario

Sample projects are provided in Scala, Java, and Python, covering scenarios such as Spark Streaming, Spark SQL, JDBC client programs, and Spark on HBase.

These samples help you quickly learn the programming interfaces of the Spark components.

Developing the Project

Compiling and running the project

You can debug and run the project in the local Windows development environment, or compile it into a JAR package and submit it to run on a Linux node.

NOTE:

You can tune the project based on its runtime behavior to meet the performance requirements of your service scenario, and then compile and run it again. For details, see Spark2x Performance Tuning in .

Compiling and Running the Application
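For the "Preparing the development environment" stage, the following Python sketch checks a machine for a few common prerequisites before you build or submit anything. The environment variables `JAVA_HOME` and `SPARK_HOME` and the commands `java` and `spark-submit` are assumptions about a typical Spark client setup, not a list taken from the MRS client documentation. Adapt them to your own installation.

```python
import os
import shutil

# Hypothetical prerequisites for a typical Spark client; adjust to your installation.
REQUIRED_ENV_VARS = ("JAVA_HOME", "SPARK_HOME")
REQUIRED_COMMANDS = ("java", "spark-submit")


def check_environment(env=None, which=shutil.which):
    """Return a list of human-readable problems; an empty list means all checks passed.

    `env` and `which` are injectable so the check can be tested without a real client.
    """
    env = os.environ if env is None else env
    problems = []
    for name in REQUIRED_ENV_VARS:
        if not env.get(name):
            problems.append(f"environment variable {name} is not set")
    for command in REQUIRED_COMMANDS:
        if which(command) is None:
            problems.append(f"command '{command}' was not found on PATH")
    return problems


if __name__ == "__main__":
    # Print one warning per missing prerequisite on the current machine.
    for problem in check_environment():
        print("WARNING:", problem)
```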
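For the "Preparing the configuration files for connecting to the cluster" stage, the sketch below checks that a local directory contains the expected files and reads a Hadoop-style `*-site.xml` file into a dictionary. The file names in `EXPECTED_FILES` are illustrative assumptions. The files you actually obtain depend on the components installed in your cluster.

```python
import xml.etree.ElementTree as ET
from pathlib import Path

# Hypothetical file names; use the files actually obtained from your cluster.
EXPECTED_FILES = ("core-site.xml", "hdfs-site.xml", "yarn-site.xml")


def missing_config_files(conf_dir, expected=EXPECTED_FILES):
    """Return the expected configuration files that are absent from conf_dir."""
    conf_path = Path(conf_dir)
    return [name for name in expected if not (conf_path / name).is_file()]


def read_site_xml(path):
    """Parse a Hadoop-style *-site.xml file into a {name: value} dictionary.

    Each <property> element is expected to contain <name> and <value> children.
    """
    root = ET.parse(path).getroot()
    properties = {}
    for prop in root.iter("property"):
        name = prop.findtext("name")
        if name:
            properties[name.strip()] = (prop.findtext("value") or "").strip()
    return properties
```

For example, `read_site_xml("conf/core-site.xml").get("fs.defaultFS")` would return the default file system URI configured for the cluster, if that property is present.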
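For the "Configuring security authentication" stage, one common approach on a Linux client is to obtain a Kerberos ticket from a keytab with MIT `kinit`. Another is to pass the keytab to `spark-submit` through its `--principal` and `--keytab` options when running on YARN. The sketch below only assembles those commands. The helper names and the `user@REALM` format check are assumptions of this example, and this is not the authentication code shipped with the sample projects.

```python
import subprocess


def kinit_command(keytab_path, principal):
    """Build an MIT Kerberos kinit command that obtains a ticket from a keytab."""
    # This sketch assumes a fully qualified principal such as user@EXAMPLE.COM.
    if "@" not in principal:
        raise ValueError("principal is expected in the form user@REALM")
    return ["kinit", "-kt", keytab_path, principal]


def login_from_keytab(keytab_path, principal):
    """Run kinit; raises subprocess.CalledProcessError if authentication fails."""
    subprocess.run(kinit_command(keytab_path, principal), check=True)


def spark_submit_kerberos_args(keytab_path, principal):
    """Return spark-submit options that let Spark log in with a keytab on YARN."""
    return ["--principal", principal, "--keytab", keytab_path]
```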
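For the "Writing program code for a service scenario" stage, the following PySpark word count is a minimal sketch of the RDD API (`textFile`, `flatMap`, `map`, `reduceByKey`). It is an illustration, not one of the provided sample projects. `pyspark` is imported inside `run_spark_word_count`, so the pure-Python helpers can be unit-tested on a machine without Spark. The Spark job runs only when an input path and an output path are passed on the command line.

```python
import sys
from collections import Counter


def split_words(line):
    """Split a line of text on whitespace and lowercase each word."""
    return [word.lower() for word in line.split()]


def count_words_locally(lines):
    """Pure-Python reference result, useful for checking split_words in unit tests."""
    return Counter(word for line in lines for word in split_words(line))


def run_spark_word_count(input_path, output_path):
    """Count words with the Spark RDD API; requires pyspark on the Spark client."""
    # Imported lazily so the helpers above can be used without Spark installed.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("WordCountSketch").getOrCreate()
    try:
        counts = (
            spark.sparkContext.textFile(input_path)
            .flatMap(split_words)               # one record per word
            .map(lambda word: (word, 1))        # pair each word with a count of 1
            .reduceByKey(lambda a, b: a + b)    # sum the counts per word
        )
        counts.saveAsTextFile(output_path)
    finally:
        spark.stop()


# Run the Spark job only when both paths are supplied on the command line.
if __name__ == "__main__" and len(sys.argv) == 3:
    run_spark_word_count(sys.argv[1], sys.argv[2])
```

Assuming the file is saved as `wordcount.py` (a hypothetical name), it could be submitted from the Spark client with `spark-submit wordcount.py <input_path> <output_path>`.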
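For the "Compiling and running the project" stage, a JAR built in the development environment is typically submitted from the Spark client with `spark-submit`. The helper below assembles such a command. The JAR name, the main class, the defaults `--master yarn` and `--deploy-mode client`, and the `spark.executor.memory` value in the usage lines are assumptions for illustration. The optional `conf` mapping is one place to pass tuning properties when you re-run the project after optimization.

```python
import shlex


def build_spark_submit(jar_path, main_class, app_args=(), master="yarn",
                       deploy_mode="client", conf=None):
    """Assemble a spark-submit argument list for a compiled application JAR."""
    command = [
        "spark-submit",
        "--class", main_class,
        "--master", master,
        "--deploy-mode", deploy_mode,
    ]
    # Each Spark property becomes a separate --conf key=value pair.
    for key, value in (conf or {}).items():
        command += ["--conf", f"{key}={value}"]
    command.append(jar_path)
    # Application arguments must come after the JAR path.
    command.extend(app_args)
    return command


if __name__ == "__main__":
    example = build_spark_submit(
        "spark-examples.jar",        # hypothetical JAR name
        "com.example.WordCount",     # hypothetical main class
        ["/tmp/input", "/tmp/output"],
        conf={"spark.executor.memory": "2g"},
    )
    # Print a shell-quoted command line that can be pasted into a terminal.
    print(shlex.join(example))
```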