Updated on 2024-08-10 GMT+08:00

Spark Application Development Process

Development Process

Spark includes Spark Core, Spark SQL, and Spark Streaming, all of which share the same development process.

Figure 1 and Table 1 describe the development process.

Figure 1 Spark development process
Table 1 Spark application development process

Stage: Understand the basic concepts.

Description: Before developing an application, get familiar with the basic concepts of Spark. Which concepts matter depends on the scenario, but they generally include Spark Core, Spark SQL, and Spark Streaming.

Reference: Basic Concepts

Stage: Prepare the development and operating environment.

Description: Spark applications can be developed in Scala, Java, or Python. You are advised to use IntelliJ IDEA and configure the development environment for your language according to the guide. The running environment of Spark is the Spark client; install and configure the client based on the reference.

Reference: Preparing a Local Application Development Environment

Stage: Prepare a developer account.

Description: The developer account is used to run the sample project. To run Spark sample projects, the account must have permissions on HDFS, YARN, Kafka, and Hive.

Reference: Preparing MRS Application Development User

Stage: Create a project.

Description: Spark offers sample projects for various scenarios, which can be imported for study purposes. Alternatively, you can create a Spark project based on the reference.

Reference: Importing and Configuring Spark Sample Projects; (Optional) Creating Spark Sample Projects

Stage: Prepare for security authentication.

Description: If the cluster is in security mode, the application must complete security authentication before it can access cluster services.

Reference: Configuring Security Authentication for Spark Applications
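In a security-mode cluster, authentication usually means a Kerberos login performed before any Spark or Hadoop API is called. The MRS sample projects wrap this in their own helper classes; purely as an illustration, the sketch below uses Hadoop's standard UserGroupInformation API instead. The principal name and keytab path are placeholders, not values from this document.

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.security.UserGroupInformation

object KerberosLoginSketch {
  def main(args: Array[String]): Unit = {
    val conf = new Configuration()
    // Tell the Hadoop client libraries to use Kerberos authentication.
    conf.set("hadoop.security.authentication", "kerberos")
    UserGroupInformation.setConfiguration(conf)
    // Placeholder principal and keytab; use the ones issued for your developer account.
    UserGroupInformation.loginUserFromKeytab(
      "developer@EXAMPLE.COM", "/opt/client/user.keytab")
    println(s"Logged in as: ${UserGroupInformation.getLoginUser}")
  }
}
```

The keytab-based login lets long-running applications re-authenticate without an interactive kinit, which is why it is the common pattern for Spark jobs on secured clusters.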

Stage: Write program code for a service scenario.

Description: Spark provides sample projects in Scala, Java, and Python, covering various scenarios such as Streaming, SQL, JDBC client programs, and Spark on HBase. These samples are designed to help users quickly learn the programming interfaces of all Spark components.

Reference: Developing Spark Applications
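To give a sense of what such a program looks like, here is a minimal self-contained Scala example (not one of the shipped samples): a word count built on the SparkSession API. The input and output paths are placeholders.

```scala
import org.apache.spark.sql.SparkSession

object WordCount {
  def main(args: Array[String]): Unit = {
    // Build (or reuse) a SparkSession; master and other settings
    // are normally supplied by spark-submit.
    val spark = SparkSession.builder()
      .appName("WordCount")
      .getOrCreate()

    // Read a text file, split each line into words, and count occurrences.
    val counts = spark.sparkContext
      .textFile("hdfs:///tmp/input")        // placeholder input path
      .flatMap(_.split("\\s+"))
      .map(word => (word, 1))
      .reduceByKey(_ + _)

    counts.saveAsTextFile("hdfs:///tmp/output")  // placeholder output path
    spark.stop()
  }
}
```

The shipped sample projects follow the same shape but add scenario-specific logic (Streaming sources, SQL queries, HBase access, and so on).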

Stage: Compile and run the application.

Description: Compile the developed application and submit it for running based on the reference.

Reference: Commissioning a Spark Application
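A compiled application is typically packaged into a JAR and submitted from the Spark client with the spark-submit command. The following invocation is a sketch only; the class name and JAR path are placeholders, and the exact options depend on your cluster.

```
# Submit the packaged application to YARN in cluster mode.
# Class name and JAR path below are placeholders for illustration.
spark-submit \
  --class com.example.WordCount \
  --master yarn \
  --deploy-mode cluster \
  /opt/client/WordCount.jar
```

In cluster mode the driver runs inside YARN, so the client machine can disconnect after submission; client mode keeps the driver on the submitting host, which is convenient for debugging.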

Stage: View application running results.

Description: Application running results are stored in the specified directory. You can also check the running results on the web UI.

Stage: Tune the application.

Description: Optimize the application based on its running status to meet the requirements of the service scenario. After tuning, compile and run the application again.

Reference: Spark2x Performance Optimization