Spark Application Development Process
Spark includes Spark Core, Spark SQL, and Spark Streaming, and the development process is the same for all three. Table 1 describes this process.
Table 1 Development process of a Spark application

| Stage | Description | Reference |
| --- | --- | --- |
| Preparing the development environment | Spark applications can be developed in Scala, Java, or Python. IntelliJ IDEA is the recommended tool for preparing the development environment for each language; see the corresponding reference. The running environment of Spark is the Spark client, so install and configure the client as described in the reference. |  |
| Preparing the configuration files for connecting to the cluster | During development or a test run of the project, cluster configuration files are required to connect to an MRS cluster. These files usually include the cluster component information file and the user files used for security authentication, both of which can be obtained from the created MRS cluster. | Preparing the Configuration File for Connecting Spark to the Cluster |
| Configuring and importing sample projects | Sample projects covering a range of scenarios are provided. You can obtain a sample project and import it into the local development environment, or create a Spark project by following the guide. |  |
| Configuring security authentication | If the MRS cluster has Kerberos authentication enabled, security authentication is required. A minimal authentication sketch follows this table. |  |
| Writing program code for a service scenario | Sample projects are provided in Scala, Java, and Python, and for scenarios including Streaming, SQL, the JDBC client program, and Spark on HBase. They help users quickly understand the programming interfaces of all Spark components. A minimal application sketch also follows this table. |  |
| Compiling and running the project | Compile and run the project. You can debug and run it in the local Windows development environment, or compile it into a JAR package and submit it to a Linux node. NOTE: You can tune the project based on its running status to meet the performance requirements of the service scenario; after tuning, compile and run it again. For details, see Spark2x Performance Tuning. | Writing and Running the Spark Program in the Linux Environment |
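For the security authentication stage, the following Scala sketch shows one way to perform a Kerberos login before the SparkSession is created, using Hadoop's UserGroupInformation API. The principal and keytab path are placeholders; in practice they come from the user credential files prepared for the cluster.

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.security.UserGroupInformation

// Minimal Kerberos login sketch for a security-mode cluster.
// Call this once, before creating the SparkSession or touching HDFS.
object KerberosLogin {
  def login(principal: String, keytabPath: String): Unit = {
    val conf = new Configuration()
    // Switch Hadoop security to Kerberos mode.
    conf.set("hadoop.security.authentication", "kerberos")
    UserGroupInformation.setConfiguration(conf)
    // Authenticate with the placeholder principal and keytab file.
    UserGroupInformation.loginUserFromKeytab(principal, keytabPath)
  }
}
```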
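For the code-writing stage, here is a minimal, self-contained Scala application sketch. The word-count scenario, application name, and input path are illustrative assumptions, not taken from any particular sample project.

```scala
import org.apache.spark.sql.SparkSession

// Minimal Spark application: count word occurrences in a text file.
object WordCount {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("WordCountExample")
      // .master("local[*]") // uncomment when debugging in the local environment
      .getOrCreate()

    val counts = spark.sparkContext
      .textFile("hdfs:///tmp/input.txt") // placeholder input path
      .flatMap(_.split("\\s+"))
      .map(word => (word, 1))
      .reduceByKey(_ + _)

    counts.take(10).foreach(println)
    spark.stop()
  }
}
```

After packaging the project into a JAR (for example, with Maven), it can be submitted from the Spark client on a Linux node with a command along the lines of `spark-submit --class WordCount --master yarn WordCount.jar`; the class name and JAR path here are placeholders.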