Introduction to Spark Application Development
Spark
Spark is a distributed batch processing framework. It provides data analysis, data mining, and iterative in-memory computing capabilities, and supports application development in multiple programming languages, including Scala, Java, and Python. Spark applies to the following scenarios:
- Data processing: Spark processes data quickly and provides fault tolerance and scalability.
- Iterative computation: Spark supports iterative computation, meeting the requirements of multi-step data processing logic.
- Data mining: Spark can perform complex data mining and analysis on massive datasets and supports multiple data mining and machine learning algorithms.
- Streaming processing: Spark supports stream processing with second-level latency and supports multiple external data sources.
- Query analysis: Spark supports standard SQL query analysis, provides the DataFrame DSL, and supports multiple external inputs.

Figure 1 shows the component architecture of Apache Spark. This section provides guidance for developing applications with Spark, Spark SQL, and Spark Streaming. For details about MLlib and GraphX, visit the Spark official website at http://spark.apache.org/docs/2.2.2/.
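To make the multi-step processing logic mentioned above concrete, the sketch below mirrors Spark's classic RDD word-count pipeline (flatMap, map, reduceByKey) on plain Scala collections, so it runs without a cluster. In a real Spark application the same chain of transformations would be applied to an RDD obtained from `SparkContext.textFile`; the object and method names here are illustrative only.

```scala
// Word-count logic mirroring the RDD pipeline:
// textFile -> flatMap(split) -> map(word -> (word, 1)) -> reduceByKey(_ + _).
// Sketched on plain Scala collections so it runs without a Spark cluster.
object WordCountSketch {
  def count(lines: Seq[String]): Map[String, Int] =
    lines
      .flatMap(_.split("\\s+"))   // split each line into words
      .filter(_.nonEmpty)         // drop empty tokens
      .map(word => (word, 1))     // pair each word with a count of 1
      .groupBy(_._1)              // local stand-in for reduceByKey's shuffle
      .map { case (word, pairs) => (word, pairs.map(_._2).sum) }
}
```

For example, `WordCountSketch.count(Seq("spark is fast", "spark scales"))` counts "spark" twice and every other word once.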
Spark APIs
Spark supports application development in multiple programming languages, including Scala, Java, and Python. Because Spark itself is developed in Scala and Scala code is concise and readable, you are advised to develop Spark applications in Scala.
Table 1 describes Spark APIs in different languages.
API | Description
---|---
Scala API | The API in Scala. Because Scala is concise and readable, you are advised to use the Scala API to develop applications.
Java API | The API in Java.
Python API | The API in Python.
Spark Core and Spark Streaming applications are developed using the APIs listed in the preceding table. Spark SQL can be accessed through the CLI or the ThriftServer. The ThriftServer can be accessed in two ways: through Beeline or through JDBC client code.
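As a sketch of the JDBC client path: the ThriftServer speaks the HiveServer2 protocol, so clients connect with a `jdbc:hive2://` URL. The host, port, and query below are placeholder assumptions, and the Hive JDBC driver must be on the classpath for the connection to succeed.

```scala
import java.sql.{Connection, DriverManager}

// Hedged sketch of accessing the Spark ThriftServer through JDBC client code.
// Host, port, and database are placeholder assumptions.
object ThriftServerClient {
  // Build the HiveServer2-style JDBC URL the ThriftServer accepts.
  def jdbcUrl(host: String, port: Int, db: String = "default"): String =
    s"jdbc:hive2://$host:$port/$db"

  // Run a query and print the first column of each row.
  // Requires the Hive JDBC driver on the classpath.
  def query(url: String, sql: String): Unit = {
    val conn: Connection = DriverManager.getConnection(url)
    try {
      val rs = conn.createStatement().executeQuery(sql)
      while (rs.next()) println(rs.getString(1))
    } finally conn.close()
  }
}
```

A client would then call, for example, `ThriftServerClient.query(ThriftServerClient.jdbcUrl("thriftserver-host", 10000), "SHOW TABLES")`, where 10000 is the ThriftServer's default listening port.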
For the spark-sql, spark-shell, and spark-submit scripts (when the application to be run contains SQL operations), do not use the proxy user parameter to submit a task.