Application Development Overview
Spark Introduction
Spark is a distributed batch processing system as well as an analysis and mining engine. It provides an iterative in-memory computation framework and supports application development in multiple programming languages, including Scala, Java, and Python. Typical application scenarios of Spark include:
- Data processing: Spark processes data quickly while providing fault tolerance and scalability.
- Iterative computation: Spark supports iterative computation, meeting the requirements of multi-step data processing logic.
- Data mining: Spark can perform complex mining and analysis on massive data and supports multiple data mining and machine learning algorithms.
- Stream processing: Spark supports stream processing with second-level latency and supports multiple external data sources.
- Query analysis: Spark supports standard SQL query analysis, provides the DSL (DataFrame), and supports multiple external inputs (see the sketch after this list).
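As a minimal sketch of the query analysis scenario, the following Scala program runs the same aggregation once through the DataFrame DSL and once through standard SQL. The object name and the in-memory sample data are illustrative assumptions, not part of this guide; a real job would read from an external source such as HDFS.

```scala
import org.apache.spark.sql.SparkSession

object QueryExample {
  def main(args: Array[String]): Unit = {
    // SparkSession is the entry point for DataFrame and SQL functionality.
    val spark = SparkSession.builder()
      .appName("QueryExample")
      .getOrCreate()
    import spark.implicits._

    // Small in-memory DataFrame used only for illustration.
    val sales = Seq(("apple", 3), ("pear", 5), ("apple", 2)).toDF("item", "qty")

    // DSL (DataFrame) style query.
    sales.groupBy("item").sum("qty").show()

    // Equivalent standard SQL query against a temporary view.
    sales.createOrReplaceTempView("sales")
    spark.sql("SELECT item, SUM(qty) AS total FROM sales GROUP BY item").show()

    spark.stop()
  }
}
```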
This section focuses on the application development guides for Spark Core, Spark SQL, and Spark Streaming.
Spark Development Interface Introduction
Spark supports application development in multiple programming languages, including Scala, Java, and Python. Because Spark itself is developed in Scala and Scala code is easy to read, users are advised to develop Spark applications in Scala.
The Spark APIs, divided by language, are listed in Table 1.
| API | Description |
|---|---|
| Scala API | Indicates the API in Scala. For common interfaces of Spark Core, Spark SQL, and Spark Streaming, see Scala. Because Scala code is easy to read, users are advised to use the Scala interfaces for program development. |
| Java API | Indicates the API in Java. For common interfaces of Spark Core, Spark SQL, and Spark Streaming, see Java. |
| Python API | Indicates the API in Python. For common interfaces of Spark Core, Spark SQL, and Spark Streaming, see Python. |
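To show what the recommended Scala API looks like in practice, here is a minimal word-count sketch against Spark Core. The object name and the input-path argument are illustrative assumptions, not interfaces defined by this guide.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object WordCount {
  def main(args: Array[String]): Unit = {
    // SparkContext is the entry point of the Spark Core Scala API.
    val conf = new SparkConf().setAppName("WordCount")
    val sc = new SparkContext(conf)

    // args(0) is assumed to be an input path, for example an HDFS directory.
    val counts = sc.textFile(args(0))
      .flatMap(_.split("\\s+"))
      .map(word => (word, 1))
      .reduceByKey(_ + _)

    // Print a small sample of the results on the driver.
    counts.take(10).foreach(println)
    sc.stop()
  }
}
```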
The APIs listed in the preceding table are used to develop Spark Core and Spark Streaming applications. Spark SQL can additionally be accessed through the CLI or through JDBCServer. JDBCServer can be accessed in two ways: the Beeline client and JDBC client code. For details, see JDBCServer Interface.
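The following is a minimal sketch of the JDBC client-code path. Because JDBCServer speaks the HiveServer2 protocol, the standard Hive JDBC driver is used; the host, port, and connection URL below are placeholders, and the exact URL format (especially for a secured cluster) is described in JDBCServer Interface.

```scala
import java.sql.DriverManager

object JdbcServerClient {
  def main(args: Array[String]): Unit = {
    // Load the Hive JDBC driver; JDBCServer is HiveServer2-compatible.
    Class.forName("org.apache.hive.jdbc.HiveDriver")

    // Placeholder URL: replace host and port with the JDBCServer address.
    val conn = DriverManager.getConnection(
      "jdbc:hive2://<jdbcserver-host>:<port>/default")
    try {
      val stmt = conn.createStatement()
      // Run a simple query and print the result set.
      val rs = stmt.executeQuery("SHOW TABLES")
      while (rs.next()) {
        println(rs.getString(1))
      }
    } finally {
      conn.close()
    }
  }
}
```

For the Beeline path, the same placeholder URL would be passed to the Beeline client instead of to JDBC client code.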
For spark-sql, spark-shell, and spark-submit (when the application contains SQL operations), do not use the proxy-user parameter to submit a task. This is partly because the spark-sql script does not support task submission with the proxy-user parameter, and partly because the sample programs in this document already contain security authentication.