Commissioning a Spark Application in a Linux Environment
After application code development is complete, you can upload the code to the Linux client and run the application there. The procedure for running an application developed in Scala or Java is the same on the Spark client.
- A Spark application developed using Python does not need to be packaged into a JAR file. You only need to copy the sample project to the Spark client.
- Ensure that the Python versions installed on the worker and driver nodes are the same; otherwise, the following error is reported: "Python in worker has different version %s than that in driver %s."
- Ensure that the Maven image repository of the SDK in the Huawei image site has been configured for Maven. For details, see Configuring Huawei Open Source Image Repository.
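The worker/driver Python mismatch mentioned above is usually avoided by pointing both sides at the same interpreter before submitting the job. A minimal sketch, assuming the interpreter is installed at /usr/bin/python3 on every node (adjust the path for your environment):

```shell
# Pin the same Python interpreter for the driver and the executors.
# /usr/bin/python3 is an assumption; use the path installed on your nodes.
export PYSPARK_PYTHON=/usr/bin/python3
export PYSPARK_DRIVER_PYTHON=/usr/bin/python3

# Alternatively, set them per job on submission:
# spark-submit --conf spark.pyspark.python=/usr/bin/python3 \
#              --conf spark.pyspark.driver.python=/usr/bin/python3 ...
```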
Compiling and Running the Application
- In IntelliJ IDEA, open the Maven tool window.
On the IDEA main page, choose View > Tool Windows > Maven to open the Maven tool window.
Figure 1 Opening the Maven tool window
If the project is not imported using Maven, perform the following operations:
Right-click the pom file in the sample code project and choose Add as Maven Project from the shortcut menu to add a Maven project.
Figure 2 Adding a Maven project
- Use Maven to generate a JAR file.
- In the Maven tool window, select clean from Lifecycle to run the Maven build.
Figure 3 Selecting clean from Lifecycle and running the Maven build
- In the Maven tool window, select package from Lifecycle to run the Maven build.
Figure 4 Selecting package from Lifecycle and running the Maven build
If the following information is displayed in the Run window, the packaging is successful.
Figure 5 Packaging success message
- Obtain the JAR package from the target folder in the project directory.
Figure 6 Obtaining the JAR package
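The IDEA build steps above can also be run from the command line. A minimal sketch, assuming Maven is on the PATH and the first argument points at the sample project root (the directory containing the pom file); the function name and paths are illustrative:

```shell
# Run the same clean + package lifecycle from the command line,
# then copy the resulting JAR out of the target/ directory.
build_and_copy() {
  local project_dir="$1" dest_dir="$2"
  (cd "$project_dir" && mvn clean package) || return 1
  cp "$project_dir"/target/*.jar "$dest_dir"/
}

# Usage (paths are examples):
# build_and_copy /path/to/sample-project /opt/female
```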
- Copy the JAR file generated in step 2 (for example, CollectFemaleInfo.jar) to the Spark running environment (that is, the Spark client), for example, to /opt/female, and run the Spark application. For details about the sample project, see Developing Spark Applications.
Do not restart the HDFS service or all DataNode instances while a Spark job is running. Otherwise, the job may fail and some JobHistory data may be lost.
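Submitting the copied JAR on the Spark client can be sketched as follows; the main class name is hypothetical and the JAR and input paths are placeholders, so substitute your application's actual values:

```shell
# Submit the application JAR to YARN from the Spark client.
# The main class below is hypothetical; use your application's main class.
submit_app() {
  local jar="$1" main_class="$2" input_path="$3"
  spark-submit --master yarn --deploy-mode client \
    --class "$main_class" "$jar" "$input_path"
}

# Usage (values are illustrative):
# submit_app /opt/female/CollectFemaleInfo.jar com.example.CollectFemaleInfo /user/data/input
```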
Viewing Commissioning Results
After a Spark application is run, you can check the running result in any of the following ways:
- View the command output.
The data storage directory and format are specified by the user in the Spark application. You can obtain the data from the specified file.
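If the application writes its result to HDFS, the output can be inspected with the HDFS client. A minimal sketch, assuming the HDFS client is on the PATH; the output directory in the usage note is hypothetical, as it is whatever the application specified:

```shell
# List and print a job's result files from HDFS.
view_result() {
  local out_dir="$1"
  hdfs dfs -ls "$out_dir"            # list the result files
  hdfs dfs -cat "$out_dir"/part-*    # print their contents
}

# Usage (directory is hypothetical):
# view_result /user/tester/female-output
```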
- Log in to the Spark web UI.
- Spark UI: used to display the status of running applications.
The Spark UI consists of five tabs: Jobs, Stages, Storage, Environment, and Executors. A Streaming tab is also displayed for Streaming applications.
To access the web UI: on the YARN web UI, find the corresponding Spark application and click ApplicationMaster in the last column of its row. The Spark web UI is displayed.
- The History Server UI displays the status of all Spark applications.
The UI shows the application ID, name, start time, end time, execution duration, and owner. Click an application ID to switch to the Spark UI of that application.
- View Spark logs.
Spark logs offer immediate visibility into how an application is running, and you can adjust the application based on them. For details about the logs, see Spark2x Logs.