
Commissioning a Spark Application in a Linux Environment

After developing the application code, upload it to the Linux client to run the application. The running procedure is the same for applications developed in Scala and Java.

  • A Spark application developed in Python does not need to be packed into a JAR file. You only need to copy the sample project to the Linux environment.
  • To avoid the error "Python in worker has different version %s than that in driver %s", make sure that the same Python version is installed on both the worker and the driver (see the sketch after these notes).
  • Ensure that Maven is configured to use the SDK repository on the Huawei image site. For details, see Configuring Huawei Open Source Image Repository.
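For example, you can check and align the Python versions as follows. This is a minimal sketch: the interpreter path /usr/bin/python3 is an assumption and must match what is actually installed on your nodes.

  # Check the Python version on the driver node and on every worker node.
  python3 --version

  # Point both the driver and the executors at the same interpreter
  # before submitting the application (the path is an example).
  export PYSPARK_PYTHON=/usr/bin/python3
  export PYSPARK_DRIVER_PYTHON=/usr/bin/python3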

Compiling and Running Applications

  1. In IntelliJ IDEA, open the Maven tool window.

    On the IntelliJ IDEA main page, choose View > Tool Windows > Maven to open the Maven tool window.
    Figure 1 Opening the Maven tool window

    If the project is not imported using Maven, perform the following operations:

    Right-click the pom.xml file in the sample code project and choose Add as Maven Project from the shortcut menu.

    Figure 2 Adding a Maven project

  2. Use Maven to generate a JAR file. A command-line equivalent is sketched after the following substeps.

    1. In the Maven tool window, select clean from Lifecycle to run the Maven build process.
      Figure 3 Selecting clean from Lifecycle and executing the Maven building process
    2. In the Maven tool window, select package from Lifecycle to run the Maven build process.
      Figure 4 Selecting package from Lifecycle and executing the Maven build process
      If the following information is displayed in the Run window, the packaging is successful.
      Figure 5 Packaging success message
    3. Obtain the JAR package from the target folder in the project directory.
      Figure 6 Obtaining the JAR package
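    If you prefer the command line to the IDEA Maven window, the same build can be run from the project root directory. This sketch assumes mvn is on the PATH and that the pom.xml file is in the current directory.

      # Equivalent of running clean and then package in the Maven tool window.
      mvn clean package

      # On success, the JAR file is placed in the target directory.
      ls target/*.jar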

  3. Copy the JAR file generated in 2 (for example, CollectFemaleInfo.jar) to the Spark runtime environment (that is, the Spark client), for example, /opt/female, and run the Spark application there; a submission sketch follows the note below. For details about the sample project, see Developing Spark Applications.

    Do not restart the HDFS service or all DataNode instances while a Spark job is running. Otherwise, the job may fail and some JobHistory data may be lost.
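    The following is a minimal sketch of the submission command. The main class name (com.example.CollectFemaleInfo), the input path, and YARN client mode are assumptions for illustration and must be adapted to your project.

      # Submit the application to YARN from the Spark client.
      # com.example.CollectFemaleInfo is a hypothetical main class name.
      spark-submit \
        --master yarn \
        --deploy-mode client \
        --class com.example.CollectFemaleInfo \
        /opt/female/CollectFemaleInfo.jar \
        <input_path>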

Viewing Commissioning Results

Once a Spark application has been executed, you can review the results using any of the following methods:

  • View the command output.

    The output directory and data format are specified by the user in the Spark application itself. You can read the result data from the specified files, as shown in the following sketch.
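    For example, if the application writes its result to HDFS, the data can be read from the client as follows. The output path is a placeholder chosen for illustration.

      # List and print the result files written by the application.
      hdfs dfs -ls /user/output/female_info
      hdfs dfs -cat /user/output/female_info/part-*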

  • Log in to the Spark web UI.

    Spark provides the following two web UIs:

    • Spark UI: used to display the status of running applications.

      The Spark UI consists of five tabs: Jobs, Stages, Storage, Environment, and Executors. A Streaming tab is also displayed for Streaming applications.

      To access the UI: on the YARN web UI, find the corresponding Spark application and click ApplicationMaster in the last column of its row. The Spark UI is displayed.

    • The History Server UI displays the status of all Spark applications.

      The UI shows each application's ID, name, start time, end time, execution duration, and owner. Click an application ID to open that application's Spark UI.
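      The same application list can also be fetched from the History Server REST API, as in the following sketch. The host name is a placeholder; 18080 is the default History Server port.

        # List all applications known to the Spark History Server as JSON.
        curl http://<history-server-host>:18080/api/v1/applications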

  • View Spark logs.

    Spark logs give immediate visibility into how an application is running, and you can tune the application accordingly. For details about the logs, see Spark2x Logs.
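    For applications that ran on YARN, the aggregated logs can also be pulled from the command line, as in the following sketch. The application ID is a placeholder; use the ID shown on the YARN web UI.

      # Fetch the aggregated driver and executor logs of a finished application.
      yarn logs -applicationId application_1700000000000_0001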