Compiling and Running the Application
Scenario
After the application code is developed, you can upload it to the Linux client and run it there. The running procedure is the same for applications developed in Scala and Java.
- A Spark application developed in Python does not need to be built into a JAR file. You only need to copy the sample project to the running environment.
- Ensure that the Python versions installed on the worker and driver nodes are the same; otherwise, the following error is reported: "Python in worker has different version %s than that in driver %s."
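As a sketch of one common way to keep the driver and workers on the same interpreter, you can set Spark's standard Python environment variables before submitting the job (the interpreter path below is an assumption; use the path actually installed on your nodes):

```shell
# Point both the driver and the workers at the same Python interpreter.
# /usr/bin/python3 is an assumed path; adjust it to your environment.
export PYSPARK_PYTHON=/usr/bin/python3
export PYSPARK_DRIVER_PYTHON=/usr/bin/python3
```

With both variables set to the same interpreter, the version-mismatch error above cannot occur.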
- Ensure that the Maven image repository of the SDK on the Huawei image site has been configured in Maven. For details, see Methods for Obtaining Sample Projects.
Procedure
- In IntelliJ IDEA, open the Maven tool window.
On the main page of the IDE, choose View > Tool Windows > Maven to open the Maven tool window.
Figure 1 Opening the Maven tool window
If the project is not imported using Maven, perform the following operations:
Right-click the pom file in the sample code project and choose Add as Maven Project from the shortcut menu to add a Maven project.
Figure 2 Adding a Maven project
- Use Maven to generate a JAR file.
- In the Maven tool window, select clean from Lifecycle to run the Maven build process.
Figure 3 Selecting clean from Lifecycle to run the Maven build process
- In the Maven tool window, select package from Lifecycle to run the Maven build process.
Figure 4 Selecting package from Lifecycle to run the Maven build process
If the following information is displayed in the Run window, the packaging is successful.
Figure 5 Packaging success message
- You can obtain the JAR package from the target folder in the project directory.
Figure 6 Obtaining the JAR Package
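If you prefer the command line to the IDE, the same clean and package phases can be run with Maven directly. This is a sketch, assuming Maven is installed and you run it from the directory containing the sample project's pom.xml:

```shell
# Equivalent to running clean and then package in the Maven tool window.
# Run from the sample project directory (the one containing pom.xml).
mvn clean package

# The resulting JAR file is written to the target/ subdirectory.
ls target/*.jar
```

The clean phase removes the previous build output, so the JAR in target/ is always produced from the current source.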
- Copy the JAR file generated in Step 2 (for example, CollectFemaleInfo.jar) to the Spark running environment (that is, the Spark client), for example, /opt/female, and run the Spark application. For details about the sample application, see Developing the Project.
Do not restart the HDFS service or all DataNode instances while a Spark job is running. Otherwise, the job may fail and some JobHistory data may be lost.
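As a sketch of the submission step, the copied JAR can be run with spark-submit. The main class name, master setting, and input path below are assumptions based on the CollectFemaleInfo example; substitute the values from your own project and cluster:

```shell
# Submit the JAR copied to /opt/female on the Spark client.
# The --class value is an assumed main class; replace it with the
# main class of your project, and adjust --master for your cluster.
spark-submit \
  --class com.huawei.bigdata.spark.examples.CollectFemaleInfo \
  --master yarn \
  /opt/female/CollectFemaleInfo.jar \
  <inputPath>
```

Here `<inputPath>` is a placeholder for the HDFS path of the input data expected by the sample application.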