Updated on 2024-10-23 GMT+08:00

Writing and Running the Spark Program in the Local Windows Environment

Scenario

You can run applications in the Windows environment after application code development is complete. In IDEA, the procedure is the same for applications developed in Scala and in Java.

  • In the Windows environment, only the sample code for accessing Spark SQL using JDBC is provided.
  • Ensure that Maven is configured to use the SDK Maven repository on the Huawei mirror site. For details, see Configuring Huawei Open-Source Mirrors.

Procedure

  1. Obtain the sample code.

    Download the Maven project source code and configuration file of the sample project. For details, see Obtaining Sample Projects.

    Import the sample code to IDEA.

  2. Obtain configuration files.

    Obtain the configuration files from the cluster client: download the hive-site.xml and spark-defaults.conf files from $SPARK_HOME/conf to a local directory.

  3. Upload data to HDFS.

    1. Create a text file named data on Linux and save the following content to it:
      Miranda,32
      Karlie,23
      Candice,27
    2. On the HDFS client running the Linux OS, run the hadoop fs -mkdir /data command (or the equivalent hdfs dfs -mkdir /data command) to create a directory.
    3. On the HDFS client running the Linux OS, run the hadoop fs -put data /data command to upload the data file.
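The data-preparation substeps above can be sketched as follows. Creating the local file works anywhere; the hadoop fs commands require a configured HDFS client, so run them on the cluster client node (they are shown as comments here).

```shell
# Create the local data file with the three sample rows.
printf 'Miranda,32\nKarlie,23\nCandice,27\n' > data

# On the HDFS client node (requires the hadoop command on PATH):
#   hadoop fs -mkdir /data        # or: hdfs dfs -mkdir /data
#   hadoop fs -put data /data     # the file lands at /data/data
```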

  4. Configure related parameters in the sample code.

    Change the SQL statement for loading data to LOAD DATA INPATH 'hdfs:/data/data' INTO TABLE CHILD.
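As a minimal sketch of how the sample might issue this statement over JDBC (the Windows sample accesses Spark SQL through JDBC): the JDBC URL, the CHILD table schema, and the class name below are assumptions, not the sample project's actual code; substitute the values from your cluster.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class LoadDataSketch {
    // The LOAD DATA statement configured in the step above.
    static final String LOAD_SQL =
        "LOAD DATA INPATH 'hdfs:/data/data' INTO TABLE CHILD";

    public static void main(String[] args) throws Exception {
        if (args.length == 0) {
            // No JDBC URL supplied: just print the statement that would run.
            System.out.println(LOAD_SQL);
            return;
        }
        // args[0] is an assumed JDBC URL for the cluster's Thrift/JDBC
        // endpoint, e.g. jdbc:hive2://<host>:<port>/default;... (cluster-specific).
        try (Connection conn = DriverManager.getConnection(args[0]);
             Statement stmt = conn.createStatement()) {
            // CHILD schema assumed from the name,age data rows above.
            stmt.execute("CREATE TABLE IF NOT EXISTS CHILD (name STRING, age INT) "
                + "ROW FORMAT DELIMITED FIELDS TERMINATED BY ','");
            stmt.execute(LOAD_SQL);
        }
    }
}
```

Note that LOAD DATA INPATH moves the file on HDFS into the table's location, so re-running it requires uploading the file again.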

  5. Add running parameters to the hive-site.xml and spark-defaults.conf files when running the application.
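For illustration, a few entries one might verify in spark-defaults.conf; the keys are standard Spark properties, but the values here are placeholders to be replaced with your cluster's settings:

```properties
spark.master           yarn
spark.driver.memory    1g
spark.executor.memory  1g
```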

  6. Run the application.