Updated on 2024-10-23 GMT+08:00

Writing and Running the Spark Program in the Local Windows Environment

Scenario

After developing the application code, you can run the application in the Windows environment. In IDEA, the procedure is the same for applications developed in Scala and in Java.

  • In the Windows environment, only the sample code for accessing Spark SQL using JDBC is provided.
  • Ensure that Maven has been configured to use the SDK Maven repository on the Huawei mirror site. For details, see Configuring Huawei Open-Source Mirrors.

Procedure

  1. Obtain the sample code.

    Download the Maven project source code and configuration file of the sample project. For details, see Obtaining Sample Projects.

    Import the sample code to IDEA.

  2. Obtain configuration files.

    1. From the cluster client, download the hive-site.xml and spark-defaults.conf files in $SPARK_HOME/conf to a local directory.
    2. On the FusionInsight Manager page of the cluster, download the user authentication files, which include the keytab file and the krb5.conf file, to a local directory.

  3. Upload data to HDFS.

    1. On Linux, create a text file named data and save the following content to it:
      Miranda,32 
      Karlie,23 
      Candice,27
    2. On the HDFS client, run the following commands for authentication:

      cd {Client installation directory}

      kinit <Service user for authentication>

    3. On the HDFS client running the Linux OS, run the hadoop fs -mkdir /data command (or the equivalent hdfs dfs -mkdir /data command) to create the /data directory.
    4. On the HDFS client running the Linux OS, run the hadoop fs -put data /data command to upload the data file.

  4. Configure related parameters in the sample code.

    1. Configure the authentication information.

      Set userPrincipal to the username.

      Set userKeytabPath to the path of the downloaded keytab file.

      Set Krb5ConfPath to the path of the downloaded krb5.conf file.

      Set the domain name: in the KerberosUtil class, change the value of DEFAULT_REALM to the domain name of the cluster.

    2. In the string concatenated into securityConfig, set user.principal to the username and user.keytab to the path of the keytab file. The keytab path must use forward slashes (/), not backslashes, even on Windows.

    3. Change the SQL statement for loading data to LOAD DATA INPATH 'hdfs:/data/data' INTO TABLE CHILD.
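
The slash requirement for the keytab path is easy to get wrong on Windows, so the following is a minimal sketch of how the user.principal and user.keytab part of the string could be built. The class name, the helper methods, the placeholder values, and the ';' separators are assumptions made for illustration; they are not taken from the sample project, whose securityConfig string contains further settings that should be left as they are.

```java
// Hypothetical helper for illustration only; not part of the sample project.
class SecurityConfigSketch {

    // Convert a Windows-style path (backslashes) to the slash form
    // that the keytab path in securityConfig must use.
    static String toSlashPath(String path) {
        return path.replace('\\', '/');
    }

    // Build only the user.principal / user.keytab fragment.
    // Assumption: entries are separated by ';'. Keep the remaining
    // parts of the sample's securityConfig string unchanged.
    static String userAuthFragment(String userPrincipal, String userKeytabPath) {
        return "user.principal=" + userPrincipal
                + ";user.keytab=" + toSlashPath(userKeytabPath) + ";";
    }

    public static void main(String[] args) {
        // Placeholder values; replace them with your own username
        // and the location of the downloaded keytab file.
        String userPrincipal = "sparkuser";
        String userKeytabPath = "D:\\conf\\user.keytab";

        System.out.println(userAuthFragment(userPrincipal, userKeytabPath));
    }
}
```

With the placeholder values above, the fragment contains D:/conf/user.keytab rather than the backslash form that Windows file dialogs usually produce.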

  5. When running the application, add the paths of the hive-site.xml and spark-defaults.conf files as running parameters.
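
As an illustration of step 5, the sketch below assumes that the running parameters are the local paths of hive-site.xml and spark-defaults.conf, passed as the first and second program arguments. This ordering, the class name, and the helper methods are assumptions; check the main method of the sample code for what it actually expects. The sketch only verifies that both files exist and reads spark-defaults.conf with java.util.Properties, which accepts the whitespace-separated "key value" lines used by that file.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Properties;

// Hypothetical helper for illustration only; not part of the sample project.
class RunArgsSketch {

    // Fail early with a clear message if a configuration file is missing.
    static Path requireFile(String pathArg) {
        Path path = Paths.get(pathArg);
        if (!Files.isRegularFile(path)) {
            throw new IllegalArgumentException("File not found: " + path);
        }
        return path;
    }

    // Parse "key value" lines; lines starting with '#' are comments.
    static Properties parseSparkDefaults(Reader reader) {
        Properties props = new Properties();
        try {
            props.load(reader);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }

    // Read spark-defaults.conf from disk and parse it.
    static Properties loadSparkDefaults(Path sparkDefaults) {
        try (Reader reader = Files.newBufferedReader(sparkDefaults, StandardCharsets.UTF_8)) {
            return parseSparkDefaults(reader);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        if (args.length < 2) {
            System.err.println("Usage: <path to hive-site.xml> <path to spark-defaults.conf>");
            return;
        }
        Path hiveSite = requireFile(args[0]);
        Properties sparkDefaults = loadSparkDefaults(requireFile(args[1]));
        System.out.println("hive-site.xml: " + hiveSite);
        System.out.println("spark-defaults.conf entries: " + sparkDefaults.size());
    }
}
```

In IDEA, such arguments are typically entered in the Program arguments field of the run configuration. Note that java.util.Properties treats a backslash as an escape character, so any Windows paths written inside spark-defaults.conf are safer in slash form as well.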

  6. Run the application.
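
For orientation, the following minimal sketch shows what the run amounts to for the load statement configured in step 4: the statement is executed over JDBC and the table is read back. It is not the sample project's code. The class and method names are hypothetical, the JDBC URL (including the securityConfig part) is expected as a program argument rather than spelled out here, and the sketch assumes that the CHILD table already exists with at least two columns and that the Hive JDBC driver is on the classpath.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

// Hypothetical sketch for illustration only; not part of the sample project.
class LoadAndQuerySketch {

    // Build the load statement from the HDFS path and the table name.
    static String loadDataSql(String hdfsPath, String table) {
        return "LOAD DATA INPATH '" + hdfsPath + "' INTO TABLE " + table;
    }

    // Load the uploaded file into the table, then print the first two columns.
    // Assumption: CHILD already exists and has at least two columns.
    static void loadAndPrint(Connection connection) throws SQLException {
        try (Statement statement = connection.createStatement()) {
            statement.execute(loadDataSql("hdfs:/data/data", "CHILD"));
            try (ResultSet rows = statement.executeQuery("SELECT * FROM CHILD")) {
                while (rows.next()) {
                    System.out.println(rows.getString(1) + "," + rows.getString(2));
                }
            }
        }
    }

    public static void main(String[] args) throws SQLException {
        if (args.length < 1) {
            System.err.println("Usage: <JDBC URL including securityConfig>");
            return;
        }
        // args[0]: full JDBC URL, supplied by the caller (assumption).
        try (Connection connection = DriverManager.getConnection(args[0])) {
            loadAndPrint(connection);
        }
    }
}
```

LOAD DATA INPATH without the LOCAL keyword moves the file from its HDFS location into the table's storage, so running the application a second time usually requires uploading the data file to /data again.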