Updated on 2022-12-02 GMT+08:00

Submitting a Spark2x Job

Scenario

This section describes how to submit an Oozie job of the Spark2x type on Hue.

Procedure

  1. Create a workflow. For details, see Creating a Workflow.
  2. On the workflow editing page, select the icon next to Spark program and drag it to the operation area.
  3. In the Spark window that is displayed, set Files, for example, to hdfs://hacluster/user/admin/examples/apps/spark2x/lib/oozie-examples.jar. Set jar/py name to the name of that file, for example, oozie-examples.jar, and click Add.
  4. Set the value of Main class, for example, to org.apache.oozie.example.SparkFileCopy.
  5. Click PARAMETER+ to add related input and output parameters.

    For example, add the following parameters:

    • hdfs://hacluster/user/admin/examples/input-data/text/data.txt
    • hdfs://hacluster/user/admin/examples/output-data/spark_workflow

  6. In the Options list text box, specify Spark parameters, for example, --conf spark.yarn.archive=hdfs://hacluster/user/spark2x/jars/8.1.0.1/spark-archive-2x.zip --conf spark.eventLog.enabled=true --conf spark.eventLog.dir=hdfs://hacluster/spark2xJobHistory2x.

    The version 8.1.0.1 in the spark.yarn.archive path is an example. To find the actual version, log in to FusionInsight Manager, click the icon in the upper right corner, choose About from the drop-down list, and check the FusionInsight Manager version in the dialog box that is displayed.

  7. Click the configuration button in the upper right corner. Set the value of Spark Master, for example, to yarn-cluster. Set the value of Mode, for example, to cluster.
  8. On the configuration page that is displayed, click Delete+ and specify a directory to delete, for example, hdfs://hacluster/user/admin/examples/output-data/spark_workflow.
  9. Click PROPERTIES+ and add the sharelib used by Oozie. Enter the property name oozie.action.sharelib.for.spark in the left text box and the property value spark2x in the right text box.
  10. Click the save icon in the upper right corner of the Oozie editor.

    To change the job name (default value: My Workflow) before saving the job, click the name and edit it, for example, to Spark-Workflow.

  11. After the configuration is saved, click the submit icon to submit the job.

    After the job is submitted, you can view information about the job, such as its details, logs, and processes, on Hue.
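
For reference, the values entered in the steps above correspond roughly to the fields of an Oozie Spark action. The following Python sketch assembles such an action as XML so that the mapping between the Hue fields and the action elements is easier to see. It is an illustration only and not the exact definition that Hue generates. The schema version uri:oozie:spark-action:0.2, the placeholders ${jobTracker} and ${nameNode}, the node names spark-node, End, and Kill, and the use of Spark-Workflow as the application name are assumptions. The jar name is assumed to be the file name of the Files entry.

```python
import xml.etree.ElementTree as ET

# Example values copied from the procedure above.
FILES = "hdfs://hacluster/user/admin/examples/apps/spark2x/lib/oozie-examples.jar"
MAIN_CLASS = "org.apache.oozie.example.SparkFileCopy"
INPUT_PATH = "hdfs://hacluster/user/admin/examples/input-data/text/data.txt"
OUTPUT_PATH = "hdfs://hacluster/user/admin/examples/output-data/spark_workflow"


def spark_opts(version="8.1.0.1"):
    """Build the Options list string (step 6) for a given FusionInsight Manager version."""
    return " ".join([
        f"--conf spark.yarn.archive=hdfs://hacluster/user/spark2x/jars/{version}/spark-archive-2x.zip",
        "--conf spark.eventLog.enabled=true",
        "--conf spark.eventLog.dir=hdfs://hacluster/spark2xJobHistory2x",
    ])


def build_spark_action(version="8.1.0.1"):
    """Return an <action> element that mirrors the values set in steps 3 to 9."""
    # "spark-node", "End", and "Kill" are hypothetical node names.
    action = ET.Element("action", {"name": "spark-node"})
    # Assumed schema version; check which one your Oozie release supports.
    spark = ET.SubElement(action, "spark", {"xmlns": "uri:oozie:spark-action:0.2"})
    # Conventional Oozie placeholders, assumed to be defined in the job properties.
    ET.SubElement(spark, "job-tracker").text = "${jobTracker}"
    ET.SubElement(spark, "name-node").text = "${nameNode}"
    # Step 8: directory to delete.
    prepare = ET.SubElement(spark, "prepare")
    ET.SubElement(prepare, "delete", {"path": OUTPUT_PATH})
    # Step 9: sharelib property.
    config = ET.SubElement(spark, "configuration")
    prop = ET.SubElement(config, "property")
    ET.SubElement(prop, "name").text = "oozie.action.sharelib.for.spark"
    ET.SubElement(prop, "value").text = "spark2x"
    # Step 7: Spark Master and Mode.
    ET.SubElement(spark, "master").text = "yarn-cluster"
    ET.SubElement(spark, "mode").text = "cluster"
    # Hypothetical Spark application name.
    ET.SubElement(spark, "name").text = "Spark-Workflow"
    # Steps 3 and 4: main class and jar name (file name of the Files entry).
    ET.SubElement(spark, "class").text = MAIN_CLASS
    ET.SubElement(spark, "jar").text = FILES.rsplit("/", 1)[-1]
    # Step 6: Spark options.
    ET.SubElement(spark, "spark-opts").text = spark_opts(version)
    # Step 5: input and output parameters, in order.
    for arg in (INPUT_PATH, OUTPUT_PATH):
        ET.SubElement(spark, "arg").text = arg
    # Step 3: the Files entry.
    ET.SubElement(spark, "file").text = FILES
    ET.SubElement(action, "ok", {"to": "End"})
    ET.SubElement(action, "error", {"to": "Kill"})
    return action


if __name__ == "__main__":
    action = build_spark_action()
    ET.indent(action)  # requires Python 3.9 or later
    print(ET.tostring(action, encoding="unicode"))
```

To adapt the sketch to another cluster version, pass the version shown in the About dialog box, for example, build_spark_action(version="8.1.0.1").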