
How Do I Reference a Python Script in a Spark Python Job?

Prerequisites

You have created an MRS Spark connection of the MRS API type in Management Center.

Procedure

  1. In the navigation pane of the DataArts Factory homepage, choose Data Development > Develop Script.
  2. Create a Spark Python script named pyspark_demo, select the MRS Spark connection created in Management Center, and enter the following sample code:

    from pyspark import SparkConf, SparkContext
    
    # Set the application name and create the Spark context.
    conf = SparkConf().setAppName('My App')
    sc = SparkContext(conf=conf)
    
    # Count the integers in [1, 100000000) that are greater than 100.
    count = sc.range(1, 1000 * 1000 * 100).filter(lambda x: x > 100).count()
    print('count:', count)
    
    # Release cluster resources once the result is printed.
    sc.stop()
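
    If you want to verify the script logic before running it on the MRS cluster, the following is a minimal local sketch. It assumes PySpark is installed on your own machine (for example, via pip install pyspark); the file name local_test.py and the local[*] master setting are illustrative and not part of the DataArts Factory procedure.

    # local_test.py -- illustrative file name, assuming a local PySpark installation
    from pyspark import SparkConf, SparkContext
    
    # local[*] runs Spark on all local cores instead of submitting to MRS.
    conf = SparkConf().setAppName('My App').setMaster('local[*]')
    sc = SparkContext(conf=conf)
    
    count = sc.range(1, 1000 * 1000 * 100).filter(lambda x: x > 100).count()
    print('count:', count)  # Expected output: count: 99999899
    
    sc.stop()

    You can then run the sketch with spark-submit local_test.py (or python local_test.py) to check the result before running the script in DataArts Factory.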

  3. Save the script and submit a version.
  4. Click Execute to run the script.

    Figure 1 Executing the script

  5. View the script execution result. Because sc.range(1, 1000 * 1000 * 100) generates the integers from 1 to 99,999,999 and the filter keeps only those greater than 100, the expected count is 99,999,899.

    To reference the created script in a job, perform the following steps:

  6. Create a batch processing pipeline job.
  7. Open the job, drag an MRS Spark Python node to the canvas, and configure parameters for the node.

    MRS Cluster: Select the cluster you selected when creating the connection in Management Center.

    Script Type: Select Online, and then select the pyspark_demo script you created.

    Retain the default values for the other parameters.

    Figure 2 Configuring node parameters

  8. Select Run once or Run periodically for Scheduling Type.
  9. Save the job and submit a version.
  10. Click Execute to run the job.

    Figure 3 Viewing the execution result

  11. On the Job Monitoring page, view the scheduling status and logs of the job.