Updated on 2024-10-08 GMT+08:00

Submitting an Oozie DistCp Job Using Hue

Scenario

This section describes how to submit an Oozie job of the DistCp type on the Hue web UI.

Procedure

  1. Create a workflow. For details, see Creating a Workflow Using Hue.
  2. On the workflow editing page, select the icon next to Distcp and drag it to the operation area.
  3. Determine whether the current DistCp operation is performed across clusters.

    • If yes, go to step 4.
    • If no, go to step 7.

  4. Establish cross-Manager mutual trust between the two clusters.
  5. In the Distcp window that is displayed, set the value of Source, for example, to hdfs://hacluster/user/admin/examples/input-data/text/data.txt. Set Destination, for example, to hdfs://target_ip:target_port/user/admin/examples/output-data/distcp-workflow/data.txt. Click Add.
  6. Click the configuration button in the upper right corner. On the Properties tab page, click PROPERTIES+. Enter the attribute name oozie.launcher.mapreduce.job.hdfs-servers in the text box on the left and the attribute value hdfs://source_ip:source_port,hdfs://target_ip:target_port in the text box on the right. Then go to step 8.

    source_ip: service address of the HDFS NameNode in the source cluster.

    source_port: port number of the HDFS NameNode in the source cluster.

    target_ip: service address of the HDFS NameNode in the target cluster.

    target_port: port number of the HDFS NameNode in the target cluster.
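
    The attribute value in step 6 is the source and target NameNode addresses joined by a comma. The following Python sketch shows one way to assemble it. The helper name hdfs_servers_value and the addresses are hypothetical placeholders for illustration, not values from a real cluster.

```python
def hdfs_servers_value(source_ip, source_port, target_ip, target_port):
    """Build the value for oozie.launcher.mapreduce.job.hdfs-servers.

    Hypothetical helper: it only formats the string described in step 6.
    """
    source = f"hdfs://{source_ip}:{source_port}"
    target = f"hdfs://{target_ip}:{target_port}"
    # Source cluster first, then target cluster, separated by a comma.
    return f"{source},{target}"


if __name__ == "__main__":
    # Placeholder addresses and ports; replace them with your NameNode values.
    print(hdfs_servers_value("192.0.2.10", 8020, "192.0.2.20", 8020))
```

    Paste the resulting string into the text box on the right on the Properties tab page.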

  7. In the Distcp window that is displayed, set the value of Source, for example, to /user/admin/examples/input-data/text/data.txt. Set Destination, for example, to /user/admin/examples/output-data/distcp-workflow/data.txt. Click Add.
  8. Click the configuration button in the upper right corner. On the configuration page that is displayed, click Delete+ and add the directory to be deleted, for example, /user/admin/examples/output-data/distcp-workflow.

  9. Click the save button in the upper right corner of the Oozie editor.

    To change the job name (default value: My Workflow) before saving the job, click the name and edit it, for example, to Distcp-Workflow.

  10. After the configuration is saved, click the submit button to submit the job.

    After the job is submitted, you can view the related contents of the job, such as the detailed information, logs, and processes, on Hue.
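
For orientation, the settings entered in the steps above correspond roughly to an Oozie workflow definition with a single DistCp action. The following Python sketch generates such a definition for the same-cluster case. It is an approximation under stated assumptions, not the exact XML that Hue produces: the function name distcp_workflow, the node names, and the schema versions (uri:oozie:workflow:0.5 and uri:oozie:distcp-action:0.1) are assumptions.

```python
import xml.etree.ElementTree as ET


def distcp_workflow(name, source, destination, delete_path, properties=None):
    """Return a workflow XML string with one DistCp action (illustrative only)."""
    # Schema versions below are assumptions; your Oozie version may differ.
    wf = ET.Element("workflow-app", {"name": name, "xmlns": "uri:oozie:workflow:0.5"})
    ET.SubElement(wf, "start", {"to": "distcp-1"})
    action = ET.SubElement(wf, "action", {"name": "distcp-1"})
    distcp = ET.SubElement(action, "distcp", {"xmlns": "uri:oozie:distcp-action:0.1"})
    ET.SubElement(distcp, "job-tracker").text = "${jobTracker}"
    ET.SubElement(distcp, "name-node").text = "${nameNode}"

    # Step 8: directory to delete before the copy runs, used as entered.
    prepare = ET.SubElement(distcp, "prepare")
    ET.SubElement(prepare, "delete", {"path": delete_path})

    # Step 6 (cross-cluster only): extra properties such as
    # oozie.launcher.mapreduce.job.hdfs-servers.
    if properties:
        conf = ET.SubElement(distcp, "configuration")
        for key, value in properties.items():
            prop = ET.SubElement(conf, "property")
            ET.SubElement(prop, "name").text = key
            ET.SubElement(prop, "value").text = value

    # Steps 5 and 7: Source and Destination become the two DistCp arguments.
    ET.SubElement(distcp, "arg").text = source
    ET.SubElement(distcp, "arg").text = destination

    ET.SubElement(action, "ok", {"to": "end"})
    ET.SubElement(action, "error", {"to": "kill"})
    kill = ET.SubElement(wf, "kill", {"name": "kill"})
    ET.SubElement(kill, "message").text = "DistCp action failed"
    ET.SubElement(wf, "end", {"name": "end"})
    return ET.tostring(wf, encoding="unicode")


if __name__ == "__main__":
    print(distcp_workflow(
        "Distcp-Workflow",
        "/user/admin/examples/input-data/text/data.txt",
        "/user/admin/examples/output-data/distcp-workflow/data.txt",
        "/user/admin/examples/output-data/distcp-workflow",
    ))
```

For a cross-cluster copy, pass the full hdfs:// URIs from step 5 as source and destination, and pass the attribute from step 6 in the properties dictionary.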