Updated on 2022-12-14 GMT+08:00

Submitting a DistCp Job

Scenario

This section describes how to submit an Oozie job of the DistCp type on the Hue web UI.

Procedure

  1. Create a workflow. For details, see Creating a Workflow.
  2. On the workflow editing page, select the icon next to Distcp and drag it to the operation area.
  3. Determine whether the current DistCp operation is performed across clusters.

    • If yes, go to 4.
    • If no, go to 7.

  4. Establish cross-Manager mutual trust between the two clusters.
  5. In the Distcp window that is displayed, set Source to the full HDFS path of the source file, for example, hdfs://hacluster/user/admin/examples/input-data/text/data.txt, and set Destination to the full HDFS path in the target cluster, for example, hdfs://target_ip:target_port/user/admin/examples/output-data/distcp-workflow/data.txt. Click Add.
  6. Click the configuration button in the upper right corner. On the Properties tab, click PROPERTIES+, enter the property name oozie.launcher.mapreduce.job.hdfs-servers in the left text box, and enter the property value hdfs://source_ip:source_port,hdfs://target_ip:target_port in the right text box. Then go to 8.

    source_ip: service address of the HDFS NameNode in the source cluster

    source_port: port number of the HDFS NameNode in the source cluster

    target_ip: service address of the HDFS NameNode in the target cluster

    target_port: port number of the HDFS NameNode in the target cluster

  7. In the Distcp window that is displayed, set Source to the source path, for example, /user/admin/examples/input-data/text/data.txt, and set Destination to the destination path, for example, /user/admin/examples/output-data/distcp-workflow/data.txt. Click Add.
  8. Click the configuration button in the upper right corner. On the configuration page that is displayed, click Delete+ and add the directory to delete before the job runs, for example, /user/admin/examples/output-data/distcp-workflow.
  9. Click the save icon in the upper right corner of the Oozie editor.

    To change the job name (default: My Workflow) before saving, click the name and edit it, for example, to Distcp-Workflow.

  10. After the configuration is saved, click the submit icon to submit the job.

    After the job is submitted, you can view its details, logs, and processes on Hue.
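Hue submits the workflow configured above to Oozie as workflow XML. The following Python sketch builds a DistCp action roughly equivalent to steps 5 to 8, assuming the Oozie distcp-action 0.2 schema. The action name, the ${jobTracker} and ${nameNode} parameters, the End and Kill transition targets, and the IP addresses and ports are placeholders chosen for illustration; the XML that Hue actually generates may differ.

```python
import xml.etree.ElementTree as ET
from typing import Optional

# Property set in step 6 for cross-cluster copies.
HDFS_SERVERS_PROP = "oozie.launcher.mapreduce.job.hdfs-servers"


def hdfs_servers_value(source_ip: str, source_port: int,
                       target_ip: str, target_port: int) -> str:
    """Join the source and target NameNode URIs with a comma, as in step 6."""
    return f"hdfs://{source_ip}:{source_port},hdfs://{target_ip}:{target_port}"


def distcp_action(name: str, source: str, destination: str,
                  delete_path: Optional[str] = None,
                  hdfs_servers: Optional[str] = None) -> ET.Element:
    """Build an Oozie <action> element wrapping a <distcp> action.

    Child order follows the distcp-action schema:
    job-tracker, name-node, prepare, configuration, arg...
    """
    action = ET.Element("action", name=name)
    distcp = ET.SubElement(action, "distcp",
                           xmlns="uri:oozie:distcp-action:0.2")
    # Assumed workflow parameters, resolved by Oozie from job properties.
    ET.SubElement(distcp, "job-tracker").text = "${jobTracker}"
    ET.SubElement(distcp, "name-node").text = "${nameNode}"
    if delete_path:
        # Counterpart of Delete+ (step 8): deleted before the copy starts.
        prepare = ET.SubElement(distcp, "prepare")
        ET.SubElement(prepare, "delete", path=delete_path)
    if hdfs_servers:
        # Only added for cross-cluster copies (steps 4 to 6).
        conf = ET.SubElement(distcp, "configuration")
        prop = ET.SubElement(conf, "property")
        ET.SubElement(prop, "name").text = HDFS_SERVERS_PROP
        ET.SubElement(prop, "value").text = hdfs_servers
    # First arg is Source, second arg is Destination.
    ET.SubElement(distcp, "arg").text = source
    ET.SubElement(distcp, "arg").text = destination
    # Hypothetical transition targets; a real workflow uses its own node names.
    ET.SubElement(action, "ok", to="End")
    ET.SubElement(action, "error", to="Kill")
    return action


if __name__ == "__main__":
    # Placeholder addresses and ports; replace them with your NameNode values.
    servers = hdfs_servers_value("192.0.2.1", 8020, "192.0.2.2", 8020)
    action = distcp_action(
        "distcp-copy",
        "hdfs://hacluster/user/admin/examples/input-data/text/data.txt",
        "hdfs://192.0.2.2:8020/user/admin/examples/output-data/distcp-workflow/data.txt",
        delete_path="hdfs://192.0.2.2:8020/user/admin/examples/output-data/distcp-workflow",
        hdfs_servers=servers,
    )
    print(ET.tostring(action, encoding="unicode"))
```

Running the script prints the action element for the cross-cluster case. Calling distcp_action without hdfs_servers and with plain paths gives the same-cluster variant described in step 7.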