Updated on 2023-06-01 GMT+08:00

Using Oozie from Scratch

Oozie is an open-source workflow engine that is used to schedule and coordinate Hadoop jobs.

Oozie can be used to submit a wide array of jobs, such as Hive, Spark2x, Loader, MapReduce, Java, DistCp, Shell, HDFS, SSH, SubWorkflow, Streaming, and scheduled jobs.

This section describes how to use the Oozie client to submit a MapReduce job.

Prerequisites

The client has been installed in a directory, for example, /opt/client. For details, see Installing a Client. The client directory in the following operations is only an example. Change it based on site requirements.

Procedure

  1. Log in to the node where the client is installed as the client installation user.
  2. Run the following command to go to the client installation directory, for example, /opt/client:

    cd /opt/client

  3. Run the following command to configure environment variables:

    source bigdata_env

  4. Check the cluster authentication mode.

    • If the cluster is in security mode, run the following command to authenticate the user. UserOozie indicates the user who submits jobs.

      kinit UserOozie

    • If the cluster is in normal mode, go to step 5.
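Steps 2 to 4 can be combined into one helper, sketched below under stated assumptions. The function name prepare_oozie_client and its "security"/"normal" mode argument are hypothetical and are not part of the client. Only cd, bigdata_env, and kinit come from the steps above.

```shell
# Hypothetical helper covering steps 2 to 4.
# Arguments (all assumptions for illustration):
#   $1  client installation directory (default: /opt/client)
#   $2  cluster mode, "security" or "normal" (default: security)
#   $3  user who submits jobs (default: UserOozie)
prepare_oozie_client() {
    client_dir="${1:-/opt/client}"
    auth_mode="${2:-security}"
    submit_user="${3:-UserOozie}"

    # Step 2: go to the client installation directory.
    cd "$client_dir" || return 1

    # Step 3: load the client environment variables into the current shell.
    [ -f ./bigdata_env ] || { echo "bigdata_env not found in $client_dir" >&2; return 1; }
    . ./bigdata_env || return 1

    # Step 4: authenticate only when the cluster is in security mode.
    if [ "$auth_mode" = "security" ]; then
        kinit "$submit_user" || return 1
    fi
}
```

For example, prepare_oozie_client /opt/client security UserOozie prompts for the user's password through kinit, whereas passing normal skips authentication.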

  5. Upload the Oozie configuration file and JAR package to HDFS.

    hdfs dfs -mkdir /user/UserOozie

    hdfs dfs -put -f /opt/client/Oozie/oozie-client-*/examples /user/UserOozie/

    • /opt/client is the client installation directory. Change it based on site requirements.
    • UserOozie indicates the name of the user who submits jobs.
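The upload can also be scripted so that it can be rerun safely and verified afterwards. The following is a minimal sketch, not the documented procedure: the function name upload_oozie_examples is hypothetical, and -mkdir -p and the final -test -d check are additions to the two commands above.

```shell
# Hypothetical helper for step 5.
#   $1  client installation directory (default: /opt/client)
#   $2  user who submits jobs (default: UserOozie)
upload_oozie_examples() {
    client_dir="${1:-/opt/client}"
    submit_user="${2:-UserOozie}"

    # Expand the versioned directory name (oozie-client-*) to a concrete path.
    set -- "$client_dir"/Oozie/oozie-client-*/examples
    [ -d "$1" ] || { echo "examples directory not found under $client_dir" >&2; return 1; }

    # -p makes the command succeed even if the directory already exists.
    hdfs dfs -mkdir -p "/user/$submit_user" &&
    hdfs dfs -put -f "$1" "/user/$submit_user/" &&
    # Confirm that the map-reduce application directory is now in HDFS.
    hdfs dfs -test -d "/user/$submit_user/examples/apps/map-reduce"
}
```

The function returns a non-zero status if any of the HDFS commands fails, so it can be chained with && before the job is submitted.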

  6. Run the following commands to modify the job execution configuration file:

    cd /opt/client/Oozie/oozie-client-*/examples/apps/map-reduce/

    vi job.properties

    nameNode=hdfs://hacluster
    # 10.64.35.161 is the service plane IP address of the active Yarn ResourceManager node, and 8032 is the port specified by yarn.resourcemanager.port.
    resourceManager=10.64.35.161:8032
    queueName=default
    examplesRoot=examples
    # Name of the user who submits jobs
    user.name=UserOozie
    # HDFS upload path
    oozie.wf.application.path=${nameNode}/user/${user.name}/${examplesRoot}/apps/map-reduce
    outputDir=map-reduce
    oozie.wf.rerun.failnodes=true
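If the file needs to be edited without opening vi, for example from a script, a small helper can rewrite individual key=value lines. The sketch below is one possible approach. set_property is a hypothetical name, and the helper assumes that keys are written without spaces around the equal sign, as in the listing above.

```shell
# Hypothetical helper: set KEY=VALUE in a properties file.
# Replaces an existing "KEY=..." line, or appends the line if the key is absent.
# Note: awk -v interprets backslash escapes, so values must not contain backslashes.
set_property() {
    prop_file="$1"; prop_key="$2"; prop_value="$3"
    tmp_file="$(mktemp)" || return 1

    awk -v k="$prop_key" -v v="$prop_value" '
        BEGIN { FS = "="; found = 0 }
        $1 == k { print k "=" v; found = 1; next }   # rewrite the matching line
        { print }                                    # keep all other lines
        END { if (!found) print k "=" v }            # append when the key is missing
    ' "$prop_file" > "$tmp_file" && cat "$tmp_file" > "$prop_file"
    status=$?

    rm -f "$tmp_file"
    return "$status"
}

# Example usage with the values from this section:
#   set_property job.properties resourceManager 10.64.35.161:8032
#   set_property job.properties user.name UserOozie
```

Writing through a temporary file and copying the result back keeps the original file's permissions and avoids relying on in-place editing options that differ between sed implementations.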

  7. Run the following command to execute the Oozie job:

    oozie job -oozie https://Host name of the Oozie role:21003/oozie/ -config job.properties -run

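    Host name of the Oozie role is a placeholder for the host name of the node where the Oozie role runs. 21003 is the port for Oozie HTTPS requests. To view the port, log in to FusionInsight Manager, choose Cluster > Services > Oozie, click the Configuration tab, and search for OOZIE_HTTPS_PORT. The following shows an example command and its output: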

    [root@kwephispra44947 map-reduce]# oozie job -oozie https://kwephispra44948:21003/oozie/ -config job.properties -run
    ......
    job: 0000000-200730163829770-oozie-omm-W
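The job ID printed on the last line can be captured and passed to oozie job -info to check the status from the command line instead of the web UI. The following is a minimal sketch: the function names extract_job_id and submit_and_show_status are hypothetical, and the Oozie URL passed as the argument is whatever was used in the command above.

```shell
# Hypothetical helper: read command output on stdin and print the ID
# from the first line of the form "job: <id>".
extract_job_id() {
    sed -n 's/^job: *//p' | head -n 1
}

# Hypothetical helper: submit the job described by job.properties in the
# current directory, then print its status.
#   $1  Oozie URL, in the same form as in the command above
submit_and_show_status() {
    oozie_url="$1"

    job_id="$(oozie job -oozie "$oozie_url" -config job.properties -run | extract_job_id)"
    [ -n "$job_id" ] || { echo "no job ID found in the oozie output" >&2; return 1; }

    # -info prints the workflow status and the state of each action.
    oozie job -oozie "$oozie_url" -info "$job_id"
}
```

With the sample output above, extract_job_id yields 0000000-200730163829770-oozie-omm-W.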

  8. Log in to FusionInsight Manager. For details, see Accessing FusionInsight Manager (MRS 3.x or Later).
  9. Choose Cluster > Name of the desired cluster > Services > Oozie, click the hyperlink next to Oozie WebUI to go to the Oozie page, and view the task execution result on the Oozie web UI.

    Figure 1 Task execution result