Updated on 2024-11-29 GMT+08:00

Using Oozie from Scratch

Oozie is an open-source workflow engine that is used to schedule and coordinate Hadoop jobs.

Oozie can be used to submit a wide array of jobs, such as Hive, Spark, Loader, MapReduce, Java, DistCp, Shell, HDFS, SSH, SubWorkflow, Streaming, and scheduled jobs.

This section describes how to use the Oozie client to submit a MapReduce job.

Prerequisites

The client has been installed in a directory, for example, /opt/client. The client directory in the following operations is only an example. Change it based on site requirements.

Procedure

  1. Log in to the node where the client is installed as the client installation user.
  2. Run the following command to go to the client installation directory, for example, /opt/client:

    cd /opt/client

  3. Run the following command to configure environment variables:

    source bigdata_env

  4. Check the cluster authentication mode.

    • If the cluster is in security mode, run the following command to authenticate the user. UserOozie indicates the user who submits tasks.

      kinit UserOozie

    • If the cluster is in normal mode, go to step 5.

  5. Upload the Oozie configuration file and JAR package to HDFS.

    hdfs dfs -mkdir /user/UserOozie

    hdfs dfs -put -f /opt/client/Oozie/oozie-client-*/examples /user/UserOozie/

    • /opt/client is the client installation directory. Change it based on site requirements.
    • UserOozie indicates the name of the user who submits jobs.
    • After creating the /user/UserOozie directory and uploading files in /opt/client/Oozie/oozie-client-*/examples to the directory, ensure that the directory, all files in the directory, and subdirectories have permission 755. Otherwise, exceptions may occur when the Oozie client is used to submit tasks.
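    The permission requirement above can be applied recursively with hdfs dfs -chmod. The following is a minimal sketch, not part of the original procedure. The function name set_oozie_dir_permissions and the HDFS_BIN variable are hypothetical: HDFS_BIN only lets the command be swapped for echo in a dry run, and it defaults to the hdfs client.

```shell
# Sketch: recursively set permission 755 on the uploaded directory.
# HDFS_BIN is a hypothetical override for dry runs; it defaults to the hdfs client.
set_oozie_dir_permissions() {
    local target_dir="$1"
    # -R applies the mode to the directory, its subdirectories, and all files in them.
    "${HDFS_BIN:-hdfs}" dfs -chmod -R 755 "$target_dir"
}
```

    On a node where the client environment has been loaded, the call would be set_oozie_dir_permissions /user/UserOozie. Running hdfs dfs -ls -R /user/UserOozie afterwards shows the resulting permissions.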

  6. Run the following commands to modify the job execution configuration file:

    cd /opt/client/Oozie/oozie-client-*/examples/apps/map-reduce/

    vi job.properties

    nameNode=hdfs://hacluster
    # 10.64.35.161 is the service plane IP address of the active Yarn ResourceManager node, and 8032 is the port number specified by yarn.resourcemanager.port.
    resourceManager=10.64.35.161:8032
    queueName=default
    examplesRoot=examples
    user.name=admin
    # HDFS upload path
    oozie.wf.application.path=${nameNode}/user/${user.name}/${examplesRoot}/apps/map-reduce
    outputDir=map-reduce
    oozie.wf.rerun.failnodes=true
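    As an alternative to editing the file in vi, the same settings can be written from a script. The following is a minimal sketch that reuses the example values listed above. PROPS_FILE is a hypothetical variable introduced here; it defaults to job.properties in the current directory and overwrites that file.

```shell
# Target file. PROPS_FILE is a hypothetical override; the default matches step 6.
PROPS_FILE="${PROPS_FILE:-job.properties}"

# The quoted delimiter ('EOF') stops the shell from expanding ${nameNode} and the
# other placeholders, so they are written to the file unchanged.
cat > "$PROPS_FILE" <<'EOF'
nameNode=hdfs://hacluster
resourceManager=10.64.35.161:8032
queueName=default
examplesRoot=examples
user.name=admin
oozie.wf.application.path=${nameNode}/user/${user.name}/${examplesRoot}/apps/map-reduce
outputDir=map-reduce
oozie.wf.rerun.failnodes=true
EOF

# Confirm that the application path kept its placeholders.
grep -F 'oozie.wf.application.path=${nameNode}' "$PROPS_FILE"
```

    The values are the example values from this step. Replace the ResourceManager address and the user name based on site requirements before submitting the job.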

  7. Run the following command to execute the Oozie job:

    oozie job -oozie https://<Host name of the Oozie role>:21003/oozie/ -config job.properties -run

    [root@kwephispra44947 map-reduce]# oozie job -oozie https://kwephispra44948:21003/oozie/ -config job.properties -run
    ......
    job: 0000000-200730163829770-oozie-omm-W
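    The job ID printed on the last line can be captured for later status queries. The following is a minimal sketch that parses text in the format shown above. The function name extract_job_id is hypothetical, and the sample text is copied from the example output rather than produced by a live cluster.

```shell
# Sketch: read submission output on stdin and print the ID from the
# first line of the form "job: <id>".
extract_job_id() {
    sed -n 's/^[[:space:]]*job:[[:space:]]*\([^[:space:]]*\).*$/\1/p' | head -n 1
}

# Sample text copied from the example output above.
sample_output='......
job: 0000000-200730163829770-oozie-omm-W'

job_id=$(printf '%s\n' "$sample_output" | extract_job_id)
echo "$job_id"
```

    With a real submission, the output of the oozie job ... -run command would be piped into extract_job_id instead of the sample text. The ID can then be passed to the -info option of oozie job, together with the same -oozie URL, to query the job status from the command line.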

  8. Log in to FusionInsight Manager.
  9. Choose Cluster > Services > Oozie, click the hyperlink next to Oozie WebUI to open the Oozie web UI, and view the task execution result.

    Figure 1 Task execution result