Updated on 2022-08-16 GMT+08:00

Rules

Modified configuration files other than job.properties must be uploaded to HDFS again

While a workflow is running, all configuration files other than job.properties are read from the specified HDFS directories. If a modified file is not uploaded to HDFS again, the modification does not take effect.
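
The re-upload step can be scripted. The following is a minimal sketch, assuming an HDFS client is available on the PATH; the helper names and the example paths are hypothetical and are not part of Oozie. It relies on the -f option of hdfs dfs -put, which overwrites an existing file in the target directory.

```python
import subprocess


def build_upload_command(local_file, hdfs_dir):
    """Build an `hdfs dfs -put -f` command that overwrites the copy in HDFS."""
    return ["hdfs", "dfs", "-put", "-f", local_file, hdfs_dir]


def upload(local_file, hdfs_dir):
    """Run the upload; raises CalledProcessError if the hdfs client fails."""
    subprocess.run(build_upload_command(local_file, hdfs_dir), check=True)


# Example call (hypothetical paths, requires an HDFS client on the PATH):
# upload("workflow.xml", "/user/example/oozie/apps/demo/")
```

The command is built separately from the call that runs it, so it can be logged or inspected before anything is written to HDFS.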

Each workflow has only one start node and one end node

The workflow syntax requires each workflow to have exactly one start node and one end node. Multiple kill nodes can be configured to handle exceptions that may occur.
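
This rule can be checked before a workflow is submitted. The sketch below counts the control nodes with Python's standard XML parser; the sample workflow and the function names are hypothetical, and action nodes are left out of the sample for brevity.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample used only for illustration (action nodes omitted).
SAMPLE_WORKFLOW = """
<workflow-app name="demo-wf" xmlns="uri:oozie:workflow:0.2">
    <start to="first-action"/>
    <kill name="fail-a"><message>step A failed</message></kill>
    <kill name="fail-b"><message>step B failed</message></kill>
    <end name="end"/>
</workflow-app>
"""


def local_name(element):
    """Return the tag name without its XML namespace prefix."""
    return element.tag.rsplit("}", 1)[-1]


def count_control_nodes(workflow_xml):
    """Count the start, end, and kill nodes directly under workflow-app."""
    root = ET.fromstring(workflow_xml)
    counts = {"start": 0, "end": 0, "kill": 0}
    for child in root:
        name = local_name(child)
        if name in counts:
            counts[name] += 1
    return counts


def has_single_start_and_end(workflow_xml):
    """True if the workflow has exactly one start node and one end node."""
    counts = count_control_nodes(workflow_xml)
    return counts["start"] == 1 and counts["end"] == 1
```

For the sample above, the check passes: it contains one start node, one end node, and two kill nodes.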

In the same workflow, each node can be traversed only once

A workflow does not support loops (cyclic structures), so each node can be traversed only once. Execution of the next node starts only after execution of the current node ends.
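
A loop can be detected by treating the workflow as a directed graph of transitions and running a depth-first search. The following is a minimal sketch under that assumption; the trimmed sample workflows and all function names are hypothetical, and action bodies are omitted because only the transitions matter here.

```python
import xml.etree.ElementTree as ET

# Hypothetical, trimmed template: only the transitions are filled in.
TEMPLATE = """
<workflow-app name="demo-wf" xmlns="uri:oozie:workflow:0.2">
    <start to="a"/>
    <action name="a"><ok to="b"/><error to="fail"/></action>
    <action name="b"><ok to="{b_ok}"/><error to="fail"/></action>
    <kill name="fail"><message>failed</message></kill>
    <end name="end"/>
</workflow-app>
"""
LINEAR_WORKFLOW = TEMPLATE.format(b_ok="end")
LOOPING_WORKFLOW = TEMPLATE.format(b_ok="a")


def local_name(element):
    """Return the tag name without its XML namespace prefix."""
    return element.tag.rsplit("}", 1)[-1]


def build_transitions(workflow_xml):
    """Map each node name to the names of the nodes it can transition to."""
    graph = {}
    for node in ET.fromstring(workflow_xml):
        kind = local_name(node)
        if kind in ("start", "join"):
            targets = [node.get("to")]
        elif kind == "action":
            targets = [c.get("to") for c in node if local_name(c) in ("ok", "error")]
        elif kind == "decision":
            targets = [c.get("to") for c in node.iter() if local_name(c) in ("case", "default")]
        elif kind == "fork":
            targets = [c.get("start") for c in node if local_name(c) == "path"]
        elif kind in ("kill", "end"):
            targets = []
        else:
            continue  # not a workflow node (for example, parameters)
        # The start node has no name attribute, so a fixed key is used for it.
        graph[node.get("name", ":start:")] = targets
    return graph


def has_loop(graph):
    """Depth-first search; a transition back to a node in progress is a loop."""
    in_progress, done = set(), set()

    def visit(name):
        in_progress.add(name)
        for target in graph.get(name, []):
            if target in in_progress:
                return True
            if target not in done and visit(target):
                return True
        in_progress.discard(name)
        done.add(name)
        return False

    return any(name not in done and visit(name) for name in graph)
```

Calling has_loop(build_transitions(LOOPING_WORKFLOW)) reports a loop because node b transitions back to node a, while the linear variant reports none.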

A default branch must be reserved for branch selection

Each decision node must reserve a default branch. This prevents the workflow from entering an uncontrolled error state when none of the case predicates is true.

Correct:

<decision name="[NODE-NAME]">
     <switch>
          <case to="[NODE-NAME]">[PREDICATE]</case>
          ...
          <case to="[NODE-NAME]">[PREDICATE]</case>
          <default to="[NODE-NAME]"/>
     </switch>
</decision>
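
Decision nodes without a default branch can be listed before deployment. The sketch below is one possible check; the sample workflow, its predicates, and the function names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: "route-b" deliberately lacks a default branch.
SAMPLE_WORKFLOW = """
<workflow-app name="decision-wf" xmlns="uri:oozie:workflow:0.2">
    <start to="route-a"/>
    <decision name="route-a">
        <switch>
            <case to="full-load">${wf:conf('mode') eq 'full'}</case>
            <default to="delta-load"/>
        </switch>
    </decision>
    <decision name="route-b">
        <switch>
            <case to="full-load">${wf:conf('mode') eq 'full'}</case>
        </switch>
    </decision>
    <end name="end"/>
</workflow-app>
"""


def local_name(element):
    """Return the tag name without its XML namespace prefix."""
    return element.tag.rsplit("}", 1)[-1]


def decisions_missing_default(workflow_xml):
    """Return the names of decision nodes that have no default element."""
    missing = []
    for node in ET.fromstring(workflow_xml):
        if local_name(node) != "decision":
            continue
        if not any(local_name(child) == "default" for child in node.iter()):
            missing.append(node.get("name"))
    return missing
```

For the sample above, the function returns only route-b, the decision node that would leave the workflow without a branch when its single predicate is false.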

Fork and Join nodes must appear in pairs

Correct:

<workflow-app name="[WF-DEF-NAME]" xmlns="uri:oozie:workflow:0.2">
     ...
     <fork name="[FORK-NODE-NAME]">
         <path start="[NODE-NAME]" />
         ...
         <path start="[NODE-NAME]" />
     </fork>
     ...
     <join name="[JOIN-NODE-NAME]" to="[NODE-NAME]" />
     ...
</workflow-app>
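
Pairing can be checked by following every path of a fork node until it reaches a join node. The following is a simplified sketch: it follows only the ok transitions of action nodes and does not handle nested forks or decision nodes inside a fork. The sample workflow and the function names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical, trimmed sample: action bodies are omitted.
SAMPLE_WORKFLOW = """
<workflow-app name="fork-wf" xmlns="uri:oozie:workflow:0.2">
    <start to="split"/>
    <fork name="split">
        <path start="left"/>
        <path start="right"/>
    </fork>
    <action name="left"><ok to="merge"/><error to="fail"/></action>
    <action name="right"><ok to="merge"/><error to="fail"/></action>
    <join name="merge" to="end"/>
    <kill name="fail"><message>failed</message></kill>
    <end name="end"/>
</workflow-app>
"""


def local_name(element):
    """Return the tag name without its XML namespace prefix."""
    return element.tag.rsplit("}", 1)[-1]


def fork_is_paired(workflow_xml, fork_name):
    """True if every path of the fork reaches the same join via ok transitions."""
    ok_target, join_names, fork = {}, set(), None
    for node in ET.fromstring(workflow_xml):
        kind = local_name(node)
        if kind == "action":
            for child in node:
                if local_name(child) == "ok":
                    ok_target[node.get("name")] = child.get("to")
        elif kind == "join":
            join_names.add(node.get("name"))
        elif kind == "fork" and node.get("name") == fork_name:
            fork = node
    if fork is None:
        return False
    reached = set()
    for path in fork:
        if local_name(path) != "path":
            continue
        current, seen = path.get("start"), set()
        # Walk the chain of actions until a non-action node is reached.
        while current in ok_target and current not in seen:
            seen.add(current)
            current = ok_target[current]
        reached.add(current if current in join_names else None)
    return len(reached) == 1 and None not in reached
```

In the sample, both paths of the fork node split end at the join node merge, so the fork counts as paired. Redirecting one action to a different node makes the check fail.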

Tag sequence cannot be changed

The schema strictly defines the order of tags. Optional tags can be omitted, but the tags that remain must keep the order defined by the schema. The tag order may differ between schemas.

Correct:

<workflow-app name="[WF-DEF-NAME]" xmlns="uri:oozie:workflow:1.0">
     ...
     <action name="[NODE-NAME]">
         <map-reduce>
             <resource-manager>[RESOURCE-MANAGER]</resource-manager>
             <name-node>[NAME-NODE]</name-node>
             <prepare>
                 <delete path="[PATH]"/>
                 ...
                 <mkdir path="[PATH]"/>
                 ...
             </prepare>
             ...
             <job-xml>[JOB-XML-FILE]</job-xml>
             <configuration>
                 <property>
                     <name>[PROPERTY-NAME]</name>
                     <value>[PROPERTY-VALUE]</value>
                 </property>
                 ...
             </configuration>
             <file>[FILE-PATH]</file>
             ...
             <archive>[FILE-PATH]</archive>
             ...
         </map-reduce>
         <ok to="[NODE-NAME]"/>
         <error to="[NODE-NAME]"/>
     </action>
     ...
</workflow-app>
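
The ordering rule can be checked mechanically. The sketch below takes the tag order from the map-reduce sample above as an assumption; the authoritative order is the one defined in the schema version that the workflow declares. The sample fragments and the function names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Assumed order, copied from the map-reduce sample in this section.
EXPECTED_ORDER = [
    "resource-manager", "name-node", "prepare",
    "job-xml", "configuration", "file", "archive",
]

# Hypothetical fragments: the first omits optional tags, the second swaps two.
ORDERED = """
<map-reduce>
    <resource-manager>rm</resource-manager>
    <name-node>nn</name-node>
    <configuration/>
    <file>lib.jar</file>
</map-reduce>
"""
SWAPPED = """
<map-reduce>
    <resource-manager>rm</resource-manager>
    <configuration/>
    <name-node>nn</name-node>
</map-reduce>
"""


def local_name(element):
    """Return the tag name without its XML namespace prefix."""
    return element.tag.rsplit("}", 1)[-1]


def tags_in_order(map_reduce_xml):
    """True if the known child tags appear in the expected relative order."""
    root = ET.fromstring(map_reduce_xml)
    positions = [
        EXPECTED_ORDER.index(local_name(child))
        for child in root
        if local_name(child) in EXPECTED_ORDER
    ]
    # Repeated tags such as file keep equal positions, which sorted() preserves.
    return positions == sorted(positions)
```

The first fragment passes even though prepare, job-xml, and archive are omitted. The second fails because configuration appears before name-node.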

A Hive SQL statement must end with a semicolon (;), and single-line comments start with double hyphens (--)

Correct:

create table A(id int, name string, dt string);
insert into A values(1, "a1", "20150625"); 
select * from A; 
--drop table A;