Updated on 2024-10-29 GMT+08:00

Core Concepts of Workflow Development

Workflow

A workflow is a directed acyclic graph (DAG) that consists of phases and the relationships between them.

A directed line segment shows a dependency between phases, and the dependencies determine the order in which phases run. In this example, the workflow runs from left to right after it starts. A DAG can also have multiple branches, so you can design it to fit your scenario. Phases in parallel branches can run at the same time. For details, see Configuring Multi-Branch Phase Data.
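The ordering rule above can be sketched with Python's standard graphlib module. This is only an illustration, not the workflow engine's scheduler: the phase names and the dependencies mapping are hypothetical. Each "wave" groups phases whose dependencies are all complete, so phases in the same wave could run at the same time.

```python
from graphlib import TopologicalSorter

# Hypothetical phase names; each key maps to the set of phases it depends on.
dependencies = {
    "labeling": set(),
    "release_dataset": {"labeling"},
    "train_a": {"release_dataset"},
    "train_b": {"release_dataset"},
    "register_model": {"train_a", "train_b"},
}


def execution_waves(deps):
    """Group phases into waves; phases in one wave do not depend on each other."""
    sorter = TopologicalSorter(deps)
    sorter.prepare()  # raises graphlib.CycleError if the graph is not a DAG
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # every phase whose dependencies are done
        waves.append(ready)
        sorter.done(*ready)
    return waves


waves = execution_waves(dependencies)
for number, wave in enumerate(waves, start=1):
    print(f"wave {number}: {wave}")
```

In this sketch train_a and train_b land in the same wave because both depend only on release_dataset, which mirrors the multi-branch case described above.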

Table 1 Workflow

| Parameter | Description | Mandatory | Data Type |
|---|---|---|---|
| name | Workflow name. The name can contain a maximum of 64 characters, including only letters, digits, underscores (_), and hyphens (-), and must start with a letter. | Yes | str |
| desc | Workflow description | Yes | str |
| steps | Phases contained in a workflow | Yes | list[Step] |
| storages | Unified storage objects | No | Storage or list[Storage] |
| policy | Workflow configuration policy, which is used for partial execution | No | Policy |
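The naming rule in Table 1 (which Table 2 repeats for phase names) can be checked before a workflow is submitted. The following is a minimal sketch, assuming "letters" means ASCII letters. The pattern and the is_valid_name helper are illustrative and are not part of the SDK.

```python
import re

# Starts with a letter, followed by up to 63 letters, digits, underscores,
# or hyphens, which gives a maximum of 64 characters in total.
# Assumption: "letters" means ASCII letters only.
NAME_PATTERN = re.compile(r"[A-Za-z][A-Za-z0-9_-]{0,63}")


def is_valid_name(name: str) -> bool:
    """Return True if the name satisfies the documented naming rule."""
    return NAME_PATTERN.fullmatch(name) is not None


print(is_valid_name("image-cls_v2"))  # valid: starts with a letter
print(is_valid_name("2nd-workflow"))  # invalid: starts with a digit
print(is_valid_name("a" * 65))        # invalid: longer than 64 characters
```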

Step

A step is the smallest unit of a workflow and corresponds to one phase in the DAG. Each step type provides a different service capability. A step has the following parameters.

Table 2 Step

| Parameter | Description | Mandatory | Data Type |
|---|---|---|---|
| name | Phase name. The name can contain a maximum of 64 characters, including only letters, digits, underscores (_), and hyphens (-), and must start with a letter. | Yes | str |
| title | Title of a phase, which is displayed in the DAG. If this parameter is not configured, the name is displayed by default. | No | str |
| step_type | Type of a phase, which determines the function of the phase | Yes | enum |
| inputs | Inputs of a phase | No | AbstractInput or list[AbstractInput] |
| outputs | Outputs of a phase | No | AbstractOutput or list[AbstractOutput] |
| properties | Node properties | No | dict |
| policy | Phase execution policy, which includes the phase scheduling interval, the phase execution timeout interval, and the option to skip phase execution | No | StepPolicy |
| depend_steps | List of dependency phases. This parameter determines the DAG structure and phase execution sequence. | No | Step or list[Step] |
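To illustrate how depend_steps determines the DAG structure, the sketch below uses a hypothetical SketchStep class that models only three of the parameters in Table 2. It is not the SDK's Step class. It shows the "Step or list[Step]" form being normalized to a list, the title falling back to the name, and the resulting DAG edges.

```python
from dataclasses import dataclass
from typing import List, Optional, Union


@dataclass
class SketchStep:
    """Hypothetical stand-in for a phase; not the SDK's Step class."""
    name: str
    title: Optional[str] = None
    depend_steps: Union["SketchStep", List["SketchStep"], None] = None

    def display_title(self) -> str:
        # Per Table 2, the name is displayed when no title is configured.
        return self.title if self.title else self.name

    def dependencies(self) -> List["SketchStep"]:
        # Normalize "Step or list[Step]" into a list.
        if self.depend_steps is None:
            return []
        if isinstance(self.depend_steps, SketchStep):
            return [self.depend_steps]
        return list(self.depend_steps)


def edges(steps):
    """Return (upstream, downstream) name pairs, that is, the DAG edges."""
    return [(dep.name, step.name) for step in steps for dep in step.dependencies()]


labeling = SketchStep(name="labeling", title="Label data")
training = SketchStep(name="training", depend_steps=labeling)
deploy = SketchStep(name="deploy", depend_steps=[training])

dag_edges = edges([labeling, training, deploy])
print(dag_edges)
print(training.display_title())
```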

Table 3 StepPolicy

| Parameter | Description | Mandatory | Data Type |
|---|---|---|---|
| poll_interval_seconds | Phase scheduling interval. The default value is 1 second. | Yes | str |
| max_execution_minutes | Phase execution timeout interval. The default value is 10080 minutes, that is, 7 days. | Yes | str |
| skip_conditions | Conditions that determine whether a phase is skipped | No | Condition or condition list |
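The default values in Table 3 can be checked with a small sketch. SketchStepPolicy is a hypothetical mirror of two StepPolicy fields, not the SDK class. It stores the values as integers for arithmetic, which is an assumption made for this illustration (the table lists the data type as str).

```python
from dataclasses import dataclass


@dataclass
class SketchStepPolicy:
    """Hypothetical mirror of two StepPolicy fields; not the SDK class."""
    poll_interval_seconds: int = 1      # documented default: 1 second
    max_execution_minutes: int = 10080  # documented default: 10080 minutes

    def timeout_days(self) -> float:
        # 60 minutes per hour, 24 hours per day.
        return self.max_execution_minutes / (60 * 24)

    def has_timed_out(self, elapsed_minutes: float) -> bool:
        return elapsed_minutes >= self.max_execution_minutes


policy = SketchStepPolicy()
print(policy.timeout_days())         # 10080 / 1440
print(policy.has_timed_out(120))     # two hours is well within the default
```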

Step is the superclass of all phases. It is a conceptual class that you do not use directly. Concrete phase types are defined by function: CreateDatasetStep, LabelingStep, DatasetImportStep, ReleaseDatasetStep, JobStep, ModelStep, ServiceStep, and ConditionStep. For details, see Creating Workflow Phases.

Data

Data objects are used for phase input and are classified into the following types:

  • Actual data objects, which are specified when you create a workflow
    • Dataset: defines existing datasets. This object is used for data labeling and model training.
    • LabelTask: defines existing labeling jobs. This object is usually used for data labeling and dataset version release.
    • OBSPath: defines an OBS path. This object is used for model training, dataset import, and model import.
    • ServiceData: defines an existing service. This object is used only for service update.
    • SWRImage: defines an existing SWR path. This object is used for model registration.
    • GalleryModel: defines a model subscribed from AI Gallery. This object is used for model registration.
  • Placeholder data objects, which are specified when a workflow is running
    • DatasetPlaceholder: defines datasets to be specified when a workflow is running. This object is used for data labeling and model training.
    • LabelTaskPlaceholder: defines labeling jobs to be specified when a workflow is running. This object is used for data labeling and dataset version release.
    • OBSPlaceholder: defines an OBS path to be specified when a workflow is running. This object is used for model training, dataset import, and model import.
    • ServiceUpdatePlaceholder: defines existing services to be specified when a workflow is running. This object is used only for service update.
    • SWRImagePlaceholder: defines an SWR path to be specified when a workflow is running. This object is used for model registration.
    • ServiceInputPlaceholder: defines model information required for service deployment when a workflow is running. This object is used only for service deployment and update.
    • DataSelector: supports multiple data types. Currently, this object can be used only on the job phase (only OBS or datasets are supported).
  • Data selection object:
    • DataConsumptionSelector: selects the valid output from the outputs of multiple dependency phases as the data input. This object is usually used for conditional branching: when you create the workflow, you do not specify which dependency phase supplies the input, and the source is selected automatically based on the actual execution status of the dependency phases.
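The difference between actual data objects and placeholder data objects can be sketched as follows. SketchPlaceholder and its resolve method are hypothetical and exist only for this illustration. The sketch assumes that a value supplied at run time takes precedence and that the default is used otherwise, which the SDK may or may not implement in this way. The OBS paths are made-up examples.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional


@dataclass
class SketchPlaceholder:
    """Hypothetical placeholder: its value is only known when the workflow runs."""
    name: str
    default: Optional[Any] = None

    def resolve(self, runtime_values: Dict[str, Any]) -> Any:
        # Assumption: a run-time value wins; otherwise fall back to the default.
        if self.name in runtime_values:
            return runtime_values[self.name]
        if self.default is not None:
            return self.default
        raise ValueError(f"no value supplied for placeholder {self.name!r}")


# An actual data object is fixed when the workflow is created.
fixed_path = "obs://example-bucket/train/"

# A placeholder is filled in when the workflow runs.
train_data = SketchPlaceholder(name="train_data", default="obs://example-bucket/default/")

resolved_default = train_data.resolve({})
resolved_runtime = train_data.resolve({"train_data": "obs://example-bucket/run-42/"})
print(fixed_path, resolved_default, resolved_runtime)
```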

Table 4 Dataset

| Parameter | Description | Mandatory | Data Type |
|---|---|---|---|
| dataset_name | Dataset name | Yes | str |
| version_name | Dataset version | No | str |

Example:

example = Dataset(dataset_name="**", version_name="**")
# Obtain the dataset name and version name from ModelArts datasets.

When a dataset is used as the input of a phase, configure version_name based on service requirements. For example, version_name is not required for LabelingStep or ReleaseDatasetStep, but it is mandatory for JobStep.
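The version_name rule above can be expressed as a small pre-check. This is a sketch only: check_dataset_input and the STEP_TYPES_REQUIRING_VERSION set are hypothetical names, and the set contains only JobStep because that is the only phase type the text names as requiring a version.

```python
from typing import Optional

# Only JobStep is named above as requiring version_name. LabelingStep and
# ReleaseDatasetStep are named as not requiring it.
STEP_TYPES_REQUIRING_VERSION = {"JobStep"}


def check_dataset_input(step_type: str, dataset_name: str,
                        version_name: Optional[str] = None) -> None:
    """Raise ValueError if a dataset input is incomplete for the given phase type."""
    if not dataset_name:
        raise ValueError("dataset_name is mandatory")
    if step_type in STEP_TYPES_REQUIRING_VERSION and not version_name:
        raise ValueError(f"{step_type} needs version_name for dataset {dataset_name!r}")


check_dataset_input("LabelingStep", "flowers")                  # passes without a version
check_dataset_input("JobStep", "flowers", version_name="V001")  # passes with a version
```

The dataset name and version shown here are made-up values.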

Table 5 LabelTask

| Parameter | Description | Mandatory | Data Type |
|---|---|---|---|
| dataset_name | Dataset name | Yes | str |
| task_name | Labeling job name | Yes | str |

Example:

example = LabelTask(dataset_name="**", task_name="**")
# Obtain the dataset name and labeling job name from ModelArts datasets of the new version.

Table 6 OBSPath

| Parameter | Description | Mandatory | Data Type |
|---|---|---|---|
| obs_path | OBS path | Yes | str or Storage |

Example:

example = OBSPath(obs_path="**")
# Obtain the OBS path from Object Storage Service.

Table 7 ServiceData

| Parameter | Description | Mandatory | Data Type |
|---|---|---|---|
| service_id | Service ID | Yes | str |

Example:

example = ServiceData(service_id="**")
# Obtain the service ID in ModelArts Real-Time Services. This object describes a specified real-time service and is used for service update.

Table 8 SWRImage

| Parameter | Description | Mandatory | Data Type |
|---|---|---|---|
| swr_path | SWR path to a container image | Yes | str |

Example:

example = SWRImage(swr_path="**")
# Container image path, which is used as the input for model registration

Table 9 GalleryModel

| Parameter | Description | Mandatory | Data Type |
|---|---|---|---|
| subscription_id | Subscription ID of a subscribed model | Yes | str |
| version_num | Version number of a subscribed model | Yes | str |

Example:

example = GalleryModel(subscription_id="**", version_num="**")
# Subscribed model object, which is used as the input of the model registration phase

Table 10 DatasetPlaceholder

| Parameter | Description | Mandatory | Data Type |
|---|---|---|---|
| name | Name | Yes | str |
| data_type | Data type | No | DataTypeEnum |
| delay | Whether the data object is configured when the phase is running. The default value is False. | No | bool |
| default | Default value of a data object | No | Dataset |

Example:

example = DatasetPlaceholder(name="**", data_type=DataTypeEnum.IMAGE_CLASSIFICATION)
# Dataset object placeholder. Configure data_type to specify supported data types.

Table 11 OBSPlaceholder

| Parameter | Description | Mandatory | Data Type |
|---|---|---|---|
| name | Name | Yes | str |
| object_type | OBS object type. Only "file" and "directory" are supported. | Yes | str |
| delay | Whether the data object is configured when the phase is running. The default value is False. | No | bool |
| default | Default value of a data object | No | OBSPath |

Example:

example = OBSPlaceholder(name="**", object_type="directory")
# OBS object placeholder. You can set object_type to file or directory.

Table 12 LabelTaskPlaceholder

| Parameter | Description | Mandatory | Data Type |
|---|---|---|---|
| name | Name | Yes | str |
| task_type | Type of a labeling job | No | LabelTaskTypeEnum |
| delay | Whether the data object is configured when the phase is running. The default value is False. | No | bool |

Example:

example = LabelTaskPlaceholder(name="**")
# LabelTask object placeholder

Table 13 ServiceUpdatePlaceholder

| Parameter | Description | Mandatory | Data Type |
|---|---|---|---|
| name | Name | Yes | str |
| delay | Whether the data object is configured when the phase is running. The default value is False. | No | bool |

Example:

example = ServiceUpdatePlaceholder(name="**")
# ServiceData object placeholder, which is used as the input for service update

Table 14 SWRImagePlaceholder

| Parameter | Description | Mandatory | Data Type |
|---|---|---|---|
| name | Name | Yes | str |
| delay | Whether the data object is configured when the phase is running. The default value is False. | No | bool |

Example:

example = SWRImagePlaceholder(name="**")
# SWRImage object placeholder, which is used as the input for model registration

Table 15 ServiceInputPlaceholder

| Parameter | Description | Mandatory | Data Type |
|---|---|---|---|
| name | Name | Yes | str |
| model_name | Model name | Yes | str or Placeholder |
| model_version | Model version | No | str |
| envs | Environment variables | No | dict |
| delay | Whether service deployment information is configured when the phase is running. The default value is True. | No | bool |

Example:

example = ServiceInputPlaceholder(name="**", model_name="model_name")
# This object is used as the input for service deployment or service update.

Table 16 DataSelector

| Parameter | Description | Mandatory | Data Type |
|---|---|---|---|
| name | Name | Yes | str |
| data_type_list | Supported data types. Currently, only obs and dataset are supported. | Yes | list |
| delay | Whether the data object is configured when the phase is running. The default value is False. | No | bool |

Example:

example = DataSelector(name="**", data_type_list=["obs", "dataset"])
# This object is used as the input of the job phase.

Table 17 DataConsumptionSelector

| Parameter | Description | Mandatory | Data Type |
|---|---|---|---|
| data_list | Output data objects of a dependency phase | Yes | list |

Example:

example = DataConsumptionSelector(data_list=[step1.outputs["step1_output_name"].as_input(), step2.outputs["step2_output_name"].as_input()])
# Use the valid output from either step 1 or step 2 as the input. If step 1 is skipped and has no output, use the valid output from step 2 as the input. (Make sure that data_list has only one valid output.)
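
The selection rule described in the comment above can be sketched in plain Python. This is an illustration of the rule, not the SDK's implementation: select_valid_output is a hypothetical helper, None stands in for the output of a skipped phase, and the OBS path is a made-up value.

```python
from typing import List, Optional


def select_valid_output(data_list: List[Optional[str]]) -> str:
    """Return the single valid entry; None models the output of a skipped phase."""
    valid = [item for item in data_list if item is not None]
    if len(valid) != 1:
        # The rule requires data_list to contain exactly one valid output.
        raise ValueError(f"expected exactly one valid output, found {len(valid)}")
    return valid[0]


# Step 1 was skipped and produced nothing, so the output of step 2 is used.
selected = select_valid_output([None, "obs://example-bucket/step2/output/"])
print(selected)
```

Raising an error when zero or several outputs are valid is a design choice of this sketch. It makes a misconfigured branch visible instead of silently picking one input.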