Updated on 2024-10-29 GMT+08:00

Core Concepts of Workflow Development

Workflow

A workflow is a DAG that consists of phases and the relationships between phases.

A directed edge represents a dependency between two phases, and the dependencies determine the order in which phases run. In the example diagram, the workflow runs from left to right after it starts. A DAG can also contain multiple branches, so you can design it to match your scenario. Phases in parallel branches can run at the same time. For details, see Configuring Multi-Branch Phase Data.
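
The ordering implied by the dependencies can be sketched in plain Python. This is an illustrative model only, not the ModelArts SDK: the phase names and the `depends_on` mapping are hypothetical, and the standard-library `graphlib` module (Python 3.9 or later) stands in for the workflow scheduler.

```python
from graphlib import TopologicalSorter

# Hypothetical phases: each key maps to the set of phases it depends on.
depends_on = {
    "label_data": set(),
    "train_a": {"label_data"},
    "train_b": {"label_data"},
    "register_model": {"train_a", "train_b"},
}


def execution_waves(graph):
    """Group phases into waves; phases in the same wave do not depend on each other."""
    sorter = TopologicalSorter(graph)
    sorter.prepare()  # raises graphlib.CycleError if the graph is not a DAG
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        sorter.done(*ready)
    return waves


waves = execution_waves(depends_on)
for index, wave in enumerate(waves, start=1):
    print(f"wave {index}: {wave}")
```

In this sketch, `train_a` and `train_b` land in the same wave, which corresponds to two parallel branches that can run at the same time.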

**Table 1** Workflow
Parameter	Description	Mandatory	Data Type
name	Workflow name. The name can contain a maximum of 64 characters, including only letters, digits, underscores (_), and hyphens (-), and must start with a letter.	Yes	str
desc	Workflow description	Yes	str
steps	Phases contained in a workflow	Yes	list[Step]
storages	Unified storage objects	No	Storage or list[Storage]
policy	Workflow configuration policy, which is used for partial execution	No	Policy
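
The naming rule for name can be checked locally before a workflow is created. The following is a minimal sketch, assuming that "letters" means ASCII letters; `NAME_PATTERN` and `is_valid_name` are hypothetical helpers and are not part of the ModelArts SDK.

```python
import re

# A letter first, then up to 63 more letters, digits, underscores, or hyphens
# (64 characters in total at most).
NAME_PATTERN = re.compile(r"[A-Za-z][A-Za-z0-9_-]{0,63}")


def is_valid_name(name: str) -> bool:
    """Return True if name satisfies the naming rule described in Table 1."""
    return NAME_PATTERN.fullmatch(name) is not None


for candidate in ["image-classification_v1", "1st_workflow", "my workflow"]:
    print(candidate, is_valid_name(candidate))
```

The same rule is stated for phase names, so the check applies to both.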

Step

A step is the smallest unit of a workflow. In a DAG, a step is also called a phase. Different step types provide different service capabilities. A step consists of the following parameters.

**Table 2** Step
Parameter	Description	Mandatory	Data Type
name	Phase name. The name can contain a maximum of 64 characters, including only letters, digits, underscores (_), and hyphens (-), and must start with a letter.	Yes	str
title	Title of a phase, which is displayed in the DAG. If this parameter is not configured, the name is displayed by default.	No	str
step_type	Type of a phase, which determines the function of the phase	Yes	enum
inputs	Inputs of a phase	No	AbstractInput or list[AbstractInput]
outputs	Outputs of a phase	No	AbstractOutput or list[AbstractOutput]
properties	Node properties	No	dict
policy	Phase execution policy, which includes the phase scheduling interval, the phase execution timeout interval, and the option to skip phase execution	No	StepPolicy
depend_steps	List of dependency phases. This parameter determines the DAG structure and phase execution sequence.	No	Step or list[Step]

**Table 3** StepPolicy
Parameter	Description	Mandatory	Data Type
poll_interval_seconds	Phase scheduling interval. The default value is 1 second.	Yes	str
max_execution_minutes	Phase execution timeout interval. The default value is 10080 minutes, that is, 7 days.	Yes	str
skip_conditions	Conditions that determine whether a phase is skipped	No	Condition or list[Condition]
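
The default timeout can be converted to days with a short calculation. This sketch uses integers for the arithmetic even though the table lists the data type as str; `timeout_for` is a hypothetical helper, not an SDK function.

```python
from datetime import timedelta

# Defaults taken from Table 3.
DEFAULT_POLL_INTERVAL_SECONDS = 1
DEFAULT_MAX_EXECUTION_MINUTES = 10080


def timeout_for(max_execution_minutes: int) -> timedelta:
    """Convert a timeout given in minutes to a timedelta."""
    return timedelta(minutes=max_execution_minutes)


default_timeout = timeout_for(DEFAULT_MAX_EXECUTION_MINUTES)
print(default_timeout.days)  # number of whole days in the default timeout
```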

Step is the superclass of all phases. It is a conceptual class, and you do not use it directly. Different types of phases are created based on their functions, including CreateDatasetStep, LabelingStep, DatasetImportStep, ReleaseDatasetStep, JobStep, ModelStep, ServiceStep, and ConditionStep. For details, see Creating Workflow Phases.

Data

Data objects are used for phase input and are classified into the following types:

Actual data objects, which are specified when you create a workflow
- Dataset: defines existing datasets. This object is used for data labeling and model training.
- LabelTask: defines existing labeling jobs. This object is usually used for data labeling and dataset version release.
- OBSPath: defines an OBS path. This object is used for model training, dataset import, and model import.
- ServiceData: defines an existing service. This object is used only for service update.
- SWRImage: defines an existing SWR path. This object is used for model registration.
- GalleryModel: defines a model subscribed from AI Gallery. This object is used for model registration.

Placeholder data objects, which are specified when a workflow is running
- DatasetPlaceholder: defines datasets to be specified when a workflow is running. This object is used for data labeling and model training.
- LabelTaskPlaceholder: defines labeling jobs to be specified when a workflow is running. This object is used for data labeling and dataset version release.
- OBSPlaceholder: defines an OBS path to be specified when a workflow is running. This object is used for model training, dataset import, and model import.
- ServiceUpdatePlaceholder: defines existing services to be specified when a workflow is running. This object is used only for service update.
- SWRImagePlaceholder: defines an SWR path to be specified when a workflow is running. This object is used for model registration.
- ServiceInputPlaceholder: defines model information required for service deployment when a workflow is running. This object is used only for service deployment and update.
- DataSelector: supports multiple data types. Currently, this object can be used only on the job phase (only OBS or datasets are supported).

Data selection object
- DataConsumptionSelector: selects a valid output from the outputs of multiple dependency phases as the data input. This object is usually used for conditional branching. When you create the workflow, you do not specify which dependency phase supplies the input. The source is selected automatically based on the actual execution status of the dependency phases.
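
The difference between actual data objects and placeholder data objects can be modeled in a few lines. This is a conceptual sketch only: `PlaceholderSketch`, its `resolve` method, and the `runtime_values` dictionary are hypothetical and do not reflect how the SDK resolves placeholders internally.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PlaceholderSketch:
    """Hypothetical stand-in for a placeholder data object."""

    name: str
    default: Optional[str] = None

    def resolve(self, runtime_values: dict) -> str:
        """Return the value supplied at run time, or the default if none is supplied."""
        if self.name in runtime_values:
            return runtime_values[self.name]
        if self.default is not None:
            return self.default
        raise ValueError(f"no value supplied for placeholder {self.name!r}")


# An actual data object is fixed when the workflow is created.
fixed_input = "value-chosen-at-creation"

# A placeholder is only named at creation time; its value arrives with each run.
deferred_input = PlaceholderSketch(name="train_data")
print(deferred_input.resolve({"train_data": "value-chosen-for-this-run"}))
```

A placeholder keeps the workflow definition reusable, because each run can supply different data without changing the workflow itself.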

**Table 4** **Dataset**
Parameter	Description	Mandatory	Data Type
dataset_name	Dataset name	Yes	str
version_name	Dataset version	No	str

Example:

example = Dataset(dataset_name="**", version_name="**")
# Obtain the dataset name and version name from ModelArts datasets.

When a dataset is used as the input of a phase, configure version_name based on service requirements. For example, version_name is not required for LabelingStep and ReleaseDatasetStep, but mandatory for JobStep.
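
The version requirement can be expressed as a small pre-check. This is a sketch under the assumption that only the step types named above matter; `check_dataset_input` is a hypothetical helper, the version string "V001" is a made-up value, and step types not mentioned in the text pass unchecked.

```python
from typing import Optional

# Step types for which the text states that version_name is mandatory.
VERSION_REQUIRED = {"JobStep"}


def check_dataset_input(step_type: str, version_name: Optional[str] = None) -> None:
    """Raise ValueError if a dataset version is required but missing."""
    if step_type in VERSION_REQUIRED and not version_name:
        raise ValueError(f"{step_type} needs a dataset with version_name set")


check_dataset_input("LabelingStep")      # passes: version not required
check_dataset_input("JobStep", "V001")   # passes: version supplied
```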

**Table 5** **LabelTask**
Parameter	Description	Mandatory	Data Type
dataset_name	Dataset name	Yes	str
task_name	Labeling job name	Yes	str

Example:

example = LabelTask(dataset_name="**", task_name="**")
# Obtain the dataset name and labeling job name from ModelArts datasets of the new version.

**Table 6** **OBSPath**
Parameter	Description	Mandatory	Data Type
obs_path	OBS path	Yes	str, Storage

Example:

example = OBSPath(obs_path = "**")
# Obtain the OBS path from Object Storage Service.

**Table 7** **ServiceData**
Parameter	Description	Mandatory	Data Type
service_id	Service ID	Yes	str

Example:

example = ServiceData(service_id = "**")
# Obtain the service ID in ModelArts Real-Time Services. This object describes a specified real-time service and is used for service update.

**Table 8** **SWRImage**
Parameter	Description	Mandatory	Data Type
swr_path	SWR path to a container image	Yes	str

Example:

example = SWRImage(swr_path = "**")
# Container image path, which is used as the input for model registration

**Table 9** **GalleryModel**
Parameter	Description	Mandatory	Data Type
subscription_id	Subscription ID of a subscribed model	Yes	str
version_num	Version number of a subscribed model	Yes	str

Example:

example = GalleryModel(subscription_id="**", version_num="**")
# Subscribed model object, which is used as the input of the model registration phase

**Table 10** **DatasetPlaceholder**
Parameter	Description	Mandatory	Data Type
name	Name	Yes	str
data_type	Data type	No	DataTypeEnum
delay	Whether the data object is configured when the phase is running. The default value is False.	No	bool
default	Default value of a data object	No	Dataset

Example:

example = DatasetPlaceholder(name = "**", data_type = DataTypeEnum.IMAGE_CLASSIFICATION)
# Dataset object placeholder. Configure data_type to specify supported data types.

**Table 11** **OBSPlaceholder**
Parameter	Description	Mandatory	Data Type
name	Name	Yes	str
object_type	OBS object type. Only "file" and "directory" are supported.	Yes	str
delay	Whether the data object is configured when the phase is running. The default value is False.	No	bool
default	Default value of a data object	No	OBSPath

Example:

example = OBSPlaceholder(name="**", object_type="directory")
# OBS object placeholder. You can set object_type to file or directory.

**Table 12** **LabelTaskPlaceholder**
Parameter	Description	Mandatory	Data Type
name	Name	Yes	str
task_type	Type of a labeling job	No	LabelTaskTypeEnum
delay	Whether the data object is configured when the phase is running. The default value is False.	No	bool

Example:

example = LabelTaskPlaceholder(name = "**")
# LabelTask object placeholder

**Table 13** **ServiceUpdatePlaceholder**
Parameter	Description	Mandatory	Data Type
name	Name	Yes	str
delay	Whether the data object is configured when the phase is running. The default value is False.	No	bool

Example:

example = ServiceUpdatePlaceholder(name = "**")
# ServiceData object placeholder, which is used as the input for service update

**Table 14** **SWRImagePlaceholder**
Parameter	Description	Mandatory	Data Type
name	Name	Yes	str
delay	Whether the data object is configured when the phase is running. The default value is False.	No	bool

Example:

example = SWRImagePlaceholder(name="**")
# SWRImage object placeholder, which is used as the input for model registration

**Table 15** **ServiceInputPlaceholder**
Parameter	Description	Mandatory	Data Type
name	Name	Yes	str
model_name	Model name	Yes	str or Placeholder
model_version	Model version	No	str
envs	Environment variables	No	dict
delay	Whether service deployment information is configured when the phase is running. The default value is True.	No	bool

Example:

example = ServiceInputPlaceholder(name="**", model_name="model_name")
# This object is used as the input for service deployment or service update.

**Table 16** **DataSelector**
Parameter	Description	Mandatory	Data Type
name	Name	Yes	str
data_type_list	Supported data types. Currently, only obs and dataset are supported.	Yes	list
delay	Whether the data object is configured when the phase is running. The default value is False.	No	bool

Example:

example = DataSelector(name="**", data_type_list=["obs", "dataset"])
# This object is used as the input of the job phase.

**Table 17** **DataConsumptionSelector**
Parameter	Description	Mandatory	Data Type
data_list	Output data objects of a dependency phase	Yes	list

Example:

example = DataConsumptionSelector(data_list=[step1.outputs["step1_output_name"].as_input(), step2.outputs["step2_output_name"].as_input()])
# Use the valid output from either step 1 or step 2 as the input. If step 1 is skipped and has no output, use the valid output from step 2 as the input. (Make sure that data_list has only one valid output.)
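
The selection rule can be modeled as follows. This is a conceptual sketch, not the SDK implementation: `select_valid_output` is a hypothetical function, and `None` is used here to represent the missing output of a skipped phase.

```python
def select_valid_output(data_list):
    """Return the only entry that is not None; None models a skipped phase."""
    valid = [item for item in data_list if item is not None]
    if len(valid) != 1:
        raise ValueError(f"expected exactly one valid output, found {len(valid)}")
    return valid[0]


# Step 1 was skipped, so the output of step 2 is selected.
selected = select_valid_output([None, "step2_output"])
print(selected)
```

Raising an error when zero or several outputs are valid mirrors the requirement that data_list has only one valid output.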
