Core Concepts of Workflow Development
Workflow
A workflow is a directed acyclic graph (DAG) that consists of phases and the relationships between them.
Each directed edge represents a dependency between two phases, and the dependencies determine the order in which phases are executed. In the example DAG, the workflow runs from left to right once it starts. A DAG can also contain multiple branches, so you can design it to fit your scenario. Phases in parallel branches can run at the same time. For details, see Configuring Multi-Branch Phase Data.
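The way dependencies determine execution order, and why phases in parallel branches can run together, can be illustrated with a minimal sketch. It uses only the Python standard library (graphlib, Python 3.9 or later), not the workflow SDK, and the phase names are hypothetical.

```python
from graphlib import TopologicalSorter

# Hypothetical phase names. Each key maps to the set of phases it depends on.
dependencies = {
    "label": {"create_dataset"},
    "train_a": {"label"},
    "train_b": {"label"},
    "register_model": {"train_a", "train_b"},
}


def execution_batches(deps):
    """Group phases into batches whose dependencies are all satisfied.

    Phases that share a batch do not depend on each other, so they
    could run at the same time.
    """
    sorter = TopologicalSorter(deps)
    sorter.prepare()  # raises graphlib.CycleError if the graph has a cycle
    batches = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        batches.append(ready)
        sorter.done(*ready)
    return batches


batches = execution_batches(dependencies)
for index, batch in enumerate(batches, start=1):
    print(index, batch)
```

With this sample graph, train_a and train_b fall into the same batch because both depend only on label.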
Workflow parameters

| Parameter | Description | Mandatory | Data Type |
|---|---|---|---|
| name | Workflow name. The name can contain a maximum of 64 characters, including only letters, digits, underscores (_), and hyphens (-), and must start with a letter. | Yes | str |
| desc | Workflow description | Yes | str |
| steps | Phases contained in a workflow | Yes | list[Step] |
| storages | Unified storage objects | No | Storage or list[Storage] |
| policy | Workflow configuration policy, which is used for partial execution | No | Policy |
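The naming rule for name (at most 64 characters, only letters, digits, underscores, and hyphens, starting with a letter) can be checked locally before a workflow or phase is created. The following is a minimal sketch, not part of the SDK. The helper name is_valid_name is hypothetical, and "letters" is assumed to mean ASCII letters.

```python
import re

# One leading ASCII letter followed by up to 63 letters, digits,
# underscores, or hyphens (64 characters in total at most).
# Assumption: "letters" means ASCII letters only.
NAME_PATTERN = re.compile(r"[A-Za-z][A-Za-z0-9_-]{0,63}")


def is_valid_name(name: str) -> bool:
    """Return True if name satisfies the documented naming rule."""
    return NAME_PATTERN.fullmatch(name) is not None


print(is_valid_name("train_job-1"))
print(is_valid_name("1_train_job"))
```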
Step
A step is the smallest unit of a workflow. In a DAG, a step corresponds to a phase. Different types of steps provide different service capabilities. The main parts of a step are as follows.
| Parameter | Description | Mandatory | Data Type |
|---|---|---|---|
| name | Phase name. The name can contain a maximum of 64 characters, including only letters, digits, underscores (_), and hyphens (-), and must start with a letter. | Yes | str |
| title | Title of a phase, which is displayed in the DAG. If this parameter is not configured, the name is displayed by default. | No | str |
| step_type | Type of a phase, which determines the function of the phase | Yes | enum |
| inputs | Inputs of a phase | No | AbstractInput or list[AbstractInput] |
| outputs | Outputs of a phase | No | AbstractOutput or list[AbstractOutput] |
| properties | Node properties | No | dict |
| policy | Phase execution policy, which includes the phase scheduling interval, the phase execution timeout interval, and the option to skip phase execution | No | StepPolicy |
| depend_steps | List of dependency phases. This parameter determines the DAG structure and phase execution sequence. | No | Step or list[Step] |
StepPolicy parameters

| Parameter | Description | Mandatory | Data Type |
|---|---|---|---|
| poll_interval_seconds | Phase scheduling interval. The default value is 1 second. | Yes | str |
| max_execution_minutes | Phase execution timeout interval. The default value is 10080 minutes, that is, 7 days. | Yes | str |
| skip_conditions | Conditions that determine whether a phase is skipped | No | Condition or condition list |
Step is the superclass of all phase types. It is a conceptual base class that you do not use directly. Phase types are created based on function and include CreateDatasetStep, LabelingStep, DatasetImportStep, ReleaseDatasetStep, JobStep, ModelStep, ServiceStep, and ConditionStep. For details, see Creating Workflow Phases.
Data
Data objects are used as phase inputs and are classified into the following types:
- Actual data objects, which are specified when you create a workflow
  - Dataset: defines an existing dataset. This object is used for data labeling and model training.
  - LabelTask: defines an existing labeling job. This object is usually used for data labeling and dataset version release.
  - OBSPath: defines an OBS path. This object is used for model training, dataset import, and model import.
  - ServiceData: defines an existing service. This object is used only for service update.
  - SWRImage: defines an existing SWR path. This object is used for model registration.
  - GalleryModel: defines a model subscribed from AI Gallery. This object is used for model registration.
- Placeholder data objects, which are specified when a workflow is running
  - DatasetPlaceholder: defines a dataset to be specified when a workflow is running. This object is used for data labeling and model training.
  - LabelTaskPlaceholder: defines a labeling job to be specified when a workflow is running. This object is used for data labeling and dataset version release.
  - OBSPlaceholder: defines an OBS path to be specified when a workflow is running. This object is used for model training, dataset import, and model import.
  - ServiceUpdatePlaceholder: defines an existing service to be specified when a workflow is running. This object is used only for service update.
  - SWRImagePlaceholder: defines an SWR path to be specified when a workflow is running. This object is used for model registration.
  - ServiceInputPlaceholder: defines the model information required for service deployment when a workflow is running. This object is used only for service deployment and update.
  - DataSelector: supports multiple data types. Currently, this object can be used only in the job phase (only OBS or datasets are supported).
- Data selection object
  - DataConsumptionSelector: selects a valid output from the outputs of multiple dependency phases as the data input. This object is usually used for conditional branching. When you create the workflow, you do not specify which dependency phase supplies the input. The source is selected automatically based on the actual execution status of the dependency phases.
Dataset parameters

| Parameter | Description | Mandatory | Data Type |
|---|---|---|---|
| dataset_name | Dataset name | Yes | str |
| version_name | Dataset version | No | str |
Example:
example = Dataset(dataset_name = "**", version_name = "**") # Obtain the dataset name and version name from ModelArts datasets.
When a dataset is used as the input of a phase, configure version_name based on service requirements. For example, version_name is not required for LabelingStep and ReleaseDatasetStep, but it is mandatory for JobStep.
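The version_name rule can be expressed as a small pre-flight check. This is a sketch only: DatasetSketch is a stand-in dataclass rather than the SDK's Dataset class, check_dataset_input is a hypothetical helper, and the set of phase types contains only the one case that the text names as requiring a version.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DatasetSketch:
    """Stand-in for a dataset input; not the SDK class."""
    dataset_name: str
    version_name: Optional[str] = None


# Per the text above, JobStep needs a dataset version, whereas
# LabelingStep and ReleaseDatasetStep do not.
STEP_TYPES_REQUIRING_VERSION = {"JobStep"}


def check_dataset_input(step_type: str, dataset: DatasetSketch) -> None:
    """Raise ValueError if the phase type needs a version and none is set."""
    if step_type in STEP_TYPES_REQUIRING_VERSION and not dataset.version_name:
        raise ValueError(
            f"{step_type} requires version_name for dataset "
            f"{dataset.dataset_name!r}"
        )


# A labeling phase accepts a dataset without a version.
check_dataset_input("LabelingStep", DatasetSketch(dataset_name="demo"))
```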
LabelTask parameters

| Parameter | Description | Mandatory | Data Type |
|---|---|---|---|
| dataset_name | Dataset name | Yes | str |
| task_name | Labeling job name | Yes | str |
Example:
example = LabelTask(dataset_name = "**", task_name = "**") # Obtain the dataset name and labeling job name from ModelArts datasets of the new version.
OBSPath parameters

| Parameter | Description | Mandatory | Data Type |
|---|---|---|---|
| obs_path | OBS path | Yes | str, Storage |
Example:
example = OBSPath(obs_path = "**") # Obtain the OBS path from Object Storage Service.
ServiceData parameters

| Parameter | Description | Mandatory | Data Type |
|---|---|---|---|
| service_id | Service ID | Yes | str |
Example:
example = ServiceData(service_id = "**") # Obtain the service ID in ModelArts Real-Time Services. This object describes a specified real-time service and is used for service update.
SWRImage parameters

| Parameter | Description | Mandatory | Data Type |
|---|---|---|---|
| swr_path | SWR path to a container image | Yes | str |
Example:
example = SWRImage(swr_path = "**") # Container image path, which is used as the input for model registration
GalleryModel parameters

| Parameter | Description | Mandatory | Data Type |
|---|---|---|---|
| subscription_id | Subscription ID of a subscribed model | Yes | str |
| version_num | Version number of a subscribed model | Yes | str |
Example:
example = GalleryModel(subscription_id="**", version_num="**") # Subscribed model object, which is used as the input of the model registration phase
DatasetPlaceholder parameters

| Parameter | Description | Mandatory | Data Type |
|---|---|---|---|
| name | Name | Yes | str |
| data_type | Data type | No | DataTypeEnum |
| delay | Whether the data object is configured when the phase is running. The default value is False. | No | bool |
| default | Default value of a data object | No | Dataset |
Example:
example = DatasetPlaceholder(name = "**", data_type = DataTypeEnum.IMAGE_CLASSIFICATION) # Dataset object placeholder. Configure data_type to specify supported data types.
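The difference between an actual data object and a placeholder can be sketched as follows. The resolution order shown here (run-time value first, then default) is an assumption made for illustration, since this page does not describe how the SDK resolves placeholders. PlaceholderSketch and resolve_input are hypothetical names, not SDK APIs.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional


@dataclass
class PlaceholderSketch:
    """Stand-in for a placeholder data object; not the SDK class."""
    name: str
    default: Optional[Any] = None


def resolve_input(data: Any, runtime_values: Dict[str, Any]) -> Any:
    """Return the concrete data object a phase would receive.

    Assumed behavior: actual data objects are used as is, and a
    placeholder takes the value supplied at run time or falls back
    to its default.
    """
    if not isinstance(data, PlaceholderSketch):
        return data  # fixed when the workflow was created
    if data.name in runtime_values:
        return runtime_values[data.name]
    if data.default is not None:
        return data.default
    raise ValueError(f"no value supplied for placeholder {data.name!r}")


placeholder = PlaceholderSketch(name="train_data", default="default_dataset")
print(resolve_input(placeholder, {"train_data": "chosen_at_runtime"}))
print(resolve_input(placeholder, {}))
```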
OBSPlaceholder parameters

| Parameter | Description | Mandatory | Data Type |
|---|---|---|---|
| name | Name | Yes | str |
| object_type | OBS object type. Only "file" and "directory" are supported. | Yes | str |
| delay | Whether the data object is configured when the phase is running. The default value is False. | No | bool |
| default | Default value of a data object | No | OBSPath |
Example:
example = OBSPlaceholder(name = "**", object_type = "directory") # OBS object placeholder. You can set object_type to file or directory.
LabelTaskPlaceholder parameters

| Parameter | Description | Mandatory | Data Type |
|---|---|---|---|
| name | Name | Yes | str |
| task_type | Type of a labeling job | No | LabelTaskTypeEnum |
| delay | Whether the data object is configured when the phase is running. The default value is False. | No | bool |
Example:
example = LabelTaskPlaceholder(name = "**") # LabelTask object placeholder
ServiceUpdatePlaceholder parameters

| Parameter | Description | Mandatory | Data Type |
|---|---|---|---|
| name | Name | Yes | str |
| delay | Whether the data object is configured when the phase is running. The default value is False. | No | bool |
Example:
example = ServiceUpdatePlaceholder(name = "**") # ServiceData object placeholder, which is used as the input for service update
SWRImagePlaceholder parameters

| Parameter | Description | Mandatory | Data Type |
|---|---|---|---|
| name | Name | Yes | str |
| delay | Whether the data object is configured when the phase is running. The default value is False. | No | bool |
Example:
example = SWRImagePlaceholder(name = "**") # SWRImage object placeholder, which is used as the input for model registration
ServiceInputPlaceholder parameters

| Parameter | Description | Mandatory | Data Type |
|---|---|---|---|
| name | Name | Yes | str |
| model_name | Model name | Yes | str or Placeholder |
| model_version | Model version | No | str |
| envs | Environment variables | No | dict |
| delay | Whether service deployment information is configured when the phase is running. The default value is True. | No | bool |
Example:
example = ServiceInputPlaceholder(name = "**", model_name = "model_name") # This object is used as the input for service deployment or service update.
DataSelector parameters

| Parameter | Description | Mandatory | Data Type |
|---|---|---|---|
| name | Name | Yes | str |
| data_type_list | Supported data types. Currently, only obs and dataset are supported. | Yes | list |
| delay | Whether the data object is configured when the phase is running. The default value is False. | No | bool |
Example:
example = DataSelector(name = "**", data_type_list = ["obs", "dataset"]) # This object is used as the input of the job phase.
DataConsumptionSelector parameters

| Parameter | Description | Mandatory | Data Type |
|---|---|---|---|
| data_list | Output data objects of a dependency phase | Yes | list |
Example:
example = DataConsumptionSelector(data_list=[step1.outputs["step1_output_name"].as_input(), step2.outputs["step2_output_name"].as_input()]) # Use the valid output from either step 1 or step 2 as the input. If step 1 is skipped and has no output, use the valid output from step 2 as the input. (Make sure that data_list has only one valid output.)
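The selection rule described in the comment above (exactly one valid output among the candidates) can be sketched in plain Python. This is an illustration of the rule, not the SDK's implementation. select_valid_output is a hypothetical helper, and None is assumed to stand for the missing output of a skipped phase.

```python
from typing import Optional, Sequence


def select_valid_output(data_list: Sequence[Optional[str]]) -> str:
    """Return the single valid output among the candidate outputs.

    Assumption: a skipped dependency phase contributes None, and
    exactly one candidate must be valid.
    """
    valid = [item for item in data_list if item is not None]
    if len(valid) != 1:
        raise ValueError(
            f"expected exactly one valid output, found {len(valid)}"
        )
    return valid[0]


# Step 1 was skipped, so the output of step 2 is used.
print(select_valid_output([None, "step2_output"]))
```

Raising an error when zero or several outputs are valid mirrors the requirement that data_list has only one valid output.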