Updated on 2024-08-14 GMT+08:00

Data

Data objects are used for phase input and are classified into the following types:

  • Actual data objects, which are specified when you create a workflow
    • Dataset: defines existing datasets. This object is used for data labeling and model training.
    • LabelTask: defines existing labeling jobs. This object is used for data labeling and dataset version release.
    • OBSPath: defines an OBS path. This object is used for model training, dataset import, and model import.
    • ServiceData: defines an existing service. This object is used only for service update.
    • SWRImage: defines an existing SWR path. This object is used for model registration.
    • GalleryModel: defines a model subscribed from AI Gallery. This object is used for model registration.
  • Placeholder data objects, which are specified when a workflow is running
    • DatasetPlaceholder: defines datasets to be specified when a workflow is running. This object is used for data labeling and model training.
    • LabelTaskPlaceholder: defines labeling jobs to be specified when a workflow is running. This object is used for data labeling and dataset version release.
    • OBSPlaceholder: defines an OBS path to be specified when a workflow is running. This object is used for model training, dataset import, and model import.
    • ServiceUpdatePlaceholder: defines existing services to be specified when a workflow is running. This object is used only for service update.
    • SWRImagePlaceholder: defines an SWR path to be specified when a workflow is running. This object is used for model registration.
    • ServiceInputPlaceholder: defines model information required for service deployment when a workflow is running. This object is used only for service deployment and update.
    • DataSelector: supports multiple data types. Currently, this object can be used only in the job phase, and only OBS paths and datasets are supported.
  • Data selection object:

    DataConsumptionSelector: selects the valid output from the outputs of multiple dependency phases as the data input. This object is typically used for conditional branching: when you create the workflow, you do not specify which dependency phase's output becomes the data input; the source is selected automatically based on the actual execution status of the dependency phases.

Table 1 Dataset

| Parameter | Description | Mandatory | Data Type |
|---|---|---|---|
| dataset_name | Dataset name | Yes | str |
| version_name | Dataset version | No | str |

Example:

example = Dataset(dataset_name="**", version_name="**")
# Obtain the dataset name and version name in the ModelArts dataset module.

When a dataset is used as the input of a phase, set version_name according to what that phase requires. For example, version_name is not required for LabelingStep and ReleaseDatasetStep, but it is mandatory for JobStep.
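The version_name rule above can be expressed as a small pre-flight check. This is only a sketch: `DatasetRef`, `PHASES_REQUIRING_VERSION`, and `check_dataset_input` are hypothetical helpers written for this illustration and are not part of the ModelArts SDK. The dataset name and version used below are made-up values; only the phase names LabelingStep, ReleaseDatasetStep, and JobStep come from the text.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class DatasetRef:
    """Hypothetical stand-in mirroring the two Dataset parameters in Table 1."""
    dataset_name: str
    version_name: Optional[str] = None


# Per the note above: JobStep needs a dataset version, the other two do not.
PHASES_REQUIRING_VERSION = {"JobStep"}


def check_dataset_input(dataset: DatasetRef, phase_type: str) -> None:
    """Raise ValueError if the phase needs a version and none is set."""
    if phase_type in PHASES_REQUIRING_VERSION and not dataset.version_name:
        raise ValueError(
            f"{phase_type} requires version_name for dataset {dataset.dataset_name!r}"
        )


# Illustrative values only.
check_dataset_input(DatasetRef(dataset_name="flowers"), "LabelingStep")
check_dataset_input(DatasetRef("flowers", "V001"), "JobStep")
```

Running a check like this before building the workflow would surface a missing version early, rather than when the job phase starts.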

Table 2 LabelTask

| Parameter | Description | Mandatory | Data Type |
|---|---|---|---|
| dataset_name | Dataset name | Yes | str |
| task_name | Labeling job name | Yes | str |

Example:

example = LabelTask(dataset_name="**", task_name="**")
# Obtain the dataset name and labeling job name in the ModelArts dataset module.

Table 3 OBSPath

| Parameter | Description | Mandatory | Data Type |
|---|---|---|---|
| obs_path | OBS path | Yes | str, Storage |

Example:

example = OBSPath(obs_path = "**")
# Obtain the OBS path from Object Storage Service.

Table 4 ServiceData

| Parameter | Description | Mandatory | Data Type |
|---|---|---|---|
| service_id | Service ID | Yes | str |

Example:

example = ServiceData(service_id = "**")
# Obtain the service ID in ModelArts Real-Time Services. This object describes a specified real-time service and is used for service update.

Table 5 SWRImage

| Parameter | Description | Mandatory | Data Type |
|---|---|---|---|
| swr_path | SWR path to a container image | Yes | str |

Example:

example = SWRImage(swr_path = "**")
# Container image path, which is used as the input for model registration

Table 6 GalleryModel

| Parameter | Description | Mandatory | Data Type |
|---|---|---|---|
| subscription_id | Subscription ID of a subscribed model | Yes | str |
| version_num | Version number of a subscribed model | Yes | str |

Example:

example = GalleryModel(subscription_id="**", version_num="**")
# Subscribed model object, which is used as the input of the model registration phase

Table 7 DatasetPlaceholder

| Parameter | Description | Mandatory | Data Type |
|---|---|---|---|
| name | Name | Yes | str |
| data_type | Data type | No | DataTypeEnum |
| delay | Whether the data object is configured when the phase is running. The default value is False. | No | bool |
| default | Default value of a data object | No | Dataset |

Example:

example = DatasetPlaceholder(name = "**", data_type = DataTypeEnum.IMAGE_CLASSIFICATION)
# Dataset object placeholder. Configure data_type to specify supported data types.

Table 8 OBSPlaceholder

| Parameter | Description | Mandatory | Data Type |
|---|---|---|---|
| name | Name | Yes | str |
| object_type | OBS object type. Only "file" and "directory" are supported. | Yes | str |
| delay | Whether the data object is configured when the phase is running. The default value is False. | No | bool |
| default | Default value of a data object | No | OBSPath |

Example:

example = OBSPlaceholder(name="**", object_type="directory")
# OBS object placeholder. You can set object_type to file or directory.

Table 9 LabelTaskPlaceholder

| Parameter | Description | Mandatory | Data Type |
|---|---|---|---|
| name | Name | Yes | str |
| task_type | Type of a labeling job | No | LabelTaskTypeEnum |
| delay | Whether the data object is configured when the phase is running. The default value is False. | No | bool |

Example:

example = LabelTaskPlaceholder(name = "**")
# LabelTask object placeholder

Table 10 ServiceUpdatePlaceholder

| Field | Description | Mandatory | Data Type |
|---|---|---|---|
| name | Name | Yes | str |
| delay | Whether the data object is configured when the phase is running. The default value is False. | No | bool |

Example:

example = ServiceUpdatePlaceholder(name = "**")
# ServiceData object placeholder, which is used as the input for service update

Table 11 SWRImagePlaceholder

| Field | Description | Mandatory | Data Type |
|---|---|---|---|
| name | Name | Yes | str |
| delay | Whether the data object is configured when the phase is running. The default value is False. | No | bool |

Example:

example = SWRImagePlaceholder(name="**")
# SWRImage object placeholder, which is used as the input for model registration.

Table 12 ServiceInputPlaceholder

| Parameter | Description | Mandatory | Data Type |
|---|---|---|---|
| name | Name | Yes | str |
| model_name | Model name | Yes | str or Placeholder |
| model_version | Model version | No | str |
| envs | Environment variables | No | dict |
| delay | Whether service deployment information is configured when the phase is running. The default value is True. | No | bool |

Example:

example = ServiceInputPlaceholder(name="**", model_name="model_name")
# This object is used as the input for service deployment or service update.

Table 13 DataSelector

| Parameter | Description | Mandatory | Data Type |
|---|---|---|---|
| name | Name | Yes | str |
| data_type_list | Supported data types. Currently, only obs and dataset are supported. | Yes | list |
| delay | Whether the data object is configured when the phase is running. The default value is False. | No | bool |

Example:

example = DataSelector(name="**", data_type_list=["obs", "dataset"])
# This object is used as the input of the job phase.
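
As a rough illustration of the data_type_list constraint in Table 13, the sketch below validates a type list before it would be passed to DataSelector. `SUPPORTED_DATA_TYPES` and `validate_data_type_list` are hypothetical names for this example, and rejecting an empty list is an assumption; the SDK may perform its own validation differently.

```python
from typing import List

# The only values Table 13 lists as currently supported.
SUPPORTED_DATA_TYPES = ("obs", "dataset")


def validate_data_type_list(data_type_list: List[str]) -> List[str]:
    """Return the list unchanged if it is non-empty and every entry is supported."""
    # Assumption of this sketch: a mandatory list should not be empty.
    if not data_type_list:
        raise ValueError("data_type_list must not be empty")
    unsupported = [t for t in data_type_list if t not in SUPPORTED_DATA_TYPES]
    if unsupported:
        raise ValueError(f"unsupported data types: {unsupported}")
    return data_type_list


checked = validate_data_type_list(["obs", "dataset"])
```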

Table 14 DataConsumptionSelector

| Parameter | Description | Mandatory | Data Type |
|---|---|---|---|
| data_list | Output data objects of a dependency phase | Yes | list |

Example:

example = DataConsumptionSelector(data_list=[step1.outputs["step1_output_name"].as_input(), step2.outputs["step2_output_name"].as_input()])
# Use the valid output from either step 1 or step 2 as the input. If step 1 is skipped and has no output, use the valid output from step 2 as the input. (Make sure that data_list has only one valid output.)
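
The selection rule described for DataConsumptionSelector can be illustrated without the SDK. The sketch below is not the ModelArts implementation: `select_valid_output` is a hypothetical helper, the use of `None` to represent the output of a skipped phase is an assumption made for illustration, and the OBS path is a made-up value.

```python
from typing import Optional, Sequence


def select_valid_output(outputs: Sequence[Optional[str]]) -> str:
    """Return the single non-None entry of outputs.

    None stands in for the output of a skipped dependency phase
    (an assumption of this sketch, not documented SDK behavior).
    """
    valid = [out for out in outputs if out is not None]
    if len(valid) != 1:
        raise ValueError(f"expected exactly one valid output, found {len(valid)}")
    return valid[0]


# Step 1 was skipped, so only step 2 produced an output (illustrative path).
chosen = select_valid_output([None, "obs://bucket/step2_output/"])
```

Raising an error when zero or several outputs are valid mirrors the requirement that data_list contain only one valid output at run time.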