Data
Data objects are used for phase input and are classified into the following types:
- Actual data objects, which are specified when you create a workflow
  - Dataset: defines existing datasets. This object is used for data labeling and model training.
  - LabelTask: defines existing labeling jobs. This object is used for data labeling and dataset version release.
  - OBSPath: defines an OBS path. This object is used for model training, dataset import, and model import.
  - ServiceData: defines an existing service. This object is used only for service update.
  - SWRImage: defines an existing SWR path. This object is used for model registration.
  - GalleryModel: defines a model subscribed from AI Gallery. This object is used for model registration.
- Placeholder data objects, which are specified when a workflow is running
  - DatasetPlaceholder: defines datasets to be specified when a workflow is running. This object is used for data labeling and model training.
  - LabelTaskPlaceholder: defines labeling jobs to be specified when a workflow is running. This object is used for data labeling and dataset version release.
  - OBSPlaceholder: defines an OBS path to be specified when a workflow is running. This object is used for model training, dataset import, and model import.
  - ServiceUpdatePlaceholder: defines existing services to be specified when a workflow is running. This object is used only for service update.
  - SWRImagePlaceholder: defines an SWR path to be specified when a workflow is running. This object is used for model registration.
  - ServiceInputPlaceholder: defines model information required for service deployment when a workflow is running. This object is used only for service deployment and update.
  - DataSelector: supports multiple data types. Currently, this object can be used only in the job phase, and only OBS and dataset inputs are supported.
- Data selection object
  - DataConsumptionSelector: selects the valid output from the outputs of multiple dependency phases as the data input. This object is usually used for conditional branching: when you create the workflow, you do not specify which dependency phase supplies the data input, and the source is selected automatically based on the actual execution status of the dependency phases.
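The pairing between actual data objects and their run-time placeholders can be summarized as a small lookup table. This is only an illustrative sketch built from the object names on this page; `PLACEHOLDER_FOR` and `placeholder_for` are hypothetical helpers and are not part of the SDK.

```python
from typing import Optional

# Actual data object -> placeholder that defers the same choice to run time.
# The names come from this page; the dictionary itself is a hypothetical aid.
PLACEHOLDER_FOR = {
    "Dataset": "DatasetPlaceholder",
    "LabelTask": "LabelTaskPlaceholder",
    "OBSPath": "OBSPlaceholder",
    "ServiceData": "ServiceUpdatePlaceholder",
    "SWRImage": "SWRImagePlaceholder",
}


def placeholder_for(actual_type: str) -> Optional[str]:
    """Return the placeholder name for an actual data object, or None.

    GalleryModel has no placeholder listed on this page, so it maps to None.
    """
    return PLACEHOLDER_FOR.get(actual_type)


print(placeholder_for("OBSPath"))
print(placeholder_for("GalleryModel"))
```

ServiceInputPlaceholder and DataSelector do not appear in the table because this page lists no actual data object that corresponds to them.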
Parameter | Description | Mandatory | Data Type |
---|---|---|---|
dataset_name | Dataset name | Yes | str |
version_name | Dataset version | No | str |
Example:
example = Dataset(dataset_name = "**", version_name = "**") # Obtain the dataset name and version name in the ModelArts dataset module.
When a dataset is used as the input of a phase, set version_name according to what that phase requires. For example, version_name is not required for LabelingStep and ReleaseDatasetStep, but it is mandatory for JobStep.
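The version_name rule above can be expressed as a small pre-flight check. The sketch below does not use the SDK: `DatasetSketch`, `STEPS_REQUIRING_VERSION`, and `check_dataset_input` are hypothetical names, and the step types are passed as plain strings purely for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DatasetSketch:
    """Hypothetical stand-in that mirrors the two Dataset parameters."""
    dataset_name: str
    version_name: Optional[str] = None


# Per the text above, only JobStep needs a dataset version.
STEPS_REQUIRING_VERSION = {"JobStep"}


def check_dataset_input(step_type: str, dataset: DatasetSketch) -> None:
    """Raise ValueError if the step needs a version and none is set."""
    if step_type in STEPS_REQUIRING_VERSION and not dataset.version_name:
        raise ValueError(
            f"{step_type} requires version_name for dataset "
            f"{dataset.dataset_name!r}"
        )


# A labeling phase accepts a dataset without a version.
check_dataset_input("LabelingStep", DatasetSketch("my_dataset"))
# A job phase needs the version to be set.
check_dataset_input("JobStep", DatasetSketch("my_dataset", "v1"))
```

Calling `check_dataset_input("JobStep", DatasetSketch("my_dataset"))` raises ValueError in this sketch; how the SDK itself reports a missing version is not described here.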
Parameter | Description | Mandatory | Data Type |
---|---|---|---|
dataset_name | Dataset name | Yes | str |
task_name | Labeling job name | Yes | str |
Example:
example = LabelTask(dataset_name = "**", task_name = "**") # Obtain the dataset name and labeling job name in the ModelArts dataset module.
Parameter | Description | Mandatory | Data Type |
---|---|---|---|
obs_path | OBS path | Yes | str, Storage |
Example:
example = OBSPath(obs_path = "**") # Obtain the OBS path from Object Storage Service.
Parameter | Description | Mandatory | Data Type |
---|---|---|---|
service_id | Service ID | Yes | str |
Example:
example = ServiceData(service_id = "**") # Obtain the service ID in ModelArts Real-Time Services. This object describes a specified real-time service and is used for service update.
Parameter | Description | Mandatory | Data Type |
---|---|---|---|
swr_path | SWR path to a container image | Yes | str |
Example:
example = SWRImage(swr_path = "**") # Container image path, which is used as the input for model registration
Parameter | Description | Mandatory | Data Type |
---|---|---|---|
subscription_id | Subscription ID of a subscribed model | Yes | str |
version_num | Version number of a subscribed model | Yes | str |
Example:
example = GalleryModel(subscription_id="**", version_num="**") # Subscribed model object, which is used as the input of the model registration phase
Parameter | Description | Mandatory | Data Type |
---|---|---|---|
name | Name | Yes | str |
data_type | Data type | No | DataTypeEnum |
delay | Whether the data object is configured when the phase is running. The default value is False. | No | bool |
default | Default value of a data object | No | Dataset |
Example:
example = DatasetPlaceholder(name = "**", data_type = DataTypeEnum.IMAGE_CLASSIFICATION) # Dataset object placeholder. Configure data_type to specify supported data types.
Parameter | Description | Mandatory | Data Type |
---|---|---|---|
name | Name | Yes | str |
object_type | OBS object type. Only "file" and "directory" are supported. | Yes | str |
delay | Whether the data object is configured when the phase is running. The default value is False. | No | bool |
default | Default value of a data object | No | OBSPath |
Example:
example = OBSPlaceholder(name = "**", object_type = "directory") # OBS object placeholder. You can set object_type to file or directory.
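Because object_type accepts only two values, it can be worth checking it before building a workflow. The following is a minimal sketch of such a check, assuming a hypothetical stand-in class `OBSPlaceholderSketch` that mirrors the name, object_type, and delay parameters in the table above (the default parameter is omitted for brevity). Whether the SDK performs the same validation is not stated on this page.

```python
from dataclasses import dataclass

# The two values the table above lists as supported.
ALLOWED_OBJECT_TYPES = ("file", "directory")


@dataclass
class OBSPlaceholderSketch:
    """Hypothetical stand-in; not the SDK's OBSPlaceholder class."""
    name: str
    object_type: str
    delay: bool = False  # default value stated in the table

    def __post_init__(self) -> None:
        # Reject anything other than "file" or "directory" up front.
        if self.object_type not in ALLOWED_OBJECT_TYPES:
            raise ValueError(
                f"object_type must be one of {ALLOWED_OBJECT_TYPES}, "
                f"got {self.object_type!r}"
            )


training_data = OBSPlaceholderSketch(name="train_data", object_type="directory")
print(training_data.delay)
```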
Parameter | Description | Mandatory | Data Type |
---|---|---|---|
name | Name | Yes | str |
task_type | Type of a labeling job | No | LabelTaskTypeEnum |
delay | Whether the data object is configured when the phase is running. The default value is False. | No | bool |
Example:
example = LabelTaskPlaceholder(name = "**") # LabelTask object placeholder
Field | Description | Mandatory | Data Type |
---|---|---|---|
name | Name | Yes | str |
delay | Whether the data object is configured when the phase is running. The default value is False. | No | bool |
Example:
example = ServiceUpdatePlaceholder(name = "**") # ServiceData object placeholder, which is used as the input for service update
Field | Description | Mandatory | Data Type |
---|---|---|---|
name | Name | Yes | str |
delay | Whether the data object is configured when the phase is running. The default value is False. | No | bool |
Example:
example = SWRImagePlaceholder(name = "**") # SWRImage object placeholder, which is used as the input for model registration.
Parameter | Description | Mandatory | Data Type |
---|---|---|---|
name | Name | Yes | str |
model_name | Model name | Yes | str or Placeholder |
model_version | Model version | No | str |
envs | Environment variables | No | dict |
delay | Whether service deployment information is configured when the phase is running. The default value is True. | No | bool |
Example:
example = ServiceInputPlaceholder(name = "**", model_name = "model_name") # This object is used as the input for service deployment or service update.
Parameter | Description | Mandatory | Data Type |
---|---|---|---|
name | Name | Yes | str |
data_type_list | Supported data types. Currently, only obs and dataset are supported. | Yes | list |
delay | Whether the data object is configured when the phase is running. The default value is False. | No | bool |
Example:
example = DataSelector(name = "**", data_type_list = ["obs", "dataset"]) # This object is used as the input of the job phase.
Parameter | Description | Mandatory | Data Type |
---|---|---|---|
data_list | Output data objects of a dependency phase | Yes | list |
Example:
example = DataConsumptionSelector(data_list=[step1.outputs["step1_output_name"].as_input(), step2.outputs["step2_output_name"].as_input()]) # Use the valid output from either step 1 or step 2 as the input. If step 1 is skipped and has no output, use the valid output from step 2 as the input. (Make sure that data_list has only one valid output.)
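The selection rule described in the comment above can be illustrated in plain Python. This is only a sketch of the idea and not the SDK's implementation: `select_valid_output` is a hypothetical helper, `None` stands in for the output of a skipped phase, and the OBS path is a made-up value. Raising an error when zero or several outputs are valid is an assumption made here to enforce the "only one valid output" requirement.

```python
from typing import Optional, Sequence, TypeVar

T = TypeVar("T")


def select_valid_output(data_list: Sequence[Optional[T]]) -> T:
    """Return the single valid entry of data_list.

    None models the missing output of a skipped dependency phase.
    """
    valid = [item for item in data_list if item is not None]
    if len(valid) != 1:
        raise ValueError(
            f"expected exactly one valid output, found {len(valid)}"
        )
    return valid[0]


# Step 1 was skipped and produced nothing, so step 2's output is selected.
step1_output = None
step2_output = "obs://example-bucket/step2-output/"  # hypothetical path
chosen = select_valid_output([step1_output, step2_output])
print(chosen)
```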