Updated on 2024-08-14 GMT+08:00

Examples

There are three scenarios:

  • Creating a labeling job for a specified dataset and labeling the dataset
  • Labeling a specified job
  • Creating a labeling job based on the output of the dataset creation phase

Creating a Labeling Job for a Specified Dataset and Labeling the Dataset

Scenarios:

  • You have created only an unlabeled dataset (no labeling job yet) and need to label it while the workflow is running.
  • After a dataset is imported, the dataset needs to be labeled.

Data preparation: Create a dataset on the ModelArts console.
from modelarts import workflow as wf
# Use LabelingStep to create a labeling job for the input dataset and label it.

# Define the input dataset.
dataset = wf.data.DatasetPlaceholder(name="input_dataset")

# Define the placeholder for the labeling job name.
task_name = wf.Placeholder(name="placeholder_name", placeholder_type=wf.PlaceholderType.STR)

labeling = wf.steps.LabelingStep(
    name="labeling", # Name of the labeling phase. It can contain up to 64 characters (letters, digits, underscores (_), and hyphens (-) only), must start with a letter, and must be unique within the workflow.
    title="Dataset labeling", # Title, which defaults to the value of name
    properties=wf.steps.LabelTaskProperties(
        task_type=wf.data.LabelTaskTypeEnum.IMAGE_CLASSIFICATION,   # Labeling job type, for example, image classification
        task_name=task_name   # If no labeling job with this name exists, one is created with this name. Otherwise, the existing job is used.
    ),
    inputs=wf.steps.LabelingInput(name="input_name", data=dataset), # LabelingStep inputs. The dataset object is configured when the workflow is running. You can also use wf.data.Dataset(dataset_name="fake_dataset_name") for the data field.
    outputs=wf.steps.LabelingOutput(name="output_name"), # LabelingStep outputs
)

workflow = wf.Workflow(
    name="labeling-step-demo",
    desc="this is a demo workflow",
    steps=[labeling]
)
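
The naming rule stated in the comments above (at most 64 characters; letters, digits, underscores, and hyphens only; starting with a letter; unique within a workflow) can be checked locally before a workflow is built. The following is a minimal sketch using only the Python standard library. The helpers is_valid_step_name and find_duplicate_step_names are hypothetical and are not part of the ModelArts SDK, which may apply its own validation. The sketch also assumes that "letters" means ASCII letters.

```python
import re
from typing import List, Sequence

# Assumption: "letters" means ASCII letters. The pattern is a leading letter
# followed by up to 63 letters, digits, underscores, or hyphens (64 in total).
_STEP_NAME_PATTERN = re.compile(r"[A-Za-z][A-Za-z0-9_-]{0,63}")


def is_valid_step_name(name: str) -> bool:
    """Return True if name satisfies the step-name rule from the comments."""
    return _STEP_NAME_PATTERN.fullmatch(name) is not None


def find_duplicate_step_names(names: Sequence[str]) -> List[str]:
    """Return the names that occur more than once, in first-seen order."""
    seen = set()
    duplicates = []
    for name in names:
        if name in seen and name not in duplicates:
            duplicates.append(name)
        seen.add(name)
    return duplicates
```

For example, is_valid_step_name("labeling") passes, while a name that starts with a digit or contains a space is rejected. An empty list from find_duplicate_step_names means the step names do not collide.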

Labeling a Specified Job

Scenarios:

  • You have created a labeling job and need to complete the labeling while the workflow is running.
  • After a dataset is imported, the dataset needs to be labeled.

Data preparation: Create a labeling job using a specified dataset on the ModelArts console.
from modelarts import workflow as wf
# Input a labeling job and label it.

# Define the labeling job of the dataset.
label_task = wf.data.LabelTaskPlaceholder(name="label_task_placeholder_name")

labeling = wf.steps.LabelingStep(
    name="labeling", # Name of the labeling phase. It can contain up to 64 characters (letters, digits, underscores (_), and hyphens (-) only), must start with a letter, and must be unique within the workflow.
    title="Dataset labeling", # Title, which defaults to the value of name
    inputs=wf.steps.LabelingInput(name="input_name", data=label_task), # LabelingStep inputs. The labeling job object is configured when the workflow is running. You can also use wf.data.LabelTask(dataset_name="dataset_name", task_name="label_task_name") for the data field.
    outputs=wf.steps.LabelingOutput(name="output_name"), # LabelingStep outputs
)

workflow = wf.Workflow(
    name="labeling-step-demo",
    desc="this is a demo workflow",
    steps=[labeling]
)

Creating a Labeling Job Based on the Output of the Dataset Creation Phase

Scenario: The outputs of the dataset creation phase are used as the inputs of the labeling phase.

from modelarts import workflow as wf

# Define the placeholder for the dataset output path.
dataset_output_path = wf.Placeholder(name="dataset_output_path", placeholder_type=wf.PlaceholderType.STR, placeholder_format="obs")

# Define the dataset name.
dataset_name = wf.Placeholder(name="dataset_name", placeholder_type=wf.PlaceholderType.STR)

create_dataset = wf.steps.CreateDatasetStep(
    name="create_dataset", # Name of the dataset creation phase. It can contain up to 64 characters (letters, digits, underscores (_), and hyphens (-) only), must start with a letter, and must be unique within the workflow.
    title="Dataset creation", # Title, which defaults to the value of name
    inputs=wf.steps.CreateDatasetInput(name="input_name", data=wf.data.OBSPlaceholder(name="obs_placeholder_name", object_type="directory")), # CreateDatasetStep inputs, configured when the workflow is running. You can also use wf.data.OBSPath(obs_path="fake_obs_path") for the data field.
    outputs=wf.steps.CreateDatasetOutput(name="create_dataset_output", config=wf.data.OBSOutputConfig(obs_path=dataset_output_path)), # CreateDatasetStep outputs
    properties=wf.steps.DatasetProperties(
        dataset_name=dataset_name, # If the dataset name does not exist, a dataset will be created using this name. If the dataset name exists, the corresponding dataset will be used.
        data_type=wf.data.DataTypeEnum.IMAGE, # Data type of the dataset, for example, image
    )
)

# Define the placeholder for the labeling job name.
task_name = wf.Placeholder(name="placeholder_name", placeholder_type=wf.PlaceholderType.STR)

labeling = wf.steps.LabelingStep(
    name="labeling", # Name of the labeling phase. It can contain up to 64 characters (letters, digits, underscores (_), and hyphens (-) only), must start with a letter, and must be unique within the workflow.
    title="Dataset labeling", # Title, which defaults to the value of name
    properties=wf.steps.LabelTaskProperties(
        task_type=wf.data.LabelTaskTypeEnum.IMAGE_CLASSIFICATION,   # Labeling job type, for example, image classification
        task_name=task_name   # If no labeling job with this name exists, one is created with this name. Otherwise, the existing job is used.
    ),
    inputs=wf.steps.LabelingInput(name="input_name", data=create_dataset.outputs["create_dataset_output"].as_input()), # LabelingStep inputs. The data source is the outputs of the dataset creation phase.
    outputs=wf.steps.LabelingOutput(name="output_name"), # LabelingStep outputs
    depend_steps=create_dataset # Preceding dataset creation phase
)
# create_dataset is an instance of wf.steps.CreateDatasetStep. create_dataset_output is the name field value of wf.steps.CreateDatasetOutput.

workflow = wf.Workflow(
    name="labeling-step-demo",
    desc="this is a demo workflow",
    steps=[create_dataset, labeling]
)
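
The comments on dataset_name and task_name describe a create-or-reuse behavior: an existing resource with the given name is used, and a new one is created only when no such name exists. The sketch below models that behavior with a plain dictionary, purely for illustration. The resolve_by_name helper and the in-memory registry are hypothetical. They do not call the ModelArts service and say nothing about how the service implements the lookup.

```python
from typing import Callable, Dict, Tuple


def resolve_by_name(
    registry: Dict[str, dict],
    name: str,
    create: Callable[[str], dict],
) -> Tuple[dict, bool]:
    """Return (resource, created) for the given name.

    If the name is already registered, the existing resource is returned and
    created is False. Otherwise create(name) is called, the result is stored
    under the name, and created is True.
    """
    if name in registry:
        return registry[name], False
    resource = create(name)
    registry[name] = resource
    return resource, True


# Toy usage: the first call creates the job, the second call reuses it.
jobs: Dict[str, dict] = {}
first, first_created = resolve_by_name(jobs, "demo-job", lambda n: {"task_name": n})
second, second_created = resolve_by_name(jobs, "demo-job", lambda n: {"task_name": n})
```

In this toy model, rerunning the workflow with the same name does not produce a second job. This matches the reuse behavior the comments describe.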