Creating a Dataset Labeling Phase
Description
This phase integrates the capabilities of the ModelArts dataset module so that you can label datasets in a workflow. Use it to create a labeling job for a dataset or to label data in an existing labeling job.
Parameter Overview
You can use LabelingStep to create a labeling phase. The following tables describe the parameters of LabelingStep and of the objects it uses.
LabelingStep parameters

Parameter | Description | Mandatory | Data Type |
---|---|---|---|
name | Name of a labeling phase. The name contains a maximum of 64 characters, including only letters, digits, underscores (_), and hyphens (-). It must start with a letter and must be unique in a workflow. | Yes | str |
inputs | Inputs of the labeling phase. | Yes | LabelingInput or LabelingInput list |
outputs | Outputs of the labeling phase. | Yes | LabelingOutput or LabelingOutput list |
properties | Configurations for dataset labeling. | Yes | LabelTaskProperties |
title | Title for frontend display. | No | str |
description | Description of the labeling phase. | No | str |
policy | Phase execution policy. | No | StepPolicy |
depend_steps | Dependent phases. | No | Step or step list |
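
The naming rule stated for `name` (at most 64 characters; only letters, digits, underscores, and hyphens; starting with a letter; unique in a workflow) can be checked locally before a workflow is assembled. The following is a minimal sketch and is not part of the ModelArts SDK: `is_valid_phase_name` and `find_duplicate_names` are hypothetical helper names, and the sketch assumes that "letters" and "digits" mean ASCII characters.

```python
import re
from collections import Counter

# Assumption: "letters" and "digits" mean ASCII characters.
# One leading letter plus up to 63 further allowed characters gives 64 at most.
_PHASE_NAME_RE = re.compile(r"[A-Za-z][A-Za-z0-9_-]{0,63}")


def is_valid_phase_name(name: str) -> bool:
    """Return True if name satisfies the documented naming rule."""
    return _PHASE_NAME_RE.fullmatch(name) is not None


def find_duplicate_names(names):
    """Return the names that occur more than once, in first-seen order."""
    counts = Counter(names)
    return [n for n in counts if counts[n] > 1]
```

Running both checks over the names of all phases in a workflow catches invalid or repeated names before the workflow is submitted.
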
LabelingInput parameters

Parameter | Description | Mandatory | Data Type |
---|---|---|---|
name | Input name of the labeling phase. The name can contain a maximum of 64 characters, including only letters, digits, underscores (_), and hyphens (-), and must start with a letter. The input name of a step must be unique. | Yes | str |
data | Input data object of the labeling phase. | Yes | Dataset or labeling job object. Currently, only Dataset, DatasetConsumption, DatasetPlaceholder, LabelTask, LabelTaskPlaceholder, LabelTaskConsumption, and DataConsumptionSelector are supported. |
LabelingOutput parameters

Parameter | Description | Mandatory | Data Type |
---|---|---|---|
name | Output name of the labeling phase. The name can contain a maximum of 64 characters, including only letters, digits, underscores (_), and hyphens (-), and must start with a letter. The output name of a step must be unique. | Yes | str |
LabelTaskProperties parameters

Parameter | Description | Mandatory | Data Type |
---|---|---|---|
task_type | Type of a labeling job. Jobs of the specified type are returned. | Yes | LabelTaskTypeEnum |
task_name | Labeling job name. The value contains 1 to 100 characters, including only letters, digits, hyphens (-), and underscores (_). This parameter is mandatory when the input is a dataset object. | No | str, Placeholder |
labels | Labels to be created. | No | Label |
properties | Attributes of a labeling job. You can update this field to record custom information. | No | dict |
auto_sync_dataset | Whether to automatically synchronize the result of a labeling job to the dataset. | No | bool |
content_labeling | Whether to enable content labeling for speech paragraph labeling. This function is enabled by default. | No | bool |
description | Labeling job description. The description contains 0 to 256 characters and does not support the following special characters: ^!<>=&"' | No | str |
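
The constraints on `task_name` and `description` can be checked before the values are passed to LabelTaskProperties. The following is a minimal sketch under stated assumptions, not an SDK function: `check_label_task_settings` is a hypothetical helper, it assumes ASCII letters and digits, and it applies only to plain string values (a Placeholder is resolved when the workflow runs and cannot be checked this way).

```python
import re

# Assumption: ASCII letters and digits; 1 to 100 characters in total.
_TASK_NAME_RE = re.compile(r"[A-Za-z0-9_-]{1,100}")
# Characters that the description must not contain, as listed for `description`.
_FORBIDDEN_DESCRIPTION_CHARS = set("^!<>=&\"'")


def check_label_task_settings(task_name, description="", input_is_dataset=False):
    """Return a list of problems found; an empty list means none were found."""
    problems = []
    if task_name is None:
        # task_name is optional unless the phase input is a dataset object.
        if input_is_dataset:
            problems.append("task_name is mandatory when the input is a dataset")
    elif _TASK_NAME_RE.fullmatch(task_name) is None:
        problems.append(
            "task_name must be 1 to 100 letters, digits, hyphens, or underscores"
        )
    if len(description) > 256:
        problems.append("description must not exceed 256 characters")
    bad = sorted(set(description) & _FORBIDDEN_DESCRIPTION_CHARS)
    if bad:
        problems.append("description contains forbidden characters: " + "".join(bad))
    return problems
```

Returning a list of messages instead of raising on the first failure lets a caller report every problem in one pass.
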
Label parameters

Parameter | Description | Mandatory | Data Type |
---|---|---|---|
name | Label name. | No | str |
property | Basic attribute key-value pairs of a label, such as its color and shortcut keys. | No | str, dict, Placeholder |
type | Label type. | No | LabelTypeEnum |
Enumeration values

Enumeration | Value |
---|---|
LabelTaskTypeEnum | IMAGE_CLASSIFICATION, OBJECT_DETECTION, IMAGE_SEGMENTATION, TEXT_CLASSIFICATION, NAMED_ENTITY_RECOGNITION, TEXT_TRIPLE, AUDIO_CLASSIFICATION, SPEECH_CONTENT, SPEECH_SEGMENTATION, DATASET_TABULAR, VIDEO_ANNOTATION, FREE_FORMAT |
Sample Code of a Dataset Labeling Phase
There are three scenarios:
- Scenario 1: Creating a labeling job for a specified dataset and labeling the dataset
  - You have created only one unlabeled dataset and need to label it when the workflow is running.
  - After a dataset is imported, the dataset needs to be labeled.

Data preparation: Create a dataset on the ModelArts console.

    from modelarts import workflow as wf

    # Use LabelingStep to create a labeling job for the input dataset and label it.

    # Define an input dataset.
    dataset = wf.data.DatasetPlaceholder(name="input_dataset")

    # Define the name parameters of the labeling job.
    task_name = wf.Placeholder(name="placeholder_name", placeholder_type=wf.PlaceholderType.STR)

    labeling = wf.steps.LabelingStep(
        # Name of the labeling phase. The name contains a maximum of 64 characters,
        # including only letters, digits, underscores (_), and hyphens (-).
        # It must start with a letter and must be unique in a workflow.
        name="labeling",
        title="Dataset Labeling",  # Title, which defaults to the value of name
        properties=wf.steps.LabelTaskProperties(
            # Labeling job type, for example, image classification
            task_type=wf.data.LabelTaskTypeEnum.IMAGE_CLASSIFICATION,
            # If the labeling job name does not exist, a job will be created using this name.
            # If the labeling job name exists, the corresponding job will be used.
            task_name=task_name
        ),
        # LabelingStep inputs. The dataset object is configured when the workflow is running.
        # You can also use wf.data.Dataset(dataset_name="fake_dataset_name") for the data field.
        inputs=wf.steps.LabelingInput(name="input_name", data=dataset),
        outputs=wf.steps.LabelingOutput(name="output_name"),  # LabelingStep outputs
    )

    workflow = wf.Workflow(
        name="labeling-step-demo",
        desc="this is a demo workflow",
        steps=[labeling]
    )

- Scenario 2: Labeling a specified job
  - You have created a labeling job and need to label it when the workflow is running.
  - After a dataset is imported, the dataset needs to be labeled.

Data preparation: Create a labeling job using a specified dataset on the ModelArts console.

    from modelarts import workflow as wf

    # Input a labeling job and label it.

    # Define a dataset labeling job.
    label_task = wf.data.LabelTaskPlaceholder(name="label_task_placeholder_name")

    labeling = wf.steps.LabelingStep(
        # Name of the labeling phase. The name contains a maximum of 64 characters,
        # including only letters, digits, underscores (_), and hyphens (-).
        # It must start with a letter and must be unique in a workflow.
        name="labeling",
        title="Dataset Labeling",  # Title, which defaults to the value of name
        # LabelingStep inputs. The labeling job object is configured when the workflow is running.
        # You can also use wf.data.LabelTask(dataset_name="dataset_name", task_name="label_task_name")
        # for the data field.
        inputs=wf.steps.LabelingInput(name="input_name", data=label_task),
        outputs=wf.steps.LabelingOutput(name="output_name"),  # LabelingStep outputs
    )

    workflow = wf.Workflow(
        name="labeling-step-demo",
        desc="this is a demo workflow",
        steps=[labeling]
    )

- Scenario 3: Creating a labeling job based on the output of the dataset creation phase

Scenario: The outputs of the dataset creation phase are used as the inputs of the labeling phase.

    from modelarts import workflow as wf

    # Define parameters of the dataset output path.
    dataset_output_path = wf.Placeholder(name="dataset_output_path", placeholder_type=wf.PlaceholderType.STR, placeholder_format="obs")

    # Define the dataset name.
    dataset_name = wf.Placeholder(name="dataset_name", placeholder_type=wf.PlaceholderType.STR)

    create_dataset = wf.steps.CreateDatasetStep(
        # Name of a dataset creation phase. The name contains a maximum of 64 characters,
        # including only letters, digits, underscores (_), and hyphens (-).
        # It must start with a letter and must be unique in a workflow.
        name="create_dataset",
        title="Dataset Creation",  # Title, which defaults to the value of name
        # CreateDatasetStep inputs, configured when the workflow is running; the data field
        # can also be represented by the wf.data.OBSPath(obs_path="fake_obs_path") object.
        inputs=wf.steps.CreateDatasetInput(name="input_name", data=wf.data.OBSPlaceholder(name="obs_placeholder_name", object_type="directory")),
        # CreateDatasetStep outputs
        outputs=wf.steps.CreateDatasetOutput(name="create_dataset_output", config=wf.data.OBSOutputConfig(obs_path=dataset_output_path)),
        properties=wf.steps.DatasetProperties(
            # If the dataset name does not exist, a dataset will be created using this name.
            # If the dataset name exists, the corresponding dataset will be used.
            dataset_name=dataset_name,
            data_type=wf.data.DataTypeEnum.IMAGE,  # Data type of the dataset, for example, image
        )
    )

    # Define the name parameters of the labeling job.
    task_name = wf.Placeholder(name="placeholder_name", placeholder_type=wf.PlaceholderType.STR)

    labeling = wf.steps.LabelingStep(
        # Name of the labeling phase. The name contains a maximum of 64 characters,
        # including only letters, digits, underscores (_), and hyphens (-).
        # It must start with a letter and must be unique in a workflow.
        name="labeling",
        title="Dataset Labeling",  # Title, which defaults to the value of name
        properties=wf.steps.LabelTaskProperties(
            # Labeling job type, for example, image classification
            task_type=wf.data.LabelTaskTypeEnum.IMAGE_CLASSIFICATION,
            # If the labeling job name does not exist, a job will be created using this name.
            # If the labeling job name exists, the corresponding job will be used.
            task_name=task_name
        ),
        # LabelingStep inputs. The data source is the outputs of the dataset creation phase.
        inputs=wf.steps.LabelingInput(name="input_name", data=create_dataset.outputs["create_dataset_output"].as_input()),
        outputs=wf.steps.LabelingOutput(name="output_name"),  # LabelingStep outputs
        depend_steps=create_dataset  # Preceding dataset creation phase
    )
    # create_dataset is an instance of wf.steps.CreateDatasetStep.
    # create_dataset_output is the name field value of wf.steps.CreateDatasetOutput.

    workflow = wf.Workflow(
        name="labeling-step-demo",
        desc="this is a demo workflow",
        steps=[create_dataset, labeling]
    )