Updated on 2024-08-14 GMT+08:00

Examples

There are three scenarios:

  • Importing data in a specified path to a target dataset
    • Importing labeled data to a dataset
    • Importing unlabeled data to a dataset
  • Importing data in a specified path to a target labeling job
    • Importing labeled data to a labeling job
    • Importing unlabeled data to a labeling job
  • Creating a dataset import phase based on the dataset creation phase

Importing Data in a Specified Path to a Target Dataset

Scenario: The data in an existing dataset needs to be updated.

  • Import labeled data (data that carries label information) from a specified path into a dataset. You can then add a dataset release phase to release a version.

    Data preparation: Create a dataset on the ModelArts console and upload labeled data to OBS.

    from modelarts import workflow as wf
    # Use DatasetImportStep to import data in a specified path to a dataset and output the dataset.
    
    # Define the dataset.
    dataset = wf.data.DatasetPlaceholder(name="input_dataset")
    
    # Define the OBS data.
    # object_type must be "file" or "directory".
    obs = wf.data.OBSPlaceholder(name="obs_placeholder_name", object_type="directory")
    
    dataset_import = wf.steps.DatasetImportStep(
        # Name of the dataset import phase: at most 64 characters, only letters, digits,
        # underscores (_), and hyphens (-). It must start with a letter and be unique in the workflow.
        name="data_import",
        title="Dataset import",  # Title; defaults to the value of name
        inputs=[  # DatasetImportStep inputs
            # Target dataset, configured when the workflow runs.
            # wf.data.Dataset(dataset_name="dataset_name") can also be used for the data field.
            wf.steps.DatasetImportInput(name="input_name_1", data=dataset),
            # OBS path of the data to import, configured when the workflow runs.
            # wf.data.OBSPath(obs_path="obs_path") can also be used for the data field.
            wf.steps.DatasetImportInput(name="input_name_2", data=obs),
        ],
        outputs=wf.steps.DatasetImportOutput(name="output_name"),  # DatasetImportStep outputs
        properties=wf.steps.ImportDataInfo(
            annotation_format_config=[
                wf.steps.AnnotationFormatConfig(
                    format_name=wf.steps.AnnotationFormatEnum.MA_IMAGE_CLASSIFICATION_V1, # Labeling format of labeled data, for example, image classification
                    scene=wf.data.LabelTaskTypeEnum.IMAGE_CLASSIFICATION # Labeling scene
                )
            ]
        )
    )
    
    workflow = wf.Workflow(
        name="dataset-import-demo",
        desc="this is a demo workflow",
        steps=[dataset_import]
    )
  • Import unlabeled data from a specified path into a dataset. You can then add a labeling phase to label the imported data.

    Data preparation: Create a dataset on the ModelArts console and upload unlabeled data to OBS.

    from modelarts import workflow as wf
    # Use DatasetImportStep to import data in a specified path to a dataset and output the dataset.
    
    # Define the dataset.
    dataset = wf.data.DatasetPlaceholder(name="input_dataset")
    
    # Define the OBS data.
    # object_type must be "file" or "directory".
    obs = wf.data.OBSPlaceholder(name="obs_placeholder_name", object_type="directory")
    
    dataset_import = wf.steps.DatasetImportStep(
        # Name of the dataset import phase: at most 64 characters, only letters, digits,
        # underscores (_), and hyphens (-). It must start with a letter and be unique in the workflow.
        name="data_import",
        title="Dataset import",  # Title; defaults to the value of name
        inputs=[  # DatasetImportStep inputs
            # Target dataset, configured when the workflow runs.
            # wf.data.Dataset(dataset_name="dataset_name") can also be used for the data field.
            wf.steps.DatasetImportInput(name="input_name_1", data=dataset),
            # OBS path of the data to import, configured when the workflow runs.
            # wf.data.OBSPath(obs_path="obs_path") can also be used for the data field.
            wf.steps.DatasetImportInput(name="input_name_2", data=obs),
        ],
        outputs=wf.steps.DatasetImportOutput(name="output_name"),  # DatasetImportStep outputs
    )
    
    workflow = wf.Workflow(
        name="dataset-import-demo",
        desc="this is a demo workflow",
        steps=[dataset_import]
    )
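
The naming rule quoted in the comments above (at most 64 characters; only letters, digits, underscores, and hyphens; starts with a letter; unique in a workflow) can be checked locally before a workflow is assembled. The following is a minimal sketch, not part of the ModelArts SDK: the helper name check_step_names is hypothetical, and it assumes that "letters" means ASCII letters.

```python
import re

# Assumption: "letters" means ASCII letters. One leading letter plus up to
# 63 further allowed characters gives the 64-character maximum.
_STEP_NAME_RE = re.compile(r"[A-Za-z][A-Za-z0-9_-]{0,63}")


def check_step_names(names):
    """Return a list of human-readable problems; an empty list means all names pass.

    Hypothetical helper for illustration only; it does not call the ModelArts SDK.
    """
    problems = []
    seen = set()
    for name in names:
        if not _STEP_NAME_RE.fullmatch(name):
            problems.append(f"invalid name: {name!r}")
        if name in seen:
            problems.append(f"duplicate name: {name!r}")
        seen.add(name)
    return problems


if __name__ == "__main__":
    # "data_import" is the phase name used in the examples above.
    print(check_step_names(["data_import", "1st_import", "data_import"]))
```

Running such a check on the name values of all phases before constructing wf.Workflow may surface naming problems earlier than a failed workflow submission would.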

Importing Data in a Specified Path to a Target Labeling Job

Scenario: The data in an existing labeling job needs to be updated.

  • Import labeled data from a specified path into a labeling job. You can then add a dataset release phase to release a version.

    Data preparation: Create a labeling job using a specified dataset and upload the labeled data to OBS.

    from modelarts import workflow as wf
    # Use DatasetImportStep to import data in a specified path to a labeling job and output the labeling job.
    
    # Define the labeling job.
    label_task = wf.data.LabelTaskPlaceholder(name="label_task_placeholder_name")
    
    # Define the OBS data.
    # object_type must be "file" or "directory".
    obs = wf.data.OBSPlaceholder(name="obs_placeholder_name", object_type="directory")
    
    dataset_import = wf.steps.DatasetImportStep(
        # Name of the dataset import phase: at most 64 characters, only letters, digits,
        # underscores (_), and hyphens (-). It must start with a letter and be unique in the workflow.
        name="data_import",
        title="Dataset import",  # Title; defaults to the value of name
        inputs=[  # DatasetImportStep inputs
            # Labeling job object, configured when the workflow runs.
            # wf.data.LabelTask(dataset_name="dataset_name", task_name="label_task_name") can also be used for the data field.
            wf.steps.DatasetImportInput(name="input_name_1", data=label_task),
            # OBS path of the data to import, configured when the workflow runs.
            # wf.data.OBSPath(obs_path="obs_path") can also be used for the data field.
            wf.steps.DatasetImportInput(name="input_name_2", data=obs),
        ],
        outputs=wf.steps.DatasetImportOutput(name="output_name"),  # DatasetImportStep outputs
        properties=wf.steps.ImportDataInfo(
            annotation_format_config=[
                wf.steps.AnnotationFormatConfig(
                    format_name=wf.steps.AnnotationFormatEnum.MA_IMAGE_CLASSIFICATION_V1, # Labeling format of labeled data, for example, image classification
                    scene=wf.data.LabelTaskTypeEnum.IMAGE_CLASSIFICATION # Labeling scene
                )
            ]
        )
    )
    
    workflow = wf.Workflow(
        name="dataset-import-demo",
        desc="this is a demo workflow",
        steps=[dataset_import]
    )
  • Import unlabeled data from a specified path into a labeling job. You can then add a labeling phase to label the imported data.

    Data preparation: Create a labeling job using a specified dataset and upload the unlabeled data to OBS.

    from modelarts import workflow as wf
    # Use DatasetImportStep to import data in a specified path to a labeling job and output the labeling job.
    
    # Define the labeling job.
    label_task = wf.data.LabelTaskPlaceholder(name="label_task_placeholder_name")
    
    # Define the OBS data.
    # object_type must be "file" or "directory".
    obs = wf.data.OBSPlaceholder(name="obs_placeholder_name", object_type="directory")
    
    dataset_import = wf.steps.DatasetImportStep(
        # Name of the dataset import phase: at most 64 characters, only letters, digits,
        # underscores (_), and hyphens (-). It must start with a letter and be unique in the workflow.
        name="data_import",
        title="Dataset import",  # Title; defaults to the value of name
        inputs=[  # DatasetImportStep inputs
            # Labeling job object, configured when the workflow runs.
            # wf.data.LabelTask(dataset_name="dataset_name", task_name="label_task_name") can also be used for the data field.
            wf.steps.DatasetImportInput(name="input_name_1", data=label_task),
            # OBS path of the data to import, configured when the workflow runs.
            # wf.data.OBSPath(obs_path="obs_path") can also be used for the data field.
            wf.steps.DatasetImportInput(name="input_name_2", data=obs),
        ],
        outputs=wf.steps.DatasetImportOutput(name="output_name"),  # DatasetImportStep outputs
    )
    
    workflow = wf.Workflow(
        name="dataset-import-demo",
        desc="this is a demo workflow",
        steps=[dataset_import]
    )
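
The four examples above differ in only two respects: the placeholder used for the first input (DatasetPlaceholder or LabelTaskPlaceholder) and whether properties=wf.steps.ImportDataInfo(...) is passed (only in the labeled-data examples). The sketch below merely restates that pattern as plain Python for quick reference. ImportPlan and plan_import are hypothetical names, nothing here calls the ModelArts SDK, and the mapping is inferred from the examples rather than from an SDK specification.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ImportPlan:
    """Summary of how a DatasetImportStep is configured in the examples above."""
    target_placeholder: str    # class under wf.data used for the first input
    needs_properties: bool     # whether properties=wf.steps.ImportDataInfo(...) is passed
    suggested_next_phase: str  # follow-up phase mentioned in the scenario text


def plan_import(target: str, labeled: bool) -> ImportPlan:
    """Map a scenario (hypothetical keys "dataset" / "labeling_job") to its configuration."""
    placeholders = {
        "dataset": "DatasetPlaceholder",
        "labeling_job": "LabelTaskPlaceholder",
    }
    if target not in placeholders:
        raise ValueError(f"unknown import target: {target!r}")
    return ImportPlan(
        target_placeholder=placeholders[target],
        needs_properties=labeled,
        suggested_next_phase="dataset release" if labeled else "labeling",
    )


if __name__ == "__main__":
    for target in ("dataset", "labeling_job"):
        for labeled in (True, False):
            print(target, labeled, plan_import(target, labeled))
```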

Creating a Dataset Import Phase Based on the Dataset Creation Phase

Scenario: The outputs of the dataset creation phase are used as the inputs of the dataset import phase.

from modelarts import workflow as wf
# Use DatasetImportStep to import data in a specified path to a dataset and output the dataset.

# Define the OBS data.
# object_type must be "file" or "directory".
obs = wf.data.OBSPlaceholder(name="obs_placeholder_name", object_type="directory")

# create_dataset must already be defined as an instance of wf.steps.CreateDatasetStep;
# "create_dataset_output" is the name field value of its wf.steps.CreateDatasetOutput.
dataset_import = wf.steps.DatasetImportStep(
    # Name of the dataset import phase: at most 64 characters, only letters, digits,
    # underscores (_), and hyphens (-). It must start with a letter and be unique in the workflow.
    name="data_import",
    title="Dataset import",  # Title; defaults to the value of name
    inputs=[  # DatasetImportStep inputs
        # The output of the dataset creation phase is used as the input of the dataset import phase.
        wf.steps.DatasetImportInput(name="input_name_1", data=create_dataset.outputs["create_dataset_output"].as_input()),
        # OBS path of the data to import, configured when the workflow runs.
        # wf.data.OBSPath(obs_path="obs_path") can also be used for the data field.
        wf.steps.DatasetImportInput(name="input_name_2", data=obs),
    ],
    outputs=wf.steps.DatasetImportOutput(name="output_name"),  # DatasetImportStep outputs
    depend_steps=create_dataset  # Preceding dataset creation phase
)

workflow = wf.Workflow(
    name="dataset-import-demo",
    desc="this is a demo workflow",
    steps=[dataset_import]
)