
Updated on 2024-12-26 GMT+08:00


Creating a Training Job Phase

Description

This phase defines the algorithm, input, and output of a job for data processing, model training, or model evaluation. The application scenarios are as follows:

Data preprocessing such as image enhancement and noise reduction
Model training for object detection and image classification

Parameter Overview

You can use JobStep to create a job phase. The following tables describe the parameters of JobStep and of the objects it references.

**Table 1** **JobStep**
Parameter	Description	Mandatory	Data Type
name	Name of a job phase. The name contains a maximum of 64 characters, including only letters, digits, underscores (_), and hyphens (-). It must start with a letter and must be unique in a workflow.	Yes	str
algorithm	Algorithm object.	Yes	BaseAlgorithm, Algorithm, or AIGalleryAlgorithm
spec	Job specifications.	Yes	JobSpec
inputs	Inputs of a job phase.	Yes	JobInput or JobInput list
outputs	Outputs of a job phase.	Yes	JobOutput or JobOutput list
title	Title for frontend display.	No	str
description	Description of a job phase.	No	str
policy	Phase execution policy.	No	StepPolicy
depend_steps	Dependent phases.	No	Step or step list
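
The naming rule in Table 1 can be checked locally before a workflow is submitted. The following is a minimal sketch, not part of the ModelArts SDK: `is_valid_phase_name` and `find_duplicate_names` are hypothetical helpers, and the pattern assumes that "letters" means ASCII letters.

```python
import re

# Assumed pattern derived from the rule in Table 1: the name starts with a
# letter, continues with letters, digits, underscores, or hyphens, and is
# at most 64 characters long in total.
_NAME_PATTERN = re.compile(r"[A-Za-z][A-Za-z0-9_-]{0,63}")


def is_valid_phase_name(name: str) -> bool:
    """Return True if name satisfies the naming rule described for JobStep."""
    return _NAME_PATTERN.fullmatch(name) is not None


def find_duplicate_names(names):
    """Return the names that appear more than once, in first-seen order."""
    seen, duplicates = set(), []
    for name in names:
        if name in seen and name not in duplicates:
            duplicates.append(name)
        seen.add(name)
    return duplicates
```

Running both checks over all phase names of a workflow catches invalid or repeated names before the workflow is created.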

**Table 2** **JobInput**
Parameter	Description	Mandatory	Data Type
name	Input name of the job phase. The name can contain a maximum of 64 characters, including only letters, digits, underscores (_), and hyphens (-), and must start with a letter. The input name of a step must be unique.	Yes	str
data	Input data object of a job phase.	Yes	Dataset or OBS object. Currently, only Dataset, DatasetPlaceholder, DatasetConsumption, OBSPath, OBSConsumption, OBSPlaceholder, and DataConsumptionSelector are supported.

**Table 3** **JobOutput**
Parameter	Description	Mandatory	Data Type
name	Output name of the job phase. The name can contain a maximum of 64 characters, including only letters, digits, underscores (_), and hyphens (-), and must start with a letter. The output name of a step must be unique.	Yes	str
obs_config	OBS output configuration.	No	OBSOutputConfig
model_config	Model output configuration.	No	ModelConfig
metrics_config	Metrics configuration.	No	MetricsConfig

**Table 4** **OBSOutputConfig**
Parameter	Description	Mandatory	Data Type
obs_path	Existing OBS directory	Yes	str, Placeholder, Storage
metric_file	Name of the file that stores metric information	No	str, Placeholder

**Table 5** **BaseAlgorithm**
Parameter	Description	Mandatory	Data Type
id	Algorithm ID	No	str
subscription_id	Subscription ID of the subscribed algorithm	No	str
item_version_id	Version ID of the subscribed algorithm	No	str
code_dir	Code directory	No	str, Placeholder, Storage
boot_file	Boot file	No	str, Placeholder, Storage
command	Boot command	No	str, Placeholder
parameters	Algorithm hyperparameters	No	AlgorithmParameters list
engine	Information about the image used by the job	No	JobEngine
environments	Environment variables	No	dict

**Table 6** **Algorithm**
Parameter	Description	Mandatory	Data Type
algorithm_id	Algorithm ID	Yes	str
parameters	Algorithm hyperparameters	No	List of algorithm parameters

**Table 7** **AIGalleryAlgorithm**
Parameter	Description	Mandatory	Data Type
subscription_id	Subscription ID of the subscribed algorithm	Yes	str
item_version_id	Version ID of the subscribed algorithm	Yes	str
parameters	Algorithm hyperparameters	No	List of algorithm parameters

**Table 8** **AlgorithmParameters**
Parameter	Description	Mandatory	Data Type
name	Name of an algorithm hyperparameter	Yes	str
value	Value of an algorithm hyperparameter	Yes	int, bool, float, str, Placeholder, Storage

**Table 9** **JobEngine**
Parameter	Description	Mandatory	Data Type
engine_id	Image ID	No	str, Placeholder
engine_name	Image name	No	str, Placeholder
engine_version	Image version	No	str, Placeholder
image_url	Image URL	No	str, Placeholder

**Table 10** **JobSpec**
Parameter	Description	Mandatory	Data Type
resource	Resource information	Yes	JobResource
log_export_path	Log output path	No	LogExportPath
schedule_policy	Job scheduling policy	No	SchedulePolicy
volumes	Information about the file system mounted to the job	No	list[Volume]

**Table 11** **JobResource**
Parameter	Description	Mandatory	Data Type
flavor	Resource flavor.	Yes	Placeholder
node_count	Number of nodes. The default value is 1. If there are multiple nodes, distributed training is supported.	No	int, Placeholder

**Table 12** **SchedulePolicy**
Parameter	Description	Mandatory	Data Type
priority	Job scheduling priority. The value can only be 1, 2, or 3, indicating low, medium, and high priorities, respectively.	Yes	int, Placeholder

**Table 13** **Volume**
Parameter	Description	Mandatory	Data Type
nfs	NFS file system object. In a volume object, only one of nfs, pacific, and pfs can be configured.	No	NFS
pacific	Pacific file system object. In a volume object, only one of nfs, pacific, and pfs can be configured.	No	Placeholder
pfs	OBS parallel file system object. In a volume object, only one of nfs, pacific, and pfs can be configured.	No	PFS, Placeholder
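
The constraint that a volume configures only one of nfs, pacific, and pfs can be checked on plain dictionaries before SDK objects are built. This is a sketch under assumptions: `volume_kind` is a hypothetical helper, the dictionaries only mirror the field names in Table 13, and treating a volume with no file system as an error is an assumption, not a documented rule.

```python
# Hypothetical helper, not part of the ModelArts SDK. It inspects plain dicts
# whose keys mirror the Volume fields in Table 13.
VOLUME_KINDS = ("nfs", "pacific", "pfs")


def volume_kind(volume: dict) -> str:
    """Return the single file system kind configured in a volume description.

    Raises ValueError unless exactly one of nfs, pacific, and pfs is set.
    """
    configured = [kind for kind in VOLUME_KINDS if volume.get(kind) is not None]
    if len(configured) != 1:
        raise ValueError(
            f"expected exactly one of {VOLUME_KINDS}, got {configured or 'none'}"
        )
    return configured[0]
```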

**Table 14** **NFS**
Parameter	Description	Mandatory	Data Type
nfs_server_path	Service address of the NFS file system.	Yes	str, Placeholder
local_path	Path mounted to the container.	Yes	str, Placeholder
read_only	Indicates if the mount mode is set to read-only.	No	bool, Placeholder

**Table 15** **PFS**
Parameter	Description	Mandatory	Data Type
pfs_path	Path of the parallel file system	Yes	str, Placeholder
local_path	Path mounted to the container	Yes	str, Placeholder

Obtaining Resource Flavors

Before creating a job phase, perform the following operations to obtain supported training flavors and engines:

Import packages.

import os
from modelarts.session import Session
from modelarts.estimatorV2 import TrainingJob
from modelarts.workflow.client.job_client import JobClient

Initialize a session.

# If you develop a workflow in a local IDE, initialize a session as follows:
# Hardcoded or plaintext AK/SK is risky. For security, encrypt your AK/SK and store them in the configuration file or environment variables.
# In this example, the AK/SK are stored in environment variables for identity authentication. Before running this example, set environment variables HUAWEICLOUD_SDK_AK and HUAWEICLOUD_SDK_SK.
__AK = os.environ["HUAWEICLOUD_SDK_AK"]
__SK = os.environ["HUAWEICLOUD_SDK_SK"]
# Decrypt the information if it is encrypted.
session = Session(
    access_key=__AK, # AK information of your account
    secret_key=__SK, # SK information of your account
    region_name="***", # Region to which your account belongs
    project_id="***" # Project ID of your account
)

# If you develop a workflow in a notebook environment, initialize a session:
session = Session()

Obtain public resource pools.

# Obtain the specification list of public resource pools.
spec_list = TrainingJob(session).get_train_instance_types(session) # A list is returned. You can download it.
print(spec_list)

Obtain dedicated resource pools.

# Obtain the list of running dedicated resource pools.
pool_list = JobClient(session).get_pool_list() # A list of dedicated resource pools is returned.
pool_id_list = JobClient(session).get_pool_id_list() # An ID list of dedicated resource pools is returned.
The following lists the flavor IDs of dedicated resource pools. Select one as required.
    modelarts.pool.visual.xlarge (1 card)
    modelarts.pool.visual.2xlarge (2 cards)
    modelarts.pool.visual.4xlarge (4 cards)
    modelarts.pool.visual.8xlarge (8 cards)

Obtain engine types.

# Obtain engine types.
engine_dict = TrainingJob(session).get_engine_list(session) # A dictionary is returned. You can download it.
print(engine_dict)
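
The session initialization above reads the AK/SK from environment variables. A missing variable surfaces there as a bare KeyError, so it can help to check both variables first and report what is missing. The following is a minimal sketch; `load_credentials` is a hypothetical helper and not part of the ModelArts SDK.

```python
import os


def load_credentials(env=os.environ):
    """Read the AK/SK pair from environment variables and fail early if missing."""
    required = ("HUAWEICLOUD_SDK_AK", "HUAWEICLOUD_SDK_SK")
    # Treat unset and empty variables the same way.
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise RuntimeError("missing environment variables: " + ", ".join(missing))
    return env["HUAWEICLOUD_SDK_AK"], env["HUAWEICLOUD_SDK_SK"]
```

The returned pair can then be passed to Session as access_key and secret_key, as in the session example.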

Examples

There are seven scenarios:

Using an algorithm subscribed to in AI Gallery
Using an algorithm in Algorithm Management
Using a custom algorithm (code directory+boot file+official image)
Using a custom algorithm (code directory+boot command+custom image)
Creating a job phase based on the dataset release phase
Job phase with visualization
Using the DataSelector object as the input, which supports OBS or datasets

Using an Algorithm Subscribed from AI Gallery

from modelarts import workflow as wf

# Create an OutputStorage object to centrally manage training output directories.
storage = wf.data.OutputStorage(name="storage_name", title="title_info", description="description_info") # Only name is mandatory.

# Define an input dataset.
dataset = wf.data.DatasetPlaceholder(name="input_dataset")

# Use JobStep to define a training phase. Use a dataset as the input, and use OBS to store the output.
job_step = wf.steps.JobStep(
    name="training_job", # Name of a training phase. The name contains a maximum of 64 characters, including only letters, digits, underscores (_), and hyphens (-). It must start with a letter and must be unique in a workflow.
    title="Image Classification Training", # Title, which defaults to the value of name
    algorithm=wf.AIGalleryAlgorithm(
        subscription_id="subscription_id", # Subscription ID of the subscribed algorithm
        item_version_id="item_version_id", # Algorithm version ID. You can also enter the version number instead.
        parameters=[
            wf.AlgorithmParameters(
                name="parameter_name", 
                value=wf.Placeholder(name="parameter_name", placeholder_type=wf.PlaceholderType.STR, default="fake_value",description="description_info")
            ) # Algorithm hyperparameters are represented using placeholders, which can be integer, bool, float, or string.
        ]
    ), # Algorithm used for training. An algorithm subscribed to in AI Gallery is used in this example. If the value of an algorithm hyperparameter does not need to be changed, you do not need to configure the hyperparameter in parameters. Hyperparameter values will be automatically filled.
    
    inputs=wf.steps.JobInput(name="data_url", data=dataset), # JobStep inputs are configured when the workflow is running. You can also use wf.data.Dataset(dataset_name="fake_dataset_name", version_name="fake_version_name") for the data field.
    outputs=wf.steps.JobOutput(name="train_url", obs_config=wf.data.OBSOutputConfig(obs_path=storage.join("directory_path"))), # JobStep outputs
    spec=wf.steps.JobSpec(
        resource=wf.steps.JobResource(
            flavor=wf.Placeholder(name="train_flavor", placeholder_type=wf.PlaceholderType.JSON, description="Training flavor")
           
        )
    )# Training flavors
)

workflow = wf.Workflow(
    name="job-step-demo",
    desc="this is a demo workflow",
    steps=[job_step],
    storages=[storage]
)

Using an Algorithm in Algorithm Management

from modelarts import workflow as wf

# Create an OutputStorage object to centrally manage training output directories.
storage = wf.data.OutputStorage(name="storage_name", title="title_info", description="description_info") # Only name is mandatory.

# Define an input dataset.
dataset = wf.data.DatasetPlaceholder(name="input_dataset")

# Use JobStep to define a training phase. Use a dataset as the input, and use OBS to store the output.
job_step = wf.steps.JobStep(
    name="training_job", # Name of a training phase. The name contains a maximum of 64 characters, including only letters, digits, underscores (_), and hyphens (-). It must start with a letter and must be unique in a workflow.
    title="Image Classification Training", # Title, which defaults to the value of name
    algorithm=wf.Algorithm(
        algorithm_id="algorithm_id", # Algorithm ID
        parameters=[
            wf.AlgorithmParameters(
                name="parameter_name", 
                value=wf.Placeholder(name="parameter_name", placeholder_type=wf.PlaceholderType.STR, default="fake_value",description="description_info")
            ) # Algorithm hyperparameters are represented using placeholders, which can be integer, bool, float, or string.
        ]
    ), # Algorithm used for training. An algorithm from Algorithm Management is used in this example. If the value of an algorithm hyperparameter does not need to be changed, you do not need to configure the hyperparameter in parameters. Hyperparameter values will be automatically filled.

    inputs=wf.steps.JobInput(name="data_url", data=dataset), # JobStep inputs are configured when the workflow is running. You can also use wf.data.Dataset(dataset_name="fake_dataset_name", version_name="fake_version_name") for the data field.
    outputs=wf.steps.JobOutput(name="train_url", obs_config=wf.data.OBSOutputConfig(obs_path=storage.join("directory_path"))), # JobStep outputs
    spec=wf.steps.JobSpec(
        resource=wf.steps.JobResource(
            flavor=wf.Placeholder(name="train_flavor", placeholder_type=wf.PlaceholderType.JSON, description="Training flavor")

        )
    )# Training flavors
)

workflow = wf.Workflow(
    name="job-step-demo",
    desc="this is a demo workflow",
    steps=[job_step],
    storages=[storage]
)

Using a Custom Algorithm (Code Directory + Boot File + Official Image)

from modelarts import workflow as wf

# Create an OutputStorage object to centrally manage training output directories.
storage = wf.data.OutputStorage(name="storage_name", title="title_info", description="description_info") # Only name is mandatory.

# Define an input dataset.
dataset = wf.data.DatasetPlaceholder(name="input_dataset")

# Use JobStep to define a training phase. Use a dataset as the input, and use OBS to store the output.
job_step = wf.steps.JobStep(
    name="training_job", # Name of a training phase. The name contains a maximum of 64 characters, including only letters, digits, underscores (_), and hyphens (-). It must start with a letter and must be unique in a workflow.
    title="Image Classification Training", # Title, which defaults to the value of name
    algorithm=wf.BaseAlgorithm(
        code_dir="fake_code_dir", # Code directory
        boot_file="fake_boot_file", # Boot file path, which must be in the code directory
        engine=wf.steps.JobEngine(engine_name="fake_engine_name", engine_version="fake_engine_version"), # Name and version of the official image

        parameters=[
            wf.AlgorithmParameters(
                name="parameter_name", 
                value=wf.Placeholder(name="parameter_name", placeholder_type=wf.PlaceholderType.STR, default="fake_value",description="description_info")
            ) # Algorithm hyperparameters are represented using placeholders, which can be integer, bool, float, or string.
        ]
    ), # The custom algorithm is implemented using the code directory, boot file, and official image.

    
    inputs=wf.steps.JobInput(name="data_url", data=dataset), # JobStep inputs are configured when the workflow is running. You can also use wf.data.Dataset(dataset_name="fake_dataset_name", version_name="fake_version_name") for the data field.
    outputs=wf.steps.JobOutput(name="train_url", obs_config=wf.data.OBSOutputConfig(obs_path=storage.join("directory_path"))), # JobStep outputs
    spec=wf.steps.JobSpec(
        resource=wf.steps.JobResource(
            flavor=wf.Placeholder(name="train_flavor", placeholder_type=wf.PlaceholderType.JSON, description="Training flavor")
            
        )
    )# Training flavors
)

workflow = wf.Workflow(
    name="job-step-demo",
    desc="this is a demo workflow",
    steps=[job_step],
    storages=[storage]
)

Using a Custom Algorithm (Code Directory + Boot Command + Custom Image)

from modelarts import workflow as wf

# Create an OutputStorage object to centrally manage training output directories.
storage = wf.data.OutputStorage(name="storage_name", title="title_info", description="description_info") # Only name is mandatory.

# Define an input dataset.
dataset = wf.data.DatasetPlaceholder(name="input_dataset")

# Use JobStep to define a training phase. Use a dataset as the input, and use OBS to store the output.
job_step = wf.steps.JobStep(
    name="training_job", # Name of a training phase. The name contains a maximum of 64 characters, including only letters, digits, underscores (_), and hyphens (-). It must start with a letter and must be unique in a workflow.
    title="Image Classification Training", # Title, which defaults to the value of name
    algorithm=wf.BaseAlgorithm(
        code_dir="fake_code_dir", # Code directory
        command="fake_command", # Boot command
        engine=wf.steps.JobEngine(image_url="fake_image_url"), # Custom image URL, in the format of Organization name/Image name:Version name. Do not include the domain name. To make image_url configurable in the running state, use: image_url=wf.Placeholder(name="image_url", placeholder_type=wf.PlaceholderType.STR, placeholder_format="swr", description="Custom image")
        parameters=[
            wf.AlgorithmParameters(
                name="parameter_name", 
                value=wf.Placeholder(name="parameter_name", placeholder_type=wf.PlaceholderType.STR, default="fake_value",description="description_info")
            ) # Algorithm hyperparameters are represented using placeholders, which can be integer, bool, float, or string.
        ]
    ), # The custom algorithm is implemented using the code directory, boot command, and custom image.

    inputs=wf.steps.JobInput(name="data_url", data=dataset), # JobStep inputs are configured when the workflow is running. You can also use wf.data.Dataset(dataset_name="fake_dataset_name", version_name="fake_version_name") for the data field.
    outputs=wf.steps.JobOutput(name="train_url", obs_config=wf.data.OBSOutputConfig(obs_path=storage.join("directory_path"))), # JobStep outputs
    spec=wf.steps.JobSpec(
        resource=wf.steps.JobResource(
            flavor=wf.Placeholder(name="train_flavor", placeholder_type=wf.PlaceholderType.JSON, description="Training flavor")
            
        )
    )# Training flavors
)

workflow = wf.Workflow(
    name="job-step-demo",
    desc="this is a demo workflow",
    steps=[job_step],
    storages=[storage]
)

The preceding four methods use a dataset as the input. If you want to use an OBS path as the input, set data of JobInput to data=wf.data.OBSPlaceholder(name="obs_placeholder_name", object_type="directory") or data=wf.data.OBSPath(obs_path="fake_obs_path").

In addition, you can specify a dataset or OBS path when creating a workflow to reduce configuration operations and facilitate debugging in the development state. You are advised to use placeholders to create a workflow you want to publish to the running state or AI Gallery. In this case, you can configure parameters before workflow execution.

Creating a Job Phase Based on the Dataset Release Phase

Scenario: The output of the dataset release phase is used as the input of the job phase.

from modelarts import workflow as wf

# Define the dataset object.
dataset = wf.data.DatasetPlaceholder(name="input_dataset")

# Define the split ratio between the training set and validation set
train_ratio = wf.Placeholder(name="placeholder_name", placeholder_type=wf.PlaceholderType.STR, default="0.8")

release_version_step = wf.steps.ReleaseDatasetStep(
    name="release_dataset", # Name of the dataset release phase. The name contains a maximum of 64 characters, including only letters, digits, underscores (_), and hyphens (-). It must start with a letter and must be unique in a workflow.
    title="Dataset Version Release", # Title, which defaults to the value of name
    inputs=wf.steps.ReleaseDatasetInput(name="input_name", data=dataset), # ReleaseDatasetStep inputs. The dataset object is configured when the workflow is running. You can also use wf.data.Dataset(dataset_name="dataset_name") for the data field.
    outputs=wf.steps.ReleaseDatasetOutput(
        name="output_name",
        dataset_version_config=wf.data.DatasetVersionConfig(
            label_task_type=wf.data.LabelTaskTypeEnum.IMAGE_CLASSIFICATION,  # Labeling job type for dataset version release
            train_evaluate_sample_ratio=train_ratio # Split ratio between the training set and validation set
            )
    ) # ReleaseDatasetStep outputs
)

# Create an OutputStorage object to centrally manage training output directories.
storage = wf.data.OutputStorage(name="storage_name", title="title_info", description="description_info") # Only name is mandatory.

# Use JobStep to define a training phase. Use a dataset as the input, and use OBS to store the output.
job_step = wf.steps.JobStep(
    name="training_job", # Name of a training phase. The name contains a maximum of 64 characters, including only letters, digits, underscores (_), and hyphens (-). It must start with a letter and must be unique in a workflow.
    title="Image Classification Training", # Title, which defaults to the value of name
    algorithm=wf.AIGalleryAlgorithm(
        subscription_id="subscription_id", # Subscription ID of the subscribed algorithm
        item_version_id="item_version_id", # Version ID of the subscribed algorithm
        parameters=[
            wf.AlgorithmParameters(
                name="parameter_name", 
                value=wf.Placeholder(name="parameter_name", placeholder_type=wf.PlaceholderType.STR, default="fake_value",description="description_info")
            ) # Algorithm hyperparameters are represented using placeholders, which can be integer, bool, float, or string.
        ]
    ), # Algorithm used for training. An algorithm subscribed to in AI Gallery is used in this example. If the value of an algorithm hyperparameter does not need to be changed, you do not need to configure the hyperparameter in parameters. Hyperparameter values will be automatically filled.

    
    inputs=wf.steps.JobInput(name="data_url", data=release_version_step.outputs["output_name"].as_input()), # The output of the dataset release phase is used as the input of JobStep.
    outputs=wf.steps.JobOutput(name="train_url", obs_config=wf.data.OBSOutputConfig(obs_path=storage.join("directory_path"))), # JobStep outputs
    spec=wf.steps.JobSpec(
        resource=wf.steps.JobResource(
            flavor=wf.Placeholder(name="train_flavor", placeholder_type=wf.PlaceholderType.JSON, description="Training flavor")
            
        )
    ), # Training flavors
    depend_steps=release_version_step # Preceding dataset release phase
)
# release_version_step is an instance object of wf.steps.ReleaseDatasetStep and output_name is the value of the name field of wf.steps.ReleaseDatasetOutput.

workflow = wf.Workflow(
    name="job-step-demo",
    desc="this is a demo workflow",
    steps=[release_version_step, job_step],
    storages=[storage]
)

Job Phase With Visualization

Phase visualization lets you view the metrics generated by your workflow in real time. You can also display the visualization of a phase separately. To use phase visualization, add an output to the original job phase and configure it with a MetricsConfig object so that the metrics can be shown.

**Table 16** MetricsConfig
Parameter	Description	Mandatory	Data Type
metric_files	Metric files. Supported element types: str, Placeholder, and Storage.	Yes	list
realtime_visualization	Whether to display the output metrics in real time. The default value is False.	No	bool
visualization	Whether to display visualization phases separately. The default value is True.	No	bool

The output metrics file must contain standard JSON data with a maximum size of 1 MB. The data formats must match the supported ones.

Key-value pair data

[
    {
        "key": "loss",
        "title": "loss",
        "type": "float",
        "data": {
            "value": 1.2
        }
    },
    {
        "key": "accuracy",
        "title": "accuracy",
        "type": "float",
        "data": {
            "value": 1.6
        }
    }
]

Line chart data

[
    {
        "key": "metric",
        "title": "metric",
        "type": "line chart",
        "data": {
            "x_axis": [
                {
                    "title": "step/epoch",
                    "value": [
                        1,
                        2,
                        3
                    ]
                }
            ],
            "y_axis": [
                {
                    "title": "value",
                    "value": [
                        0.5,
                        0.4,
                        0.3
                    ]
                }
            ]
        }
    }
]
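
A line chart entry in this layout can be assembled from two parallel lists collected during training. The helper below is a minimal sketch: `line_chart_metric` and its parameters are hypothetical names, and the equal-length check is an assumption rather than a documented requirement.

```python
import json


def line_chart_metric(key, steps, values, x_title="step/epoch", y_title="value"):
    """Build one line chart entry in the layout shown in the sample above."""
    # Assumption: every x value needs a matching y value.
    if len(steps) != len(values):
        raise ValueError("steps and values must have the same length")
    return {
        "key": key,
        "title": key,
        "type": "line chart",
        "data": {
            "x_axis": [{"title": x_title, "value": list(steps)}],
            "y_axis": [{"title": y_title, "value": list(values)}],
        },
    }


# Reproduces the sample above as a one-element JSON array.
print(json.dumps([line_chart_metric("metric", [1, 2, 3], [0.5, 0.4, 0.3])], indent=4))
```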

Histogram data

[
    {
        "key": "metric",
        "title": "metric",
        "type": "histogram",
        "data": {
            "x_axis": [
                {
                    "title": "step/epoch",
                    "value": [
                        1,
                        2,
                        3
                    ]
                }
            ],
            "y_axis": [
                {
                    "title": "value",
                    "value": [
                        0.5,
                        0.4,
                        0.3
                    ]
                }
            ]
        }
    }
]

Confusion matrix

[
    {
        "key": "confusion_matrix",
        "title": "confusion_matrix",
        "type": "table",
        "data": {
            "cell_value": [
                [
                    1,
                    2
                ],
                [
                    2,
                    3
                ]
            ],
            "col_labels": {
                "title": "labels",
                "value": [
                    "daisy",
                    "dandelion"
                ]
            },
            "row_labels": {
                "title": "predictions",
                "value": [
                    "daisy",
                    "dandelion"
                ]
            }
        }
    }
]
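
The table entry above can be derived from per-sample results. The sketch below counts (prediction, label) pairs into the same layout; `confusion_matrix_metric` is a hypothetical helper, and placing predictions on rows and true labels on columns is inferred from the row_labels and col_labels titles in the sample.

```python
def confusion_matrix_metric(labels, predictions, classes):
    """Count (prediction, label) pairs into a table entry shaped like the sample.

    Rows follow predictions and columns follow true labels, matching the
    row_labels and col_labels titles used in the sample.
    """
    index = {name: i for i, name in enumerate(classes)}
    cells = [[0] * len(classes) for _ in classes]
    for label, prediction in zip(labels, predictions):
        cells[index[prediction]][index[label]] += 1
    return {
        "key": "confusion_matrix",
        "title": "confusion_matrix",
        "type": "table",
        "data": {
            "cell_value": cells,
            "col_labels": {"title": "labels", "value": list(classes)},
            "row_labels": {"title": "predictions", "value": list(classes)},
        },
    }
```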

One-dimensional table

[
    {
        "key": "Application Evaluation Results",
        "title": "Application Evaluation Results",
        "type": "one-dimensional-table",
        "data": {
            "cell_value": [
                [
                    10,
                    2,
                    0.5
                ]
            ],
            "labels": [
                "samples",
                "maxResTime",
                "p99"
            ]
        }
    }
]

Example:

from modelarts import workflow as wf

# Create a Storage object to centrally manage training output directories.
storage = wf.data.Storage(name="storage_name", title="title_info", description="description_info", with_execution_id=True, create_dir=True) # Only name is mandatory.

# Define an input dataset.
dataset = wf.data.DatasetPlaceholder(name="input_dataset")

# Use JobStep to define a training phase. Use a dataset as the input, and use OBS to store the output.
job_step = wf.steps.JobStep(
    name="training_job", # Name of a training phase. The name contains a maximum of 64 characters, including only letters, digits, underscores (_), and hyphens (-). It must start with a letter and must be unique in a workflow.
    title="Image Classification Training", # Title, which defaults to the value of name
    algorithm=wf.AIGalleryAlgorithm(
        subscription_id="subscription_id", # Subscription ID of the subscribed algorithm
        item_version_id="item_version_id", # Algorithm version ID. You can also enter the version number instead.
        parameters=[
            wf.AlgorithmParameters(
                name="parameter_name", 
                value=wf.Placeholder(name="parameter_name", placeholder_type=wf.PlaceholderType.STR, default="fake_value",description="description_info")
            ) # Algorithm hyperparameters are represented using placeholders, which can be integer, bool, float, or string.
        ]

    ), # Algorithm used for training. An algorithm subscribed to in AI Gallery is used in this example. If the value of an algorithm hyperparameter does not need to be changed, you do not need to configure the hyperparameter in parameters. Hyperparameter values will be automatically filled.

    
    inputs=wf.steps.JobInput(name="data_url", data=dataset), # JobStep inputs are configured when the workflow is running. You can also use wf.data.Dataset(dataset_name="fake_dataset_name", version_name="fake_version_name") for the data field.
    outputs=[
        wf.steps.JobOutput(name="train_url", obs_config=wf.data.OBSOutputConfig(obs_path=storage.join("directory_path"))), # JobStep outputs
        wf.steps.JobOutput(name="metrics_output", metrics_config=wf.data.MetricsConfig(metric_files=storage.join("directory_path/metrics.json", create_dir=False))) # Metrics are output to the configured path by the job script.
    ],
    spec=wf.steps.JobSpec(
        resource=wf.steps.JobResource(
            flavor=wf.Placeholder(name="train_flavor", placeholder_type=wf.PlaceholderType.JSON, description="Training flavor")
            
        )
    )# Training flavors
)

workflow = wf.Workflow(
    name="job-step-demo",
    desc="this is a demo workflow",
    steps=[job_step],
    storages=[storage]
)

Workflow does not automatically retrieve the metrics produced by training. You need to extract the metrics from the algorithm code, create the metrics.json file in the required data format, and upload the file to the OBS path specified in MetricsConfig. Workflow only reads, renders, and displays the data.
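
The file-creation part of this step could look like the sketch below, which builds key-value entries and writes them as JSON while enforcing the size limit. `key_value_metric` and `write_metrics` are hypothetical helpers, the limit is assumed to mean 1024 × 1024 bytes, and the file is written to a local temporary directory; copying it to the OBS path configured in MetricsConfig is left to the training script.

```python
import json
import os
import tempfile

# Assumption: the documented 1 MB limit is interpreted as 1024 * 1024 bytes.
MAX_METRICS_BYTES = 1024 * 1024


def key_value_metric(key, value):
    """Build one key-value entry in the documented layout."""
    return {"key": key, "title": key, "type": "float", "data": {"value": float(value)}}


def write_metrics(path, entries):
    """Serialize entries as JSON and write them, refusing payloads above the limit."""
    payload = json.dumps(entries, indent=4).encode("utf-8")
    if len(payload) > MAX_METRICS_BYTES:
        raise ValueError(
            f"metrics file would be {len(payload)} bytes, limit is {MAX_METRICS_BYTES}"
        )
    with open(path, "wb") as f:
        f.write(payload)
    return len(payload)


# Example: write two metrics to a local temporary directory.
out_path = os.path.join(tempfile.mkdtemp(), "metrics.json")
size = write_metrics(
    out_path, [key_value_metric("loss", 1.2), key_value_metric("accuracy", 0.9)]
)
print(out_path, size)
```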

Using the DataSelector Object as the Input, Which Supports OBS or Datasets

Use this method when the input type should be selectable when the workflow runs. The DataSelector object lets you select either a dataset object or an OBS object as the training input. Here is a code sample:

from modelarts import workflow as wf

# Create an OutputStorage object to centrally manage training output directories.
storage = wf.data.OutputStorage(name="storage_name", title="title_info", description="description_info") # Only name is mandatory.

# Define the DataSelector object.
data_selector = wf.data.DataSelector(name="input_data", data_type_list=["dataset", "obs"])

# Use JobStep to define a training phase. Use the DataSelector object as the input, and use OBS to store the output.
job_step = wf.steps.JobStep(
    name="training_job", # Name of a training phase. The name contains a maximum of 64 characters, including only letters, digits, underscores (_), and hyphens (-). It must start with a letter and must be unique in a workflow.
    title="Image Classification Training", # Title, which defaults to the value of name
    algorithm=wf.AIGalleryAlgorithm(
        subscription_id="subscription_id", # Subscription ID of the subscribed algorithm
        item_version_id="item_version_id", # Algorithm version ID. You can also enter the version number instead.
        parameters=[
            wf.AlgorithmParameters(
                name="parameter_name", 
                value=wf.Placeholder(name="parameter_name", placeholder_type=wf.PlaceholderType.STR, default="fake_value",description="description_info")
            ) # Algorithm hyperparameters are represented using placeholders, which can be integer, bool, float, or string.
        ]
    ), # Algorithm used for training. An algorithm subscribed to in AI Gallery is used in this example. If the value of an algorithm hyperparameter does not need to be changed, you do not need to configure the hyperparameter in parameters. Hyperparameter values will be automatically filled.
    
    inputs=wf.steps.JobInput(name="data_url", data=data_selector), # JobStep inputs are configured when the workflow is running. You can choose OBS or datasets as the input.
    outputs=wf.steps.JobOutput(name="train_url", obs_config=wf.data.OBSOutputConfig(obs_path=storage.join("directory_path"))), # JobStep outputs
    spec=wf.steps.JobSpec(
        resource=wf.steps.JobResource(
            flavor=wf.Placeholder(name="train_flavor", placeholder_type=wf.PlaceholderType.JSON, description="Training flavor")
           
        )
    )# Training flavors
)

workflow = wf.Workflow(
    name="job-step-demo",
    desc="this is a demo workflow",
    steps=[job_step],
    storages=[storage]
)

When using DataSelector as the input, ensure that the algorithm input supports both datasets and OBS.
