Updated on 2024-10-29 GMT+08:00

Publishing a Workflow to ModelArts

You can publish a workflow to ModelArts in two ways: publishing it to the running state, or publishing and executing it. Publishing to the running state requires you to configure the input and output parameters on the workflow page of the console before the workflow runs. Publishing and executing lets you modify the code and run the workflow directly through the SDK, so you do not need to configure or run it on the console.

Publishing to the Running State

After creating a workflow, call the release() method to publish it to the running state. You then configure and execute the workflow on the workflow page of the management console.

Run the following code:

workflow.release()

After the call completes, check the log. If it indicates that the workflow has been published, go to the ModelArts workflow page to view the workflow. On the workflow details page, click Configure to configure parameters.

The release_and_run() method is based on the release() method and allows you to publish and run workflows in the development state, without the need to configure and execute workflows on the console.

Note the following when using this method:

  • For all configuration objects related to placeholders in the workflow, you need to either set default values or use fixed data objects directly.
  • The method executes differently depending on the workflow object's name. It creates and runs a new workflow if the name does not exist. It updates and runs the existing workflow if the name already exists, using the new workflow structure for the new execution.
    workflow.release_and_run()
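
The name-based behavior described above can be illustrated with a toy in-memory model. This is only a sketch of the create-or-update rule as stated in the notes; ToyWorkflowService and its attributes are hypothetical names invented for illustration and are not part of the ModelArts SDK.

```python
# Toy in-memory model of the name-based rule; NOT the ModelArts SDK.
class ToyWorkflowService:
    def __init__(self):
        self.workflows = {}   # workflow name -> current list of step names
        self.executions = []  # one (name, steps) entry per run

    def release_and_run(self, name, steps):
        # An unknown name creates a workflow; a known name updates it.
        action = "updated" if name in self.workflows else "created"
        self.workflows[name] = list(steps)
        # Either way, a new execution starts with the latest structure.
        self.executions.append((name, list(steps)))
        return action


service = ToyWorkflowService()
first = service.release_and_run("image-classification", ["training_job"])
second = service.release_and_run(
    "image-classification", ["training_job", "model_registration"]
)
print(first, second)
```

In this sketch the second call reuses the existing name, so only one workflow entry exists afterwards and the newest execution uses the updated step list.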

Publishing and Executing the Workflow

With this method, you can publish and run a workflow directly through the SDK, without using the console. Modify the workflow code as follows:

from modelarts import workflow as wf

# Define a unified output storage path.
output_storage = wf.data.OutputStorage(name="output_storage", description="Unified configuration of output storage", default="**")

# Dataset object
dataset = wf.data.DatasetPlaceholder(name="input_data", default=wf.data.Dataset(dataset_name="**", version_name="**"))

# Create a training job.
job_step = wf.steps.JobStep(
    name="training_job",
    title="Image Classification Training",
    algorithm=wf.AIGalleryAlgorithm(
        subscription_id="**", # Subscription ID of the image classification algorithm. Obtain the subscription ID on the algorithm management page. This parameter is optional.
        item_version_id="10.0.0", # Version number of the subscribed algorithm. This parameter is optional.
        parameters=[
            wf.AlgorithmParameters(name="task_type", value="image_classification_v2"),
            wf.AlgorithmParameters(name="model_name", value="resnet_v1_50"),
            wf.AlgorithmParameters(name="do_train", value="True"),
            wf.AlgorithmParameters(name="do_eval_along_train", value="True"),
            wf.AlgorithmParameters(name="variable_update", value="horovod"),
            wf.AlgorithmParameters(name="learning_rate_strategy", value=wf.Placeholder(name="learning_rate_strategy", placeholder_type=wf.PlaceholderType.STR, default="0.002", description="Learning rate for training. 10:0.001,20:0.0001 indicates that the learning rate of the first 10 epochs is 0.001 and that of the next 10 epochs is 0.0001. If the epoch is not specified, the learning rate will be adjusted based on the validation precision. The training will be stopped if the precision is not significantly improved anymore.")),
            wf.AlgorithmParameters(name="batch_size", value=wf.Placeholder(name="batch_size", placeholder_type=wf.PlaceholderType.INT, default=64, description="Number of images trained in each step (on a single card)")),
            wf.AlgorithmParameters(name="eval_batch_size", value=wf.Placeholder(name="eval_batch_size", placeholder_type=wf.PlaceholderType.INT, default=64, description="Number of images validated in each step (on a single card)")),
            wf.AlgorithmParameters(name="evaluate_every_n_epochs", value=wf.Placeholder(name="evaluate_every_n_epochs", placeholder_type=wf.PlaceholderType.FLOAT, default=1.0, description="Validation is performed every n epochs.")),
            wf.AlgorithmParameters(name="save_model_secs", value=wf.Placeholder(name="save_model_secs", placeholder_type=wf.PlaceholderType.INT, default=60, description="Model saving frequency (unit: s)")),
            wf.AlgorithmParameters(name="save_summary_steps", value=wf.Placeholder(name="save_summary_steps", placeholder_type=wf.PlaceholderType.INT, default=10, description="Summary saving frequency (unit: step)")),
            wf.AlgorithmParameters(name="log_every_n_steps", value=wf.Placeholder(name="log_every_n_steps", placeholder_type=wf.PlaceholderType.INT, default=10, description="Log printing frequency (unit: step)")),
            wf.AlgorithmParameters(name="do_data_cleaning", value=wf.Placeholder(name="do_data_cleaning", placeholder_type=wf.PlaceholderType.STR, default="True", description="Whether to clean data. If the data format is abnormal, the training fails. You are advised to enable this function to ensure training stability. If the data volume is too large, data cleaning may take a long time. You can clean data offline. (Formats including BMP, JPEG, PNG, and RGB three-channel are supported.) You are advised to use JPEG.")),
            wf.AlgorithmParameters(name="use_fp16", value=wf.Placeholder(name="use_fp16", placeholder_type=wf.PlaceholderType.STR, default="True", description="Whether to use mixed precision. Mixed precision accelerates training but causes precision loss. Enable this parameter unless precision is strictly required.")),
            wf.AlgorithmParameters(name="xla_compile", value=wf.Placeholder(name="xla_compile", placeholder_type=wf.PlaceholderType.STR, default="True", description="Whether to use XLA for accelerated training. This function is enabled by default.")),
            wf.AlgorithmParameters(name="data_format", value=wf.Placeholder(name="data_format", placeholder_type=wf.PlaceholderType.ENUM, default="NCHW", enum_list=["NCHW", "NHWC"], description="Input data format. NHWC indicates channel last, and NCHW indicates channel first. This parameter defaults to NCHW (faster).")),
            wf.AlgorithmParameters(name="best_model", value=wf.Placeholder(name="best_model", placeholder_type=wf.PlaceholderType.STR, default="True", description="Whether to save and use the model with the highest precision instead of the latest model during training. The default value is True, indicating that the optimal model is saved. Within a certain error range, the latest high precision model is saved as the optimal model.")),
            wf.AlgorithmParameters(name="jpeg_preprocess", value=wf.Placeholder(name="jpeg_preprocess", placeholder_type=wf.PlaceholderType.STR, default="True", description="Whether to use the JPEG preprocessing acceleration operator (only JPEG data is supported) to accelerate data reading and improve performance. This function is enabled by default. If the data format is not JPEG, enable data cleaning to use the function."))
        ]
    ),
    inputs=[wf.steps.JobInput(name="data_url", data=dataset)],
    outputs=[wf.steps.JobOutput(name="train_url", obs_config=wf.data.OBSOutputConfig(obs_path=output_storage.join("/train_output/")))],
    spec=wf.steps.JobSpec(
        resource=wf.steps.JobResource(
            flavor=wf.Placeholder(
                name="training_flavor",
                placeholder_type=wf.PlaceholderType.JSON,
                description="Training flavor",
                default={"flavor_id": "**"}
            )
        )
    )
)

# Create a workflow.
workflow = wf.Workflow(
    name="image-classification-ResNeSt",
    desc="this is an image classification workflow",
    steps=[job_step],
    storages=[output_storage]
)
  1. Replace every ** in the preceding code with an actual value. The configuration mainly involves three items:
    • Unified storage: default value of output_storage. Enter an existing OBS path in the format of /OBS bucket name/Folder path/.
    • Dataset object: Enter the dataset name and version number.
    • Training flavor: Configure GPU resources, because the algorithm in this example runs only on GPUs. You can use the free flavor modelarts.p3.large.public.free.
  2. After the configuration, run this code:
    workflow.release_and_run()
  3. After the execution, go to the ModelArts console. In the navigation pane, choose Workflow to view the workflow status.
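
Before running release_and_run(), it can help to confirm that no ** placeholder was left behind in step 1. The following is a minimal sketch of such a check in plain Python. The helper find_config_problems, the settings dictionary, and the bucket, folder, and version names are all hypothetical, and the regular expression is only an assumed reading of the /OBS bucket name/Folder path/ format; none of this is part of the ModelArts SDK.

```python
import re

# Hypothetical helper values; nothing here comes from the ModelArts SDK.
PLACEHOLDER = "**"
# Assumed path shape: leading slash, bucket name, optional folders, trailing slash.
OBS_PATH_PATTERN = re.compile(r"^/[^/]+/(?:[^/]+/)*$")


def find_config_problems(config):
    """Return a list of human-readable problems found in a dict of settings."""
    problems = []
    for key, value in config.items():
        if value == PLACEHOLDER:
            problems.append(f"{key} still contains the '**' placeholder")
    obs_path = config.get("output_storage", "")
    # Only check the path format once the placeholder has been replaced.
    if obs_path != PLACEHOLDER and not OBS_PATH_PATTERN.match(obs_path):
        problems.append("output_storage must look like /bucket-name/folder-path/")
    return problems


settings = {
    "output_storage": "/my-bucket/workflow-output/",  # hypothetical bucket and folder
    "dataset_name": "**",                             # left unfilled on purpose
    "version_name": "V001",                           # hypothetical version name
    "flavor_id": "modelarts.p3.large.public.free",
}

for problem in find_config_problems(settings):
    print(problem)
```

The same values would then be passed as the default arguments of output_storage, the dataset placeholder, and the training flavor in the workflow code once the check reports no problems.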