Updated on 2024-06-15 GMT+08:00

How Do I Configure the Input and Output Data for Training Models on ModelArts?

ModelArts allows you to create training jobs using a custom algorithm. Prepare the algorithm code and upload it to an OBS bucket. For details about how to create an algorithm, see Creating an Algorithm. For details about how to create a training job, see Creating a Training Job.

Parsing Input and Output Paths

When your training code reads data stored in OBS or writes output to a specified OBS path, configure the input and output data as follows:

  1. Parse the input and output paths in the training code. The following method is recommended:
    import argparse

    # Create a parser.
    parser = argparse.ArgumentParser(description="train mnist",
                                     formatter_class=argparse.ArgumentDefaultsHelpFormatter)
    # Add the output (model) path and input (data) path parameters.
    parser.add_argument('--train_url', type=str,
                        help='path where the trained model is saved')
    parser.add_argument('--data_url', type=str, help='path to the training data')
    # Parse the parameters. parse_known_args() does not fail on unrecognized
    # arguments; it returns them in `unknown` instead.
    args, unknown = parser.parse_known_args()

    After the parameters are parsed, use args.data_url and args.train_url in place of the data source path and the output path in your code, respectively.

  2. When creating an algorithm using a preset image, define the input and output parameters so that they match the code parameters defined in step 1.
    • Training data is required for algorithm development. You are advised to name the input parameter data_url to indicate the input data source. You can also use a custom parameter name, as long as it matches the parameter defined in the algorithm code in step 1.
    • After training is complete, the trained model and other output must be stored in an OBS path. By default, Output specifies the model output, and the corresponding code parameter is train_url. You can also use a custom output path parameter, as long as it matches the algorithm code in step 1.
  3. When creating a training job, configure the input and output paths.

    Select an OBS path or dataset path as the training input, and an OBS path for the output.
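The three steps above can be sketched end to end as a single training script. This is a minimal sketch, not ModelArts sample code: it assumes the values passed in --data_url and --train_url are locally accessible directory paths, and the train() function and the model.txt file it writes are hypothetical placeholders for real training logic. If your job receives obs:// URLs instead, standard file APIs such as open() cannot read them directly, and an OBS client would be needed.

```python
import argparse
import os
import sys


def parse_args(argv=None):
    """Parse the input and output path parameters, as in step 1."""
    parser = argparse.ArgumentParser(
        description="train mnist",
        formatter_class=argparse.ArgumentDefaultsHelpFormatter,
    )
    parser.add_argument("--data_url", type=str, help="path to the training data")
    parser.add_argument("--train_url", type=str,
                        help="path where the trained model is saved")
    # parse_known_args() tolerates any extra arguments passed to the script.
    args, _unknown = parser.parse_known_args(argv)
    return args


def train(data_dir, output_dir):
    """Hypothetical training step: counts input files, writes a placeholder model.

    Assumes data_dir and output_dir are local filesystem paths.
    """
    # Collect every file under the input directory, including subdirectories.
    files = sorted(
        os.path.join(root, name)
        for root, _dirs, names in os.walk(data_dir)
        for name in names
    )
    # Create the output directory if needed, then write the "model" into it.
    os.makedirs(output_dir, exist_ok=True)
    model_path = os.path.join(output_dir, "model.txt")
    with open(model_path, "w", encoding="utf-8") as f:
        f.write(f"trained on {len(files)} files\n")
    return model_path


def main(argv=None):
    args = parse_args(argv)
    # Use the parsed paths instead of hard-coded data and output locations.
    if not args.data_url or not args.train_url:
        sys.exit("both --data_url and --train_url are required")
    return train(args.data_url, args.train_url)


if __name__ == "__main__":
    main()
```

For a local dry run, you could invoke the script as python train.py --data_url ./data --train_url ./output (the file name and directories are hypothetical). If you chose custom parameter names in step 2, rename the add_argument() flags to match.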