Updated on 2024-06-11 GMT+08:00

How Do I Configure the Input and Output Data for Training Models on ModelArts?

ModelArts allows you to upload a custom algorithm for creating training jobs. Create the algorithm and upload it to an OBS bucket. For details about how to create an algorithm, see Creating an Algorithm. For details about how to create a training job, see Creating a Training Job.

Parsing Input and Output Paths

If your training code on ModelArts reads data stored in OBS or writes its output to a specified OBS path, configure the input and output data as follows:

  1. Parse the input and output paths in the training code. The following method is recommended:
    import argparse

    # Create an argument parser.
    parser = argparse.ArgumentParser(description="train mnist",
                                     formatter_class=argparse.ArgumentDefaultsHelpFormatter)
    # Define the output path (train_url) and input path (data_url) parameters.
    parser.add_argument('--train_url', type=str,
                        help='the path where the trained model is saved')
    parser.add_argument('--data_url', type=str, help='the path to the training data')
    # Parse the known parameters. Unrecognized arguments are returned in
    # `unknown` instead of causing an error.
    args, unknown = parser.parse_known_args()

    After the parameters are parsed, use args.data_url and args.train_url in place of the hard-coded paths to the data source and the data output, respectively.

  2. When using a preset image to create a custom algorithm, configure the input and output parameters on the Create Algorithm page so that they match the parameter names parsed in the code.
    • Training data is a must for algorithm development. It is a good practice to name the input parameter data_url, which identifies the data input source. You can also use a different name, as long as it matches the parameter parsed by the algorithm code in the previous step.
      Figure 1 Parsing the input path parameter data_url
    • After model training is complete, the trained model and the output information must be stored in an OBS path. By default, the output is labeled Output Data and its code path parameter is train_url, which you can customize.
      Figure 2 Parsing the output path parameter train_url

  3. When creating a training job, configure the input and output paths.

    Select an OBS path or a dataset path as the training input, and an OBS path as the training output.

    Figure 3 Setting training input and output
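
The parsing step can be tried locally before an algorithm is uploaded. The following is a minimal sketch, not the platform's own implementation. It assumes that the values of data_url and train_url reach the script as directories it can read from and write to directly; whether that holds in a real job depends on how the job stages OBS data, which this page does not describe. The names parse_paths, run, and summary.txt, and the extra --epochs argument, are hypothetical and used only for illustration.

```python
import argparse
import os
import tempfile


def parse_paths(argv=None):
    """Parse --data_url and --train_url and collect any other arguments."""
    parser = argparse.ArgumentParser(description="train mnist")
    parser.add_argument("--data_url", type=str, help="the path to the training data")
    parser.add_argument("--train_url", type=str, help="the path where output is saved")
    # parse_known_args returns unrecognized arguments instead of exiting with an error.
    args, unknown = parser.parse_known_args(argv)
    return args, unknown


def run(args):
    """Hypothetical stand-in for training: count input files, write a summary."""
    files = sorted(os.listdir(args.data_url))
    os.makedirs(args.train_url, exist_ok=True)
    out_path = os.path.join(args.train_url, "summary.txt")
    with open(out_path, "w") as f:
        f.write(f"{len(files)} input files\n")
    return out_path


if __name__ == "__main__":
    # Local dry run: temporary directories stand in for the configured paths.
    with tempfile.TemporaryDirectory() as data_dir, tempfile.TemporaryDirectory() as out_dir:
        open(os.path.join(data_dir, "sample.csv"), "w").close()
        args, unknown = parse_paths(
            ["--data_url", data_dir, "--train_url", out_dir, "--epochs", "5"]
        )
        print("ignored arguments:", unknown)
        print("summary written to:", run(args))
```

Because the script reads its paths only from args.data_url and args.train_url, the same code should run unchanged when the job supplies different input and output locations.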