Updated on 2024-05-07 GMT+08:00

Developing a Custom Script

Before you use a preset image to create an algorithm, develop the algorithm code. This section describes how to modify local code for model training on ModelArts.

When creating an algorithm, set the code directory, boot file, input path, and output path. These settings enable the interaction between your code and ModelArts.

  • Code directory

    Specify the code directory in the OBS bucket and upload the files required for training, such as the training code, dependency installation packages, or pre-generated models, to this directory. After you create the training job, ModelArts downloads the code directory and its subdirectories to the container.

    Take OBS path obs://obs-bucket/training-test/demo-code as an example. The content in the OBS path will be automatically downloaded to ${MA_JOB_DIR}/demo-code in the training container, and demo-code (customizable) is the last-level directory of the OBS path.

    Do not store training data in the code directory. When the training job starts, the data stored in the code directory will be downloaded to the backend. A large amount of training data may lead to a download failure. It is recommended that the size of the code directory does not exceed 50 MB.

  • Boot file

    The boot file in the code directory is used to start the training. Only Python boot files are supported.

  • Input path

    The training data must be uploaded to an OBS bucket or stored in a dataset. In the training code, the input path must be parsed. ModelArts automatically downloads the data in the input path to the local container directory for training. Ensure that you have the read permission on the OBS bucket. After the training job starts, ModelArts mounts a disk to the /cache directory, which you can use to store temporary files (a staging sketch follows this list). For details about the size of the /cache directory, see What Are Sizes of the /cache Directories for Different Resource Specifications in the Training Environment?

  • Output path

    You are advised to set an empty directory as the training output path. In the training code, the output path must be parsed. ModelArts automatically uploads the training output to the output path. Ensure that you have the read and write permissions on the OBS bucket.
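If you want to use the /cache disk mentioned above, a common pattern is to copy the input data from OBS to /cache before training and to copy the results back afterwards. The following is a minimal sketch, assuming the MoXing file SDK (moxing) is available in the preset image; the local paths /cache/data and /cache/output are arbitrary names chosen for illustration.

import argparse
import os
import moxing as mox  # MoXing file SDK, assumed to be preinstalled in the preset image

parser = argparse.ArgumentParser()
parser.add_argument('--data_url', type=str, help='OBS path of the training data')
parser.add_argument('--train_url', type=str, help='OBS path of the training output')
args = parser.parse_args()

# Stage the input data on the local /cache disk for faster reads during training.
local_data = '/cache/data'
mox.file.copy_parallel(args.data_url, local_data)

# ... train here, writing checkpoints and logs under /cache/output ...
local_output = '/cache/output'
os.makedirs(local_output, exist_ok=True)

# Upload the results to the OBS output path after training finishes.
mox.file.copy_parallel(local_output, args.train_url)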

The following sections describe how to develop training code for ModelArts.

(Optional) Introducing Dependencies

  1. If your model references other dependencies, place the required files or installation packages in the Code Directory you set during algorithm creation; a minimal installation sketch follows.
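    For example, a wheel file uploaded with the code directory can be installed at the top of the boot file before it is imported. This is only a sketch; my_dep-1.0-py3-none-any.whl is a hypothetical file name used for illustration.

    import os
    import subprocess
    import sys

    # Install a dependency wheel shipped in the code directory (hypothetical file name).
    wheel = os.path.join(os.path.dirname(os.path.abspath(__file__)), 'my_dep-1.0-py3-none-any.whl')
    subprocess.check_call([sys.executable, '-m', 'pip', 'install', wheel])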

Parsing Input and Output Paths

When your training code reads data stored in OBS or outputs data to a specified OBS path, perform the following operations to configure the input and output data:

  1. Parse the input and output paths in the training code. The following method is recommended:
    import argparse
    # Create a parsing task.
    parser = argparse.ArgumentParser(description='train mnist')
    
    # Add parameters.
    parser.add_argument('--data_url', type=str, default="./Data/mnist.npz", help='path where the dataset is saved')
    parser.add_argument('--train_url', type=str, default="./Model", help='path where the model is saved')
    
    # Parse the parameters.
    args = parser.parse_args()
    

    After the parameters are parsed, use data_url and train_url to replace the paths to the data source and the data output, respectively.
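    For example, a boot file that continues the snippet above could consume the parsed paths as follows; the file name run.log is arbitrary and only used for illustration.

    import os

    # Read the training data from the input path ...
    data_files = os.listdir(args.data_url) if os.path.isdir(args.data_url) else [args.data_url]
    print('training data:', data_files)

    # ... and write all outputs under the output path, which ModelArts uploads to OBS.
    os.makedirs(args.train_url, exist_ok=True)
    with open(os.path.join(args.train_url, 'run.log'), 'w') as f:
        f.write('training finished\n')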

  2. When creating a training job, set the input and output paths.

    Select the OBS path or dataset path as the training input, and the OBS path as the output.

    Figure 1 Setting training input and output

Editing Training Code and Saving the Model

Training code and the code for saving the model are closely related to the AI engine you use. The following uses the TensorFlow framework as an example. Before running this example, download the mnist.npz file and upload it to the OBS bucket. The training input is the OBS path where the mnist.npz file is stored.

import os
import argparse
import tensorflow as tf

# Parse the input and output paths passed to the boot file.
parser = argparse.ArgumentParser(description='train mnist')
parser.add_argument('--data_url', type=str, default="./Data/mnist.npz", help='path where the dataset is saved')
parser.add_argument('--train_url', type=str, default="./Model", help='path where the model is saved')
args = parser.parse_args()

# Load the MNIST dataset from the input path and scale pixel values to [0, 1].
mnist = tf.keras.datasets.mnist

(x_train, y_train), (x_test, y_test) = mnist.load_data(args.data_url)
x_train, x_test = x_train / 255.0, x_test / 255.0

# Define a simple fully connected classifier.
model = tf.keras.models.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28)),
    tf.keras.layers.Dense(128, activation='relu'),
    tf.keras.layers.Dropout(0.2),
    tf.keras.layers.Dense(10)
])

# Compile and train the model.
loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)
model.compile(optimizer='adam',
              loss=loss_fn,
              metrics=['accuracy'])
model.fit(x_train, y_train, epochs=5)

# Save the trained model under the output path; ModelArts uploads it to OBS.
model.save(os.path.join(args.train_url, 'model'))
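If you also want to check how the model performs on the held-out test set, you could append something like the following to the script above. This is an optional addition, not part of the original example.

# Optional: evaluate on the test set loaded earlier in the script.
test_loss, test_acc = model.evaluate(x_test, y_test, verbose=2)
print('test accuracy:', test_acc)

After the job completes, ModelArts uploads everything written under args.train_url (here, the saved model) to the OBS output path configured for the job.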