
Example: Replacing the Original ResNet-50 with a Better Network Architecture

This example uses an image classification task on the MNIST dataset, with ResNet-50 as the baseline network.

Preparing Data

ModelArts provides a sample MNIST dataset, Mnist-Data-Set, in a public OBS bucket; this example uses it. Perform the following operations to upload the dataset to your OBS directory, for example, test-modelarts/dataset-mnist.

  1. Download the Mnist-Data-Set dataset to the local PC.
  2. Decompress the Mnist-Data-Set.zip file to the Mnist-Data-Set directory on the local PC.
  3. Upload all files in the Mnist-Data-Set folder to the test-modelarts/dataset-mnist directory on OBS in batches. For details about how to upload files, see Uploading a File.

    The Mnist-Data-Set dataset consists of the following four .gz compressed packages. Ensure that all four packages are uploaded to the OBS directory.

    • t10k-images-idx3-ubyte.gz: validation set, which contains 10,000 samples
    • t10k-labels-idx1-ubyte.gz: labels of the validation set, which contains the labels of the 10,000 samples
    • train-images-idx3-ubyte.gz: training set, which contains 60,000 samples
    • train-labels-idx1-ubyte.gz: labels of the training set, which contains the labels of the 60,000 samples
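
After uploading, you can optionally verify the files. The MNIST packages use the IDX format: a big-endian header with a 4-byte magic number (whose last byte is the number of dimensions) followed by one 4-byte size per dimension. A minimal sketch (plain Python standard library; the file names are those listed above):

```python
import gzip
import struct

def read_idx_header(path):
    """Read the magic number and dimension sizes from a gzipped IDX file.

    The IDX header is big-endian: a 4-byte magic number whose last byte
    gives the number of dimensions, followed by one 4-byte size per
    dimension.
    """
    with gzip.open(path, "rb") as f:
        magic = struct.unpack(">I", f.read(4))[0]
        ndim = magic & 0xFF
        dims = struct.unpack(">" + "I" * ndim, f.read(4 * ndim))
    return magic, dims

# For example, train-images-idx3-ubyte.gz should report dims (60000, 28, 28),
# and t10k-labels-idx1-ubyte.gz should report dims (10000,).
```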

Sample Code

Assume that you have TensorFlow code for an image classification task on the MNIST handwritten digit dataset that uses ResNet50. Only five lines of code need to be modified to turn it into an auto search job. The following shows the modified code; comments mark the changes.

import argparse
import time
import os
import logging

import tensorflow as tf
from tensorflow.examples.tutorials.mnist import input_data

import autosearch    # Change 1: Import the AutoSearch package.
from autosearch.client.nas.backbone.resnet import ResNet50    # Change 2: Import the preset ResNet50 module to decode the architecture code into the TensorFlow architecture.

parser = argparse.ArgumentParser()
parser.add_argument(
    "--max_steps", type=int, default=100, help="Number of steps to run trainer."
)
parser.add_argument("--data_url", type=str, default="MNIST_data")

parser.add_argument(
    "--learning_rate",
    type=float,
    default=0.01,  # st2
    help="Initial learning rate.",
)
parser.add_argument("--batch_size", type=int, default=1, help="batch size")
FLAGS, unparsed = parser.parse_known_args()

logger = logging.getLogger(__name__)


def train():
    if is_cloud():
        FLAGS.data_url = cloud_init(FLAGS.data_url)
    mnist = input_data.read_data_sets(FLAGS.data_url, one_hot=True)
    with tf.Graph().as_default():
        sess = tf.InteractiveSession()
        with tf.name_scope("input"):
            x = tf.placeholder(tf.float32, [None, 784], name="x-input")
            y_ = tf.placeholder(tf.int64, [None, 10], name="y-input")
        image_shaped_input = tf.reshape(x, [-1, 28, 28, 1])
        y = ResNet50(image_shaped_input, include_top=True, mode="train")  # Change 3: Replace ResNet50 in the original code with the imported ResNet50 decoding module.
        with tf.name_scope("cross_entropy"):
            y = tf.reduce_mean(y, [1, 2])
            y = tf.layers.dense(y, 10)
            with tf.name_scope("total"):
                cross_entropy = tf.losses.softmax_cross_entropy(y_, y)

        with tf.name_scope("train"):
            train_step = tf.train.AdamOptimizer(FLAGS.learning_rate).minimize(  # st2
                cross_entropy
            )

        with tf.name_scope("accuracy"):
            with tf.name_scope("correct_prediction"):
                correct_prediction = tf.equal(tf.argmax(y, 1), tf.argmax(y_, 1))
            with tf.name_scope("accuracy"):
                accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
        tf.global_variables_initializer().run()

        def feed_dict(train):
            if train:
                xs, ys = mnist.train.next_batch(100)
            else:
                xs, ys = mnist.test.images, mnist.test.labels
            return {x: xs, y_: ys}

        max_acc = 0
        latencies = []
        for i in range(FLAGS.max_steps):
            if i % 10 == 0:  # Record summaries and test-set accuracy
                loss, acc = sess.run(
                    [cross_entropy, accuracy], feed_dict=feed_dict(False)
                )
                print("loss at step %s: %s" % (i, loss))
                print("acc step %s: %s" % (i, acc))
                if acc > max_acc:
                    max_acc = acc
                autosearch.reporter(loss=loss, mean_accuracy=max_acc)    # Change 4: Report the precision to the AutoSearch framework.
            else:
                start = time.time()
                loss, _ = sess.run(
                    [cross_entropy, train_step], feed_dict=feed_dict(True)
                )
                end = time.time()
                if i % 10 != 1:
                    latencies.append(end - start)
        latency = sum(latencies) / len(latencies)
        autosearch.reporter(mean_accuracy=max_acc, latency=latency)    # Change 5: Report the final precision and latency to the AutoSearch framework.
        sess.close()


def is_cloud():
    return os.path.exists("/home/work/user-job-dir")


def cloud_init(data_url):
    local_data_dir = "/cache/mnist"

    import moxing as mox

    logger.info(
        "Copying from data_url({}) to local path({})".format(data_url, local_data_dir)
    )
    mox.file.copy_parallel(data_url, local_data_dir)

    return local_data_dir

Compiling the Configuration File

general:
  gpu_per_instance: 1

search_space:
  - type: discrete
    params:
      - name: resnet50
        values: ["1-11111111-2111121111-211111", "1-1112-1111111111121-11111112111", "1-11111121-12-11111211", "11-111112-112-11111211", "1-1-111111112-11212", "1-1-1-2112112", "1-111211-1111112-21111111", "1-1111111111-21112112-11111","1-111111111112-121111111121-11","11-211-121-11111121", "111-2111-211111-211"]

search_algorithm:
  type: grid_search
  reward_attr: mean_accuracy

scheduler:
  type: FIFOScheduler
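
With grid_search over a discrete search space, the framework trains one trial per listed value, so the configuration above runs 11 trials. The semantics of each architecture code are defined by the preset ResNet50 decoder; as an illustration only, a minimal sketch (plain Python; the format check is an assumption inferred from the listed values, which use hyphen-separated groups of the digits 1 and 2):

```python
import re

# First three of the 11 candidate architecture codes listed in the YAML above.
CANDIDATES = [
    "1-11111111-2111121111-211111",
    "1-1112-1111111111121-11111112111",
    "1-11111121-12-11111211",
]

def looks_like_arch_code(code):
    """Check that a candidate matches the pattern seen in the search space:
    non-empty groups of the digits 1 and 2, separated by hyphens."""
    return re.fullmatch(r"[12]+(-[12]+)*", code) is not None

# Grid search evaluates every listed candidate exactly once, so the number
# of trials equals the number of values in the search space.
trials = [c for c in CANDIDATES if looks_like_arch_code(c)]
```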

Starting a Search Job

Create an auto search job following the instructions in Creating an Auto Search Job. Set the boot file to the sample code file in Sample Code and set config_path to the OBS path of the sample YAML file, for example, obs://bucket_name/config.yaml. After the configuration is complete, submit the job to start the search.

  • The sample code must be saved as a .py file, which serves as the boot script of the search job.
  • The YAML configuration file name must end with .yaml.
  • The boot script and YAML configuration file can be named as required by your service.
  • Both files must be uploaded to OBS in advance, and the OBS bucket must be in the same region as ModelArts.
Figure 1 Setting an auto search job

Viewing Search Results

After the auto search job finishes, click the job name to go to the job details page. Then, click the Search Results tab to view the search results.