YAML Configuration File Description
ModelArts stores all parameters required to run the AutoSearch framework in a YAML file, which is mandatory for running the framework. A properly written YAML file is the prerequisite for a successful search.
YAML File Configuration Example
The following example is used only to help you understand the YAML file. For more examples, see Example: Replacing the Original ResNet-50 with a Better Network Architecture, Example: Searching for Hyperparameters Using Classic Hyperparameter Algorithms, Example: Searching for Network Architectures Using the MBNAS Algorithm, Example: Implementing Auto Data Augmentation Using a Preset Data Augmentation Policy, and Example: Using Multisearch.
```yaml
general:
  gpu_per_instance: 1
search_space:
  - type: discrete
    params:
      - name: resnet50
        values: ["1-11111111-2111121111-211111", "1-1112-1111111111121-11111112111", "1-11111121-12-11111211", "11-111112-112-11111211", "1-1-111111112-11212", "1-1-1-2112112", "1-111211-1111112-21111111", "1-1111111111-21112112-11111", "1-111111111112-121111111121-11", "11-211-121-11111121", "111-2111-211111-211"]
search_algorithm:
  type: grid_search
  reward_attr: mean_accuracy
scheduler:
  type: FIFOScheduler
```
Overview of YAML Configuration Files
As shown in the preceding example, the YAML configuration consists of four parts:
- Common configuration
- Search space configuration
- Search algorithm configuration
- (Optional) Scheduler configuration
The common configuration is used to configure the resource information required for a single training. The scheduler configuration is optional. The search space and search algorithm configurations are the core of the YAML configuration.
For NAS, hyperparameter, and auto data augmentation scenarios described in Introduction to Auto Search Jobs, the difference lies in the search space and search algorithm. You can define a search space for hyperparameter search and select a suitable algorithm to perform a hyperparameter search. Similarly, you can define a search space for network architecture search and select a suitable algorithm to perform a NAS search. If multiple configuration files are submitted in a task, the system performs multisearch based on the YAML files in sequence.
Common Configuration
Common configuration items are related to resource consumption during the running of the distributed framework. Table 1 lists the common configuration items.
| Parameter | Description |
|---|---|
| cpu_per_instance | Number of CPUs occupied by a worker |
| gpu_per_instance | Number of GPUs occupied by a worker |
| npu_per_instance | Number of NPUs occupied by a worker |
| instance_per_trial | Number of workers in a trial. The default value is 1. Multiple nodes are not supported, so this parameter can only be set to 1. |
Example configuration:
If the original training script consumes a single node with two CPUs/GPUs, configure the resources as follows:
```yaml
general:
  gpu_per_instance: 2
```
With this configuration, if the auto search job runs on a single node with eight CPUs/GPUs, at most four trials (8/2 = 4) can run concurrently; any additional trials wait in a queue until resources are freed.
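The concurrency arithmetic above can be sketched as follows (a toy illustration, not part of the AutoSearch framework):

```python
# Toy illustration (not part of AutoSearch): how many trials can run at once
# on one node, given the per-trial GPU requirement from the YAML above.
node_gpus = 8          # GPUs available on the node
gpu_per_instance = 2   # GPUs each trial occupies
max_concurrent_trials = node_gpus // gpu_per_instance
print(max_concurrent_trials)  # 4
```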
Search Space Configuration
A search space usually contains multiple variables, each with its own type (discrete or continuous) and value range. The YAML configuration abstracts these as follows:
```yaml
search_space:
  - params:
      - type: discrete_param
        name: filter_size
        values: [3, 5, 7]
      - type: discrete_param
        name: activation
        values: [relu, sigmoid]
      - type: continuous_param
        name: blocks
        start: 2
        stop: 4
        num: 3
      - type: continuous_param
        name: learning_rate
        start: -1
        stop: -5
        num: 5
        base: 10
```
ModelArts provides two basic variable types.
- Discrete (discrete_param): A discrete variable only needs a name and a list of values, for example, filter_size and activation in the preceding configuration.
- Continuous (continuous_param): A continuous variable must specify a value range, from start to stop.
In addition, some algorithms support only discrete variables, so a continuous variable must be automatically split into discrete points, and you need to tell the system how to split it. ModelArts provides two methods, corresponding to NumPy's linspace and logspace. If you are familiar with these two NumPy APIs, the configuration is straightforward.
- If base is not specified, the variable is split linearly, like np.linspace. For example, blocks in the example takes three evenly spaced values from 2 to 4, that is, 2, 3, and 4.
- If base is specified, the values are spaced logarithmically, like np.logspace: num values are taken from base ** start to base ** stop. In the example, learning_rate takes five values from 10^-1 down to 10^-5.
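The two splitting modes correspond directly to NumPy's linspace and logspace; the following sketch reproduces the values from the example above:

```python
import numpy as np

# blocks: start=2, stop=4, num=3, no base -> linear split (like AutoSearch without base)
blocks = np.linspace(2, 4, num=3)
print(blocks)  # [2. 3. 4.]

# learning_rate: start=-1, stop=-5, num=5, base=10 -> logarithmic split
# (num values from base**start down to base**stop)
learning_rate = np.logspace(-1, -5, num=5, base=10.0)
print(learning_rate)  # [1.e-01 1.e-02 1.e-03 1.e-04 1.e-05]
```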
Search Space Configuration (Simplified Format)
To facilitate configuration, ModelArts provides three additional search space types:
- Discrete search space: All parameters in the search space are discrete. After a search space is set as a discrete search space, there is no need to configure the parameter type for all parameters in the search space.
  ```yaml
  - type: discrete
    params:
      - name: resnet50
        values: ["1-11111111-2111121111-211111", "1-1112-1111111111121-11111112111", "1-11111121-12-11111211", "11-111112-112-11111211", "1-1-111111112-11212", "1-1-1-2112112", "1-111211-1111112-21111111", "1-1111111111-21112112-11111", "1-111111111112-121111111121-11", "11-211-121-11111121", "111-2111-211111-211"]
  ```
- Continuous search space: All parameters in the search space are continuous. After a search space is set as a continuous search space, there is no need to configure the parameter type for all parameters in the search space.
  ```yaml
  - type: continuous
    params:
      - name: learning_rate
        start: 0.01
        stop: 0.1
      - name: weight_decay
        start: 0.9
        stop: 1.0
  ```
- Repetitive discrete space: This search space type handles NAS scenarios in which the same parameter has different value ranges at different positions in the architecture.
The type of a repetitive discrete space is repeat_discrete, and its unique repeat attribute indicates the number of repetitions in the search space (for block-like architectures).
The parameters in a repetitive discrete space are discrete, but the values_each_block attribute is added to indicate the value range of the parameter in each block.
  ```yaml
  search_space:
    - type: repeat_discrete
      name: mbnas
      repeat: 8
      params:
        - name: block_repeat
          values: [0, 1, 2, 3, 4]
          values_each_block: [
            [1, 2, 3, 4],
            [0, 1, 2, 3, 4],
            [1, 2, 3, 4],
            [0, 1, 2, 3, 4],
            [1, 2, 3, 4],
            [0, 1, 2, 3, 4],
            [1, 2, 3, 4],
            [0, 1, 2, 3, 4],
          ]
        - name: neck_filter_ratio
          values: [1, 1, 2]
        - name: kernel_size
          values: [3, 5, 7]
        - name: filter_ratio
          values: [0.5, 0.75, 1, 1.25, 1.5]
  ```
Search Algorithm Configuration
After a search space is defined, a search algorithm must be configured to define how the variables in the search space are explored.
Use search_algorithm to configure the algorithm. Table 2 describes the two mandatory parameters. Beyond these, each search algorithm accepts its own additional parameters; for details, see the algorithm documentation or algorithm code listed in Table 3.
| Parameter | Description |
|---|---|
| type | Algorithm type |
| reward_attr | All algorithms in AutoSearch aim to maximize the value of reward_attr. This parameter supports mathematical expressions. To minimize a metric, negate it, for example, -loss. Using expressions, you can easily implement multi-objective search, for example, balancing model accuracy against inference speed. |
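For instance, a multi-objective reward can be written as an expression. The following is a sketch; the metric names mean_accuracy and latency and the 0.01 weight are illustrative assumptions, and your training script must actually report the metrics used:

```yaml
search_algorithm:
  type: random_search
  # Maximize accuracy while penalizing latency (illustrative metric names
  # and weight; the training script must report both values).
  reward_attr: mean_accuracy - 0.01 * latency
```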
| Algorithm | Application Scenario | Whether reward_attr Supports Expressions | Configuration Example | Parameter |
|---|---|---|---|---|
| random_search | Hyperparameter and NAS | Yes | Searching for Hyperparameters Using Classic Hyperparameter Algorithms | For details, see Table 4. |
| grid_search | Hyperparameter, NAS, and data augmentation | Yes | None | |
| tpe_search | Hyperparameter | No | Searching for Hyperparameters Using Classic Hyperparameter Algorithms | For details, see Table 5. |
| anneal_search | Hyperparameter | No | Searching for Hyperparameters Using Classic Hyperparameter Algorithms | |
| bayes_opt_search | Hyperparameter | No | Searching for Hyperparameters Using Classic Hyperparameter Algorithms | |
| mbnas | NAS | Yes | | For details, see Table 6. |
| Parameter | Description |
|---|---|
| max_concurrent | Number of trials that are executed at the same time |
| num_samples | Total number of trials |
| mode | Whether a larger or smaller reward_attr is better. The value can be max or min. |

| Parameter | Description |
|---|---|
| num_of_arcs | Number of returned architectures |
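Putting the pieces together, a search_algorithm section using the parameters above might look like the following sketch (the metric name mean_accuracy is an assumption; it must be reported by the training script):

```yaml
search_algorithm:
  type: random_search
  reward_attr: mean_accuracy  # assumed metric name reported by the training script
  max_concurrent: 2           # run at most two trials at the same time
  num_samples: 10             # ten trials in total
```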
Scheduler Configuration
A search algorithm generates trials from the search space. When more trials are generated than resources allow, some can only wait in a queue. The scheduler provided by ModelArts schedules these trials: it decides which trials run first and can even terminate a running trial early.
Currently, AutoSearch supports FIFOScheduler, MedianStoppingRule, and Huawei-developed ModelBasedScheduler (a prediction model-based scheduler, which can be used with random search, grid search, mbnas, and autoevo).
FIFOScheduler and MedianStoppingRule require no additional parameters; ModelBasedScheduler does.
- FIFOScheduler configuration
  ```yaml
  scheduler:
    type: FIFOScheduler
  ```
- MedianStoppingRule configuration
  ```yaml
  scheduler:
    type: MedianStoppingRule
  ```
  MedianStoppingRule terminates trials that perform worse than the others. The rule is as follows: after a trial has reported several steps, if its best_result is worse than the median of the running averages of earlier trials at the same number of steps, the scheduler terminates or suspends the trial early.
- ModelBasedScheduler configuration
  ```yaml
  scheduler:
    type: ModelBasedScheduler
    grace_period: 5
  ```
  ModelBasedScheduler uses the complete training data of earlier trials to train a prediction model based on the loss curve. After a subsequent trial has trained for some of its epochs, the prediction model predicts the trial's final metrics from the loss values of those epochs. Trials whose predicted final metrics are poor are stopped directly.
Table 7 Parameters supported by ModelBasedScheduler

| Parameter | Description |
|---|---|
| grace_period | Number of trials that must complete before the trained prediction model is used for early stopping. The default value is 20. |
| input_ratio | Ratio of reported data used as prediction input. The default value is 0.1, meaning the first 10% of reported data is used to predict the remaining 90% and the best achievable metric. |
| validation_ratio | Ratio of the loss curve used as the validation set, which is used to select the best prediction model. The default value is 0; 0.2 is recommended. |
| ensemble_models | Number of prediction models used. Ensemble learning can train multiple prediction models for joint prediction. The default value is 1. |
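A scheduler section exercising all of the Table 7 parameters might look like the following sketch (the specific values here are illustrative choices, not defaults):

```yaml
scheduler:
  type: ModelBasedScheduler
  grace_period: 5        # start early stopping after 5 completed trials
  input_ratio: 0.1       # use the first 10% of reported data as prediction input
  validation_ratio: 0.2  # recommended ratio for selecting the best prediction model
  ensemble_models: 3     # train three prediction models for joint prediction
```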