YAML Configuration File Description
ModelArts stores all parameters required to run the AutoSearch framework in a YAML file, which is mandatory for running the framework. A properly written YAML file is the prerequisite for a successful search.
YAML File Configuration Example
The following example is used only to help you understand the YAML file. For more examples, see Example: Replacing the Original ResNet-50 with a Better Network Architecture, Example: Searching for Hyperparameters Using Classic Hyperparameter Algorithms, Example: Searching for Network Architectures Using the MBNAS Algorithm, Example: Implementing Auto Data Augmentation Using a Preset Data Augmentation Policy, and Example: Using Multisearch.
```yaml
general:
  gpu_per_instance: 1
search_space:
  - type: discrete
    params:
      - name: resnet50
        values: ["1-11111111-2111121111-211111", "1-1112-1111111111121-11111112111", "1-11111121-12-11111211", "11-111112-112-11111211", "1-1-111111112-11212", "1-1-1-2112112", "1-111211-1111112-21111111", "1-1111111111-21112112-11111", "1-111111111112-121111111121-11", "11-211-121-11111121", "111-2111-211111-211"]
search_algorithm:
  type: grid_search
  reward_attr: mean_accuracy
scheduler:
  type: FIFOScheduler
```
Overview of YAML Configuration Files
As shown in the preceding example, the YAML configuration consists of four parts:
- Common configuration
- Search space configuration
- Search algorithm configuration
- (Optional) Scheduler configuration
The common configuration is used to configure the resource information required for a single training. The scheduler configuration is optional. The search space and search algorithm configurations are the core of the YAML configuration.
For NAS, hyperparameter, and auto data augmentation scenarios described in Introduction to Auto Search Jobs, the difference lies in the search space and search algorithm. You can define a search space for hyperparameter search and select a suitable algorithm to perform a hyperparameter search. Similarly, you can define a search space for network architecture search and select a suitable algorithm to perform a NAS search. If multiple configuration files are submitted in a task, the system performs multisearch based on the YAML files in sequence.
Common Configuration
Common configuration items are related to resource consumption during the running of the distributed framework. Table 1 lists the common configuration items.
| Parameter | Description |
|---|---|
| cpu_per_instance | Number of CPUs occupied by a worker |
| gpu_per_instance | Number of GPUs occupied by a worker |
| npu_per_instance | Number of NPUs occupied by a worker |
| instance_per_trial | Number of workers in a trial. The default value is 1. Multiple nodes are not supported, so this parameter can only be set to 1. |
Example configuration:
If the original training script consumes a single node with two CPUs/GPUs, configure the resources as follows:
```yaml
general:
  gpu_per_instance: 2
```
With this configuration, if the auto search job runs on a single node with eight CPUs/GPUs, at most four trials (8/2 = 4) can run concurrently; any additional trials wait in a queue until resources are freed.
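The concurrency arithmetic above can be sketched as follows (a toy illustration, not part of the AutoSearch framework):

```python
# Toy illustration (not part of AutoSearch): how many trials can run at once
# on one node, given the per-trial GPU requirement from the YAML above.
node_gpus = 8          # GPUs available on the node
gpu_per_instance = 2   # GPUs each trial occupies
max_concurrent_trials = node_gpus // gpu_per_instance
print(max_concurrent_trials)  # 4
```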
Search Space Configuration
A search space usually contains multiple variables, each with its own type (discrete or continuous) and value range. The YAML configuration abstracts these as follows:
```yaml
search_space:
  - params:
      - type: discrete_param
        name: filter_size
        values: [3, 5, 7]
      - type: discrete_param
        name: activation
        values: [relu, sigmoid]
      - type: continuous_param
        name: blocks
        start: 2
        stop: 4
        num: 3
      - type: continuous_param
        name: learning_rate
        start: -1
        stop: -5
        num: 5
        base: 10
```
ModelArts provides two basic variable types.
- Discrete (discrete_param): A discrete variable only needs a name and a list of values, for example, filter_size and activation in the preceding configuration.
- Continuous (continuous_param): A continuous variable must specify a value range, from start to stop.
In addition, some algorithms support only discrete variables, so a continuous variable must be automatically split into discrete points, and you need to tell the system how to split it. ModelArts provides two methods, corresponding to NumPy's linspace and logspace. If you are familiar with these two NumPy APIs, the configuration is straightforward.
- If base is not specified, the variable is split linearly, like np.linspace. For example, blocks in the example takes three evenly spaced values from 2 to 4, that is, 2, 3, and 4.
- If base is specified, the values are spaced logarithmically, like np.logspace: num values are taken from base ** start to base ** stop. In the example, learning_rate takes five values from 10^-1 down to 10^-5.
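The two splitting modes correspond directly to NumPy's linspace and logspace; the following sketch reproduces the values from the example above:

```python
import numpy as np

# blocks: start=2, stop=4, num=3, no base -> linear split (like AutoSearch without base)
blocks = np.linspace(2, 4, num=3)
print(blocks)  # [2. 3. 4.]

# learning_rate: start=-1, stop=-5, num=5, base=10 -> logarithmic split
# (num values from base**start down to base**stop)
learning_rate = np.logspace(-1, -5, num=5, base=10.0)
print(learning_rate)  # [1.e-01 1.e-02 1.e-03 1.e-04 1.e-05]
```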
Search Space Configuration (Simplified Format)
To facilitate configuration, ModelArts provides three additional search space types:
- Discrete search space: All parameters in the search space are discrete. After a search space is set as a discrete search space, there is no need to configure the parameter type for all parameters in the search space.
  ```yaml
  - type: discrete
    params:
      - name: resnet50
        values: ["1-11111111-2111121111-211111", "1-1112-1111111111121-11111112111", "1-11111121-12-11111211", "11-111112-112-11111211", "1-1-111111112-11212", "1-1-1-2112112", "1-111211-1111112-21111111", "1-1111111111-21112112-11111", "1-111111111112-121111111121-11", "11-211-121-11111121", "111-2111-211111-211"]
  ```
- Continuous search space: All parameters in the search space are continuous. After a search space is set as a continuous search space, there is no need to configure the parameter type for all parameters in the search space.
  ```yaml
  - type: continuous
    params:
      - name: learning_rate
        start: 0.01
        stop: 0.1
      - name: weight_decay
        start: 0.9
        stop: 1.0
  ```
- Repetitive discrete space: This search space type handles NAS scenarios in which the same parameter has different value ranges at different positions in the architecture.
The type of a repetitive discrete space is repeat_discrete, and its unique repeat attribute indicates the number of repetitions in the search space (for block-like architectures).
The parameters in a repetitive discrete space are discrete, but the values_each_block attribute is added to indicate the value range of the parameter in each block.
  ```yaml
  search_space:
    - type: repeat_discrete
      name: mbnas
      repeat: 8
      params:
        - name: block_repeat
          values: [0, 1, 2, 3, 4]
          values_each_block: [
            [1, 2, 3, 4],
            [0, 1, 2, 3, 4],
            [1, 2, 3, 4],
            [0, 1, 2, 3, 4],
            [1, 2, 3, 4],
            [0, 1, 2, 3, 4],
            [1, 2, 3, 4],
            [0, 1, 2, 3, 4],
          ]
        - name: neck_filter_ratio
          values: [1, 1, 2]
        - name: kernel_size
          values: [3, 5, 7]
        - name: filter_ratio
          values: [0.5, 0.75, 1, 1.25, 1.5]
  ```
Search Algorithm Configuration
After a search space is defined, a search algorithm must be configured to define how the variables in the search space are explored.
Use search_algorithm to configure the algorithm. Table 2 describes the two mandatory parameters. Beyond these, each search algorithm accepts its own additional parameters; for details, see the algorithm documentation or algorithm code listed in Table 3.
| Parameter | Description |
|---|---|
| type | Algorithm type |
| reward_attr | All algorithms in AutoSearch aim to maximize the value of reward_attr. This parameter supports mathematical expressions. To minimize a metric, negate it, for example, -loss. Using expressions, you can easily implement multi-objective search, for example, balancing model accuracy against inference speed. |
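For instance, a multi-objective reward can be written as an expression. The following is a sketch; the metric names mean_accuracy and latency and the 0.01 weight are illustrative assumptions, and your training script must actually report the metrics used:

```yaml
search_algorithm:
  type: random_search
  # Maximize accuracy while penalizing latency (illustrative metric names
  # and weight; the training script must report both values).
  reward_attr: mean_accuracy - 0.01 * latency
```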
| Algorithm | Application Scenario | Whether reward_attr Supports Expressions | Configuration Example | Parameter |
|---|---|---|---|---|
| random_search | Hyperparameter and NAS | Yes | Searching for Hyperparameters Using Classic Hyperparameter Algorithms | For details, see Table 4. |
| grid_search | Hyperparameter, NAS, and data augmentation | Yes | None | |
| tpe_search | Hyperparameter | No | Searching for Hyperparameters Using Classic Hyperparameter Algorithms | For details, see Table 5. |
| anneal_search | Hyperparameter | No | Searching for Hyperparameters Using Classic Hyperparameter Algorithms | |
| bayes_opt_search | Hyperparameter | No | Searching for Hyperparameters Using Classic Hyperparameter Algorithms | |
| mbnas | NAS | Yes | | For details, see Table 6. |
| Parameter | Description |
|---|---|
| max_concurrent | Number of trials that are executed at the same time |
| num_samples | Total number of trials |
| mode | Whether a larger or smaller reward_attr is better. The value can be max or min. |

| Parameter | Description |
|---|---|
| num_of_arcs | Number of returned architectures |
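Putting the pieces together, a search_algorithm section using the parameters above might look like the following sketch (the metric name mean_accuracy is an assumption; it must be reported by the training script):

```yaml
search_algorithm:
  type: random_search
  reward_attr: mean_accuracy  # assumed metric name reported by the training script
  max_concurrent: 2           # run at most two trials at the same time
  num_samples: 10             # ten trials in total
```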
Scheduler Configuration
A search algorithm generates trials from the search space. When more trials are generated than resources allow, some can only wait in a queue. The scheduler provided by ModelArts schedules these trials: it decides which trials run first and can even terminate a running trial early.
Currently, AutoSearch supports FIFOScheduler, MedianStoppingRule, and Huawei-developed ModelBasedScheduler (a prediction model-based scheduler, which can be used with random search, grid search, mbnas, and autoevo).
FIFOScheduler and MedianStoppingRule require no additional parameters; ModelBasedScheduler does.
- FIFOScheduler configuration
  ```yaml
  scheduler:
    type: FIFOScheduler
  ```
- MedianStoppingRule configuration
  ```yaml
  scheduler:
    type: MedianStoppingRule
  ```
  MedianStoppingRule terminates trials that perform worse than the others. The rule is as follows: after a trial has reported several steps, if its best_result is worse than the median of the running averages of earlier trials at the same number of steps, the scheduler terminates or suspends the trial early.
- ModelBasedScheduler configuration
  ```yaml
  scheduler:
    type: ModelBasedScheduler
    grace_period: 5
  ```
  ModelBasedScheduler uses the complete training data of earlier trials to train a prediction model based on the loss curve. After a subsequent trial has trained for some of its epochs, the prediction model predicts the trial's final metrics from the loss values of those epochs. Trials whose predicted final metrics are poor are stopped directly.
Table 7 Parameters supported by ModelBasedScheduler

| Parameter | Description |
|---|---|
| grace_period | Number of trials that must complete before the trained prediction model is used for early stopping. The default value is 20. |
| input_ratio | Ratio of reported data used as prediction input. The default value is 0.1, meaning the first 10% of reported data is used to predict the remaining 90% and the best achievable metric. |
| validation_ratio | Ratio of the loss curve used as the validation set, which is used to select the best prediction model. The default value is 0; 0.2 is recommended. |
| ensemble_models | Number of prediction models used. Ensemble learning can train multiple prediction models for joint prediction. The default value is 1. |
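A scheduler section exercising all of the Table 7 parameters might look like the following sketch (the specific values here are illustrative choices, not defaults):

```yaml
scheduler:
  type: ModelBasedScheduler
  grace_period: 5        # start early stopping after 5 completed trials
  input_ratio: 0.1       # use the first 10% of reported data as prediction input
  validation_ratio: 0.2  # recommended ratio for selecting the best prediction model
  ensemble_models: 3     # train three prediction models for joint prediction
```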