
Overview

ModelArts automatically searches for optimal hyperparameters for your models, saving time and effort.

During training, hyperparameters such as learning_rate and weight_decay need to be adjusted. ModelArts hyperparameter search tunes these settings automatically, and it is faster and more precise than manual tuning.

ModelArts supports the following hyperparameter search algorithms:

  • Bayesian Optimization (SMAC)
  • Tree-structured Parzen Estimator (TPE)
  • Simulated Annealing

Bayesian Optimization (SMAC)

Bayesian optimization assumes a functional relationship between the hyperparameters and the objective function. Based on the evaluation values of the hyperparameters already searched, it uses Gaussian process regression to estimate the mean and variance of the objective function at unexplored search points. These estimates are combined into an acquisition function, and the point that maximizes the acquisition function becomes the next search point. By leveraging previous evaluation results, Bayesian optimization reduces the number of iterations and the search time, but it can struggle to find the global optimum. A minimal sketch of this loop follows Table 1.

Table 1 Bayesian optimization parameters

  • num_samples: Number of hyperparameter groups to search. Recommended value: an integer ranging from 10 to 20; a larger value increases the search time but improves the results.
  • kind: Acquisition function type. Recommended value: a string that defaults to ucb; the other options are ei and poi, but keeping the default is recommended.
  • kappa: Adjustment parameter for the ucb acquisition function, that is, the upper confidence bound. Recommended value: a float that should remain unchanged.
  • xi: Adjustment parameter for the poi and ei acquisition functions. Recommended value: a float that should remain unchanged.
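
The following self-contained sketch illustrates the loop described above, using scikit-learn's Gaussian process regressor and a ucb acquisition function. The toy objective, the candidate grid, and the kappa value are illustrative assumptions, not the ModelArts implementation.

import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

def ucb(mean, std, kappa=2.576):
    # Upper confidence bound: the mean term exploits, kappa * std explores.
    return mean + kappa * std

def objective(lr):
    # Placeholder for a real training run; returns a score to maximize.
    return -(np.log10(lr) + 2.0) ** 2  # toy objective, peaks at learning_rate = 1e-2

rng = np.random.default_rng(0)
grid = np.logspace(-4, -1, 200).reshape(-1, 1)   # candidate learning rates

# Seed the model with a few random evaluations.
X = rng.choice(grid.ravel(), size=3, replace=False).reshape(-1, 1)
y = np.array([objective(x) for x in X.ravel()])

gp = GaussianProcessRegressor(kernel=Matern(nu=2.5), alpha=1e-6, normalize_y=True)
for _ in range(10):                               # remaining search budget
    gp.fit(np.log10(X), y)                        # estimate mean and variance of the objective
    mean, std = gp.predict(np.log10(grid), return_std=True)
    x_next = grid[np.argmax(ucb(mean, std))]      # next point maximizes the acquisition
    X = np.vstack([X, x_next.reshape(1, 1)])
    y = np.append(y, objective(x_next[0]))

print("best learning_rate:", X[np.argmax(y), 0])

In this sketch, kappa controls the trade-off between exploration and exploitation, which is why Table 1 recommends keeping its default.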

Tree-structured Parzen Estimator (TPE)

The TPE algorithm uses Gaussian mixture models to learn the model hyperparameters. On each trial, it fits two Gaussian mixture models over the hyperparameter values: l(x) to the values associated with the best objective results and g(x) to the remaining values. It then selects the hyperparameter value that maximizes the ratio l(x)/g(x). A minimal sketch using the open-source hyperopt library follows Table 2.

Table 2 TPE parameters

  • num_samples: Number of hyperparameter groups to search. Recommended value: an integer ranging from 10 to 20; a larger value increases the search time but improves the results.
  • n_initial_points: Number of random evaluations of the objective function before tree-structured Parzen estimators are used. Recommended value: an integer that should remain unchanged.
  • gamma: Quantile used by the TPE algorithm to split l(x) and g(x). Recommended value: a float ranging from 0 to 1 that should remain unchanged.
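
ModelArts runs the search internally, but the same TPE procedure can be sketched with the open-source hyperopt library. The search space, the toy objective, and max_evals=20 (playing the role of num_samples) below are illustrative assumptions, not ModelArts APIs.

from hyperopt import Trials, fmin, hp, tpe

# Toy search space for the two hyperparameters named in this document.
space = {
    "learning_rate": hp.loguniform("learning_rate", -9.2, -2.3),  # ~1e-4 to ~1e-1
    "weight_decay": hp.uniform("weight_decay", 0.0, 0.1),
}

def objective(params):
    # Placeholder for a real training run; return the metric to minimize.
    return (params["learning_rate"] - 0.01) ** 2 + (params["weight_decay"] - 0.05) ** 2

trials = Trials()
best = fmin(
    fn=objective,
    space=space,
    algo=tpe.suggest,  # fits l(x) and g(x) and maximizes their ratio
    max_evals=20,      # plays the role of num_samples
    trials=trials,
)
print(best)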

Simulated Annealing

The simulated annealing algorithm is a simple and effective search method that exploits the smoothness of the response surface. It starts from a previous trial point and samples each hyperparameter from a distribution similar to the prior, but more tightly concentrated around the chosen point. Over time, the algorithm focuses its sampling on points close to the best ones found so far. To avoid getting trapped in local optima, it selects a runner-up trial as the starting point with a certain probability. A minimal sketch using hyperopt's anneal algorithm follows Table 3.

Table 3 Simulated annealing parameters

  • num_samples: Number of hyperparameter groups to search. Recommended value: an integer ranging from 10 to 20; a larger value increases the search time but improves the results.
  • avg_best_idx: Mean of the geometric distribution used to select trials for exploration, based on their scores. Recommended value: a float that should remain unchanged.
  • shrink_coef: Rate at which the sampling neighborhood size decreases as more points are explored. Recommended value: a float that should remain unchanged.
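
As a comparable open-source illustration, hyperopt also provides a simulated annealing search via anneal.suggest. In this sketch, avg_best_idx and shrink_coef keep their library defaults, in line with the recommendation in Table 3; the search space and toy objective are assumptions, not ModelArts APIs.

from hyperopt import Trials, anneal, fmin, hp

# Toy search space: learning_rate between ~1e-4 and ~1e-1 on a log scale.
space = {"learning_rate": hp.loguniform("learning_rate", -9.2, -2.3)}

def objective(params):
    # Placeholder for a real training run; return the metric to minimize.
    return (params["learning_rate"] - 0.01) ** 2

trials = Trials()
best = fmin(
    fn=objective,
    space=space,
    algo=anneal.suggest,   # avg_best_idx and shrink_coef stay at their defaults
    max_evals=20,          # plays the role of num_samples
    trials=trials,
)
print(best)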