
Overview

ModelArts automatically searches for optimal hyperparameters for your models, saving time and effort.

During training, hyperparameters such as learning_rate and weight_decay need to be adjusted. ModelArts hyperparameter search tunes these settings automatically, and it is faster and more precise than manual tuning.

ModelArts supports the following hyperparameter search algorithms:

  • Bayesian Optimization (SMAC)
  • Tree-structured Parzen Estimator (TPE)
  • Simulated Annealing

Bayesian Optimization (SMAC)

Bayesian optimization assumes a functional relationship between the hyperparameters and the objective function. Based on the evaluation values of the hyperparameters already searched, it uses Gaussian process regression to estimate the mean and variance of the objective function at unexplored search points. These estimates are combined into an acquisition function, and the point that maximizes the acquisition function becomes the next search point. By leveraging previous evaluation results, Bayesian optimization reduces the number of iterations and the search time, but it can struggle to find the global optimum. A minimal sketch of this loop follows Table 1.

Table 1 Bayesian optimization parameters

  • num_samples: Number of hyperparameter groups to search. Recommended value: an integer ranging from 10 to 20; a larger value increases the search time but improves the results.
  • kind: Acquisition function type. Recommended value: a string that defaults to ucb; the other options are ei and poi, but keeping the default is recommended.
  • kappa: Adjustment parameter for the ucb acquisition function, that is, the upper confidence bound. Recommended value: a float that should remain unchanged.
  • xi: Adjustment parameter for the poi and ei acquisition functions. Recommended value: a float that should remain unchanged.
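
The following self-contained sketch illustrates the loop described above, using scikit-learn's Gaussian process regressor and a ucb acquisition function. The toy objective, the candidate grid, and the kappa value are illustrative assumptions, not the ModelArts implementation.

import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

def ucb(mean, std, kappa=2.576):
    # Upper confidence bound: the mean term exploits, kappa * std explores.
    return mean + kappa * std

def objective(lr):
    # Placeholder for a real training run; returns a score to maximize.
    return -(np.log10(lr) + 2.0) ** 2  # toy objective, peaks at learning_rate = 1e-2

rng = np.random.default_rng(0)
grid = np.logspace(-4, -1, 200).reshape(-1, 1)   # candidate learning rates

# Seed the model with a few random evaluations.
X = rng.choice(grid.ravel(), size=3, replace=False).reshape(-1, 1)
y = np.array([objective(x) for x in X.ravel()])

gp = GaussianProcessRegressor(kernel=Matern(nu=2.5), alpha=1e-6, normalize_y=True)
for _ in range(10):                               # remaining search budget
    gp.fit(np.log10(X), y)                        # estimate mean and variance of the objective
    mean, std = gp.predict(np.log10(grid), return_std=True)
    x_next = grid[np.argmax(ucb(mean, std))]      # next point maximizes the acquisition
    X = np.vstack([X, x_next.reshape(1, 1)])
    y = np.append(y, objective(x_next[0]))

print("best learning_rate:", X[np.argmax(y), 0])

In this sketch, kappa controls the trade-off between exploration and exploitation, which is why Table 1 recommends keeping its default.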

Tree-structured Parzen Estimator (TPE)

The TPE algorithm uses Gaussian mixture models to learn the model hyperparameters. On each trial, it fits two Gaussian mixture models over the hyperparameter values: l(x) to the values associated with the best objective results and g(x) to the remaining values. It then selects the hyperparameter value that maximizes the ratio l(x)/g(x). A minimal sketch using the open-source hyperopt library follows Table 2.

Table 2 TPE parameters

  • num_samples: Number of hyperparameter groups to search. Recommended value: an integer ranging from 10 to 20; a larger value increases the search time but improves the results.
  • n_initial_points: Number of random evaluations of the objective function before tree-structured Parzen estimators are used. Recommended value: an integer that should remain unchanged.
  • gamma: Quantile used by the TPE algorithm to split l(x) and g(x). Recommended value: a float ranging from 0 to 1 that should remain unchanged.
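
ModelArts runs the search internally, but the same TPE procedure can be sketched with the open-source hyperopt library. The search space, the toy objective, and max_evals=20 (playing the role of num_samples) below are illustrative assumptions, not ModelArts APIs.

from hyperopt import Trials, fmin, hp, tpe

# Toy search space for the two hyperparameters named in this document.
space = {
    "learning_rate": hp.loguniform("learning_rate", -9.2, -2.3),  # ~1e-4 to ~1e-1
    "weight_decay": hp.uniform("weight_decay", 0.0, 0.1),
}

def objective(params):
    # Placeholder for a real training run; return the metric to minimize.
    return (params["learning_rate"] - 0.01) ** 2 + (params["weight_decay"] - 0.05) ** 2

trials = Trials()
best = fmin(
    fn=objective,
    space=space,
    algo=tpe.suggest,  # fits l(x) and g(x) and maximizes their ratio
    max_evals=20,      # plays the role of num_samples
    trials=trials,
)
print(best)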

Simulated Annealing

The simulated annealing algorithm is a simple and effective search method that exploits the smoothness of the response surface. It starts from a previous trial point and samples each hyperparameter from a distribution similar to the prior, but more tightly concentrated around the chosen point. Over time, the algorithm focuses its sampling on points close to the best ones found so far. To avoid getting trapped in local optima, it selects a runner-up trial as the starting point with a certain probability. A minimal sketch using hyperopt's anneal algorithm follows Table 3.

Table 3 Simulated annealing parameters

  • num_samples: Number of hyperparameter groups to search. Recommended value: an integer ranging from 10 to 20; a larger value increases the search time but improves the results.
  • avg_best_idx: Mean of the geometric distribution used to select trials for exploration, based on their scores. Recommended value: a float that should remain unchanged.
  • shrink_coef: Rate at which the sampling neighborhood size decreases as more points are explored. Recommended value: a float that should remain unchanged.
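
As a comparable open-source illustration, hyperopt also provides a simulated annealing search via anneal.suggest. In this sketch, avg_best_idx and shrink_coef keep their library defaults, in line with the recommendation in Table 3; the search space and toy objective are assumptions, not ModelArts APIs.

from hyperopt import Trials, anneal, fmin, hp

# Toy search space: learning_rate between ~1e-4 and ~1e-1 on a log scale.
space = {"learning_rate": hp.loguniform("learning_rate", -9.2, -2.3)}

def objective(params):
    # Placeholder for a real training run; return the metric to minimize.
    return (params["learning_rate"] - 0.01) ** 2

trials = Trials()
best = fmin(
    fn=objective,
    space=space,
    algo=anneal.suggest,   # avg_best_idx and shrink_coef stay at their defaults
    max_evals=20,          # plays the role of num_samples
    trials=trials,
)
print(best)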