Overview
ModelArts automatically searches for optimal hyperparameters for your models, saving time and effort.
During training, hyperparameters such as learning_rate and weight_decay must be tuned. ModelArts hyperparameter search optimizes these settings automatically, and it is faster and more precise than manual tuning.
ModelArts supports the following hyperparameter search algorithms:
- Bayesian Optimization (SMAC)
- Tree-structured Parzen Estimator (TPE)
- Simulated Annealing
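All three algorithms sample from a user-defined search space. As a minimal illustration (the dictionary below is a hypothetical structure used by the sketches on this page, not the ModelArts configuration format), a space covering the two hyperparameters mentioned above could look like this:

```python
# Hypothetical search space for the two hyperparameters mentioned above.
# Each entry gives the (low, high) range a search algorithm samples from.
search_space = {
    "learning_rate": (1e-5, 1e-1),  # typically sampled on a log scale
    "weight_decay": (1e-6, 1e-2),
}
```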
Bayesian Optimization (SMAC)
Bayesian optimization assumes that a functional relationship exists between the hyperparameters and the objective function. Based on the evaluation values of the hyperparameters already searched, it uses Gaussian process regression to estimate the mean and variance of the objective function at other candidate points. The mean and variance are then used to construct an acquisition function, and the next search point is the one that maximizes it. Because it leverages previous evaluation results, Bayesian optimization needs fewer iterations and less search time, but it can struggle to find the global optimum.
| Parameter | Description | Recommended Value |
| --- | --- | --- |
| num_samples | Number of hyperparameter groups to search | Integer ranging from 10 to 20. Larger values increase search time but improve results. |
| kind | Acquisition function type | String; defaults to ucb. ei and poi are also available, but the default is recommended. |
| kappa | Adjustment parameter for the ucb acquisition function, representing the upper confidence bound | Float; keep the default value. |
| xi | Adjustment parameter for the ei and poi acquisition functions | Float; keep the default value. |
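The kind, kappa, and xi parameters map directly onto standard acquisition functions. The following is a minimal sketch, assuming a Gaussian-process surrogate that provides a posterior mean mu and standard deviation sigma at each candidate point; the default kappa here follows common open-source implementations and is not necessarily the ModelArts default:

```python
import numpy as np
from scipy.stats import norm

def acquisition(mu, sigma, y_best, kind="ucb", kappa=2.576, xi=0.0):
    """Score candidate points from the surrogate's posterior.

    mu, sigma -- arrays of the Gaussian-process posterior mean and
                 standard deviation at the candidate points
    y_best    -- best objective value observed so far (maximization)
    kappa     -- exploration weight for ucb
    xi        -- exploration margin for ei and poi
    """
    sigma = np.maximum(sigma, 1e-12)  # avoid division by zero
    if kind == "ucb":
        # Upper confidence bound: mean plus kappa standard deviations.
        return mu + kappa * sigma
    z = (mu - y_best - xi) / sigma
    if kind == "ei":
        # Expected improvement over the incumbent best value.
        return (mu - y_best - xi) * norm.cdf(z) + sigma * norm.pdf(z)
    if kind == "poi":
        # Probability of improving on the incumbent best value.
        return norm.cdf(z)
    raise ValueError(f"unknown acquisition kind: {kind}")

# The next search point is the candidate with the highest score:
# next_idx = np.argmax(acquisition(mu, sigma, y_best, kind="ucb"))
```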
Tree-structured Parzen Estimator (TPE)
The TPE algorithm uses Gaussian mixture models to learn the model hyperparameters. On each trial, TPE fits two of them: l(x) to the hyperparameter values associated with the best objective values, and g(x) to the remaining values. It then selects the hyperparameter value that maximizes the ratio l(x)/g(x).
| Parameter | Description | Recommended Value |
| --- | --- | --- |
| num_samples | Number of hyperparameter groups to search | Integer ranging from 10 to 20. Larger values increase search time but improve results. |
| n_initial_points | Number of random evaluations of the objective function before the Tree-structured Parzen estimators take over | Integer; keep the default value. |
| gamma | Quantile used by the TPE algorithm to split the trials into l(x) and g(x) | Float ranging from 0 to 1; keep the default value. |
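How gamma splits the trials and how the l(x)/g(x) ratio drives the next suggestion can be sketched in a few lines. This is a simplified one-dimensional illustration, in which SciPy's gaussian_kde stands in for the full adaptive Parzen estimators:

```python
import numpy as np
from scipy.stats import gaussian_kde

def tpe_suggest(observed_x, observed_y, candidates, gamma=0.25):
    """Pick the candidate maximizing l(x)/g(x) for one hyperparameter.

    observed_x -- hyperparameter values tried so far (needs enough
                  points on each side of the split for the KDEs)
    observed_y -- their objective values (lower is better here)
    gamma      -- quantile separating the "good" trials from the rest
    """
    observed_x = np.asarray(observed_x, dtype=float)
    observed_y = np.asarray(observed_y, dtype=float)
    candidates = np.asarray(candidates, dtype=float)
    # Split the trials at the gamma quantile of the objective values.
    threshold = np.quantile(observed_y, gamma)
    good = observed_x[observed_y <= threshold]
    rest = observed_x[observed_y > threshold]
    # l(x): density of the good trials; g(x): density of the rest.
    l, g = gaussian_kde(good), gaussian_kde(rest)
    ratio = l(candidates) / np.maximum(g(candidates), 1e-12)
    return candidates[np.argmax(ratio)]
```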
Simulated Annealing
The simulated annealing algorithm is a simple and effective search method that exploits the smoothness of the response surface. It starts from a previous trial point and samples each hyperparameter from a distribution similar to the prior, but more concentrated around the chosen point. Over time, the algorithm focuses its sampling on points closer to the best ones found so far. With some probability, it selects a runner-up trial instead of the best one, which helps it escape local optima.
| Parameter | Description | Recommended Value |
| --- | --- | --- |
| num_samples | Number of hyperparameter groups to search | Integer ranging from 10 to 20. Larger values increase search time but improve results. |
| avg_best_idx | Mean of the geometric distribution used to select trials for exploration, based on their scores | Float; keep the default value. |
| shrink_coef | Rate at which the sampling neighborhood shrinks as more points are explored | Float; keep the default value. |
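A rough sketch of the sampling behavior described above, for a single continuous hyperparameter. The way avg_best_idx and shrink_coef enter the formulas here is a simplified interpretation for illustration, not the ModelArts implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

def anneal_suggest(trials, low, high, avg_best_idx=2.0, shrink_coef=0.1):
    """Sample a new point near one of the best trials so far.

    trials       -- list of (value, score) pairs, lower score is better
    avg_best_idx -- mean of the geometric distribution over the ranked
                    trials; the best trial is the most likely center,
                    but runner-ups are picked with some probability
    shrink_coef  -- controls how fast the sampling neighborhood shrinks
                    as trials accumulate
    """
    ranked = sorted(trials, key=lambda t: t[1])
    # Geometric draw over ranks (mean = avg_best_idx); occasionally
    # centers the search on a runner-up to escape local optima.
    idx = min(rng.geometric(p=1.0 / avg_best_idx) - 1, len(ranked) - 1)
    center = ranked[idx][0]
    # The neighborhood narrows as the number of trials grows.
    width = (high - low) / (1.0 + shrink_coef * len(trials))
    return float(np.clip(rng.normal(center, width), low, high))
```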