How Do I Disable the Warmup Function of Mox?
Symptom
When a TensorFlow training job runs with Mox, it executes 50 steps four times before formal training begins.
Warmup is the practice of training for a few initial epochs with a small learning rate. Because network parameters are randomly initialized, starting with a large learning rate can make training unstable; warmup avoids this. Once training has stabilized, the originally configured initial learning rate is used for the rest of training.
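As background, a learning-rate warmup is commonly implemented as a linear ramp from a small value up to the configured initial rate. This is a minimal generic sketch of that idea, not Mox's actual implementation; the function name, step counts, and rate values are illustrative:

```python
def warmup_lr(step, warmup_steps=200, base_lr=0.1):
    """Linearly ramp the learning rate from near 0 up to base_lr
    over the first warmup_steps steps, then hold it at base_lr."""
    if step < warmup_steps:
        return base_lr * (step + 1) / warmup_steps
    return base_lr

# Early steps use a small fraction of the configured rate;
# after warmup_steps, the full base_lr is used.
for step in (0, 99, 199, 500):
    print(step, warmup_lr(step))
```

Note that the mechanism described in this article is different: Mox's "warmup" here refers to benchmarking execution modes, not to a learning-rate schedule.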
Possible Cause
Distributed TensorFlow supports multiple execution modes. Mox runs 50 steps four times to measure the execution time of each mode, then selects the execution mode with the shortest time.
Solution
When creating the training job, add variable_update=parameter_server under Running Parameter to disable the Mox warmup function. This pins the execution mode, so Mox skips the timing runs.
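Running parameters added in the job configuration are typically delivered to the training script as command-line flags. The following is a hedged sketch of how a script could receive this parameter; the flag name comes from the solution above, but the parsing code and the --key=value delivery format are assumptions for illustration:

```python
import argparse

parser = argparse.ArgumentParser()
# A running parameter such as variable_update=parameter_server is assumed
# to arrive as a --variable_update=parameter_server flag.
parser.add_argument(
    "--variable_update",
    default=None,
    help="Set to 'parameter_server' to pin the execution mode and "
         "disable Mox's mode-benchmarking warmup.",
)
# parse_known_args ignores any other flags the platform may pass.
args, _ = parser.parse_known_args(["--variable_update=parameter_server"])
print(args.variable_update)  # prints "parameter_server"
```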