VeRL
This section describes the YAML configuration file and parameters for training. You can choose parameters as required.
YAML File Configuration
Edit the YAML file using these instructions. A dotted parameter name such as aaa.bbb refers to the value of bbb nested under aaa. For example, backend_config.data.train_files refers to the data.train_files parameter under backend_config.
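As a sketch of that nesting convention, the dotted name backend_config.data.train_files corresponds to the following YAML structure (the path is the illustrative value from the table below):

```yaml
backend_config:
  data:
    train_files: /data/geometry3k/train.parquet   # backend_config.data.train_files
```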

| Parameter | Example Value | Description |
|---|---|---|
| af_output_dir | /home/ma-user/verl | (Mandatory) Training output directory. |
| backend_config.data.train_files | /data/geometry3k/train.parquet | (Mandatory) Preprocessed training set. |
| backend_config.data.val_files | /data/geometry3k/test.parquet | (Mandatory) Preprocessed validation set. |
| backend_config.actor_rollout_ref.model.path | /model/Qwen2.5-VL-32B-Instruct | (Mandatory) Hugging Face model path, which can be a local path or an HDFS path. |
| backend_config.data.train_batch_size | 32 | Batch size sampled for one training step. |
| backend_config.actor_rollout_ref.actor.ppo_mini_batch_size | 8 | Mini-batch size into which the training batch is split for PPO updates. |
| backend_config.actor_rollout_ref.actor.ppo_micro_batch_size_per_gpu | 1 | Number of samples a single device processes in one forward pass when training the actor. |
| backend_config.actor_rollout_ref.rollout.log_prob_micro_batch_size_per_gpu | 1 | Number of samples a single device processes in one forward pass when computing log probabilities during rollout. |
| backend_config.actor_rollout_ref.ref.log_prob_micro_batch_size_per_gpu | 1 | Number of samples a single device processes in one forward pass when computing the reference policy's log probabilities. |
| backend_config.trainer.total_epochs | 5 | (Optional) Number of training epochs, set as needed. |
| backend_config.actor_rollout_ref.rollout.tensor_model_parallel_size | 4 | Tensor parallel size for model sharding during vLLM inference. |
| backend_config.data.image_key | images | (Multimodal model) Dataset field containing the images. The default value is 'images'. |
| engine_kwargs.vllm.disable_mm_preprocessor_cache | True | (Multimodal model) Whether to disable the preprocessor cache of multimodal models. The default value is False. |
| backend_config.data.max_prompt_length | 1024 | Maximum prompt length. All prompts are left-padded to this length. |
| backend_config.data.max_response_length | 1024 | Maximum response length, i.e., the maximum generation length during the rollout phase of the RL algorithm. |
| backend_config.actor_rollout_ref.rollout.max_num_batched_tokens | 18432 | When max_prompt_length + max_response_length exceeds 8K, set this parameter to the sum of the two. The default value is 8192. |
| backend_config.actor_rollout_ref.actor.ulysses_sequence_parallel_size | 1 | Sequence parallel size, used for long sequences. The default value is 1. When max_response_length exceeds 8K, this is generally set to max_response_length/2048, rounded. |
| backend_config.data.shuffle | True | Whether to shuffle the data in the dataloader. |
| backend_config.data.truncation | 'error' | Behavior when input_ids or the prompt exceeds max_prompt_length. The default value is 'error', which raises an error when the length exceeds max_prompt_length. |
| backend_config.actor_rollout_ref.actor.optim.lr | 1e-6 | Actor learning rate. |
| backend_config.actor_rollout_ref.model.use_remove_padding | True | Whether to remove padding tokens. |
| backend_config.actor_rollout_ref.actor.use_kl_loss | True | Whether to use KL loss in the actor. If enabled, KL is not applied in the reward function. The default value is True. |
| backend_config.actor_rollout_ref.actor.kl_loss_coef | 0.01 | KL loss coefficient. The default value is 0.001. |
| backend_config.actor_rollout_ref.actor.kl_loss_type | low_var_kl | How to calculate the KL divergence between the actor and the reference policy. The default value is low_var_kl. |
| backend_config.actor_rollout_ref.actor.entropy_coeff | 0 | Entropy coefficient in the PPO loss. Generally set this parameter to 0. |
| backend_config.actor_rollout_ref.actor.use_torch_compile | False | Whether to enable JIT compilation acceleration. The value can be False. |
| backend_config.actor_rollout_ref.model.enable_gradient_checkpointing | True | Whether to enable gradient checkpointing for the actor. |
| backend_config.actor_rollout_ref.rollout.name | vllm | Inference framework name: vllm. |
| backend_config.actor_rollout_ref.rollout.gpu_memory_utilization | 0.4 | Fraction of total device memory allocated to the vLLM instances. |
| backend_config.actor_rollout_ref.rollout.n | 4 | Number of responses sampled per prompt; a batch of data is repeated n times (interleaved during repetition). |
| backend_config.trainer.logger | ['console','tensorboard'] | Logging backends. The options are wandb, console, and tensorboard. |
| backend_config.trainer.val_before_train | False | Whether to run validation before training. |
| backend_config.trainer.resume_mode | auto | The default value is auto, which loads the most recent training weights from the default save location if training stops unexpectedly. To resume from a specific path, set this parameter to resume_path and specify resume_from_path. Set it to disable to disable resumable training. |
| backend_config.trainer.resume_from_path | null | The default value is null. Set this parameter to the path from which weight parameters are loaded for resumable training. |
| backend_config.trainer.save_freq | -1 | Frequency (in iterations) of saving model weight parameters. The default value is -1, meaning weights are not saved. |
| backend_config.actor_rollout_ref.actor.ppo_max_token_len_per_gpu | 2048 | Maximum number of tokens a single GPU can process in one PPO micro batch. Generally set this parameter to n x (data.max_prompt_length + data.max_response_length). |