MindSpeed-RL
This section describes the YAML configuration file and its training parameters. Set the parameters as required.
Configuring Parameters in the YAML File
Modify the YAML file.

- Choose either of the following dataset parameters (a YAML sketch follows the table).
| Parameter | Example Value | Description |
| --- | --- | --- |
| backend_config.preprocess_data.input | Relative or absolute path of the dataset | Path of the input data for training. Change it based on the actual situation. Set either this parameter or data_path. |
| backend_config.megatron_training.data_path | /home/ma-user/ws/xxx | Directory of the processed data. Set this parameter if the data has already been preprocessed. |
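
The dotted parameter names above suggest nested YAML sections. Below is a minimal sketch under that assumption; the paths are placeholders to replace with your own, and you should set only one of the two options:

```yaml
# Hypothetical nesting inferred from the dotted parameter names
backend_config:
  preprocess_data:
    # Option 1: raw dataset that still needs preprocessing
    input: ./data/train_dataset.json
  megatron_training:
    # Option 2: directory of data that has already been preprocessed
    data_path: /home/ma-user/ws/processed_data
```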
- Set the training scenario, weight file, output directory, and other important parameters as described below (a YAML sketch follows the table).
| Parameter | Example Value | Description |
| --- | --- | --- |
| backend_config.megatron_training.tokenizer_name_or_path | /home/ma-user/ws/llm_train/AscendFactory/model/llama2-70B | (Mandatory) Path for storing the tokenizer and Hugging Face weight files. Change it based on the actual situation. |
| af_output_dir | /home/ma-user/ws/save_dir | (Mandatory) Directory for storing the logs and weight files generated by the training. |
| backend_config.preprocess_data.handler_name | GeneralPretrainHandler, GeneralInstructionHandler, MOSSInstructionHandler, AlpacaStyleInstructionHandler, or SharegptStyleInstructionHandler | (Mandatory) Select a handler based on the dataset: GeneralPretrainHandler for pre-training on an Alpaca-format dataset; GeneralInstructionHandler for fine-tuning on an Alpaca-format dataset; MOSSInstructionHandler for fine-tuning on a MOSS dataset; AlpacaStyleInstructionHandler for fine-tuning on an Alpaca-style dataset; SharegptStyleInstructionHandler for a ShareGPT-style dataset. |
| backend_config.actor_config.no_load_optim, backend_config.actor_config.no_load_rng | false | Whether to skip loading the optimizer and RNG state from the checkpoint. false: the state is loaded. true: the state is not loaded. |
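
A sketch of the mandatory parameters, again assuming the dotted names map onto nested YAML sections; the handler shown is just one of the five options, and the paths are the example values from the table:

```yaml
af_output_dir: /home/ma-user/ws/save_dir          # logs and generated weight files

backend_config:
  megatron_training:
    # Tokenizer and Hugging Face weight directory
    tokenizer_name_or_path: /home/ma-user/ws/llm_train/AscendFactory/model/llama2-70B
  preprocess_data:
    # Pick the handler matching the dataset format (here: Alpaca-style fine-tuning)
    handler_name: AlpacaStyleInstructionHandler
  actor_config:
    no_load_optim: false                          # false: load optimizer state
    no_load_rng: false                            # false: load RNG state
```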
- Set other parameters as described below (a YAML sketch follows the table).
| Parameter | Example Value | Description |
| --- | --- | --- |
| backend_config.actor_config.micro_batch_size | 1 | Number of samples processed per micro-batch in pipeline parallelism, where the data of one step is split into multiple micro-batches to reduce bubble time. The appropriate value depends on tensor_model_parallel_size, pipeline_model_parallel_size, and the model size; adjust it as needed. Abbreviated as MBS. |
| backend_config.megatron_training.global_batch_size | 128 | Number of samples processed by all nodes in one training step, which affects the training iteration time. Abbreviated as GBS. |
| backend_config.actor_config.tensor_model_parallel_size | 8 | Tensor parallel size, abbreviated as TP. |
| backend_config.actor_config.pipeline_model_parallel_size | 4 | Pipeline parallel size, abbreviated as PP. Generally set to the number of training nodes; it must match the value configured during weight conversion. |
| backend_config.actor_config.lr | 2.5e-5 | Learning rate. |
| backend_config.actor_config.min_lr | 2.5e-6 | Minimum learning rate. |
| backend_config.megatron_training.train_iters | 10 | Number of training iterations. Optional; a default value is used if unset. |
| backend_config.megatron_training.save_interval | 1000 | Checkpoint saving interval. If the value is greater than or equal to TRAIN_ITERS, only the final model (trained for TRAIN_ITERS iterations) is saved; if it is smaller, a checkpoint is saved every SAVE_INTERVAL iterations. Number of saved versions = TRAIN_ITERS/SAVE_INTERVAL + 1. |
| backend_config.actor_config.load | null | Weight loading path. Weights are loaded by default; if the value is null, no weights are loaded. |
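
Combining the tuning parameters into one sketch, using the example values from the table and the same nesting assumption as above:

```yaml
backend_config:
  actor_config:
    micro_batch_size: 1               # MBS
    tensor_model_parallel_size: 8     # TP
    pipeline_model_parallel_size: 4   # PP; must match weight conversion
    lr: 2.5e-5
    min_lr: 2.5e-6
    load: null                        # null: do not load weights
  megatron_training:
    global_batch_size: 128            # GBS
    train_iters: 10
    save_interval: 1000
```

With these values, save_interval (1000) exceeds train_iters (10), so only the final checkpoint is saved; with train_iters set to 2000 instead, 2000/1000 + 1 = 3 versions would be saved.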
For parameters not covered above, see the parameter configuration reference.
Model parameter constraints
- Tensor parallelism (TP), pipeline parallelism (PP), and context parallelism (CP): the number of NPUs (world_size) must be evenly divisible by TP x PP x CP.
- num_attention_heads in the model configuration must be evenly divisible by TP x CP.
- Micro batch size (MBS) and global batch size (GBS): GBS must be evenly divisible by MBS x DP, where the data parallel size DP = world_size / (TP x PP x CP).
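
As a worked check of these constraints, assume a hypothetical 4-node x 8-NPU cluster (world_size = 32), the example values above with CP = 1, and the 64 attention heads of Llama2-70B:

```yaml
# world_size = 4 nodes x 8 NPUs = 32
# TP x PP x CP = 8 x 4 x 1 = 32   -> 32 / 32 = 1, an integer, so DP = 1 (OK)
# num_attention_heads = 64        -> 64 / (TP x CP) = 64 / 8 = 8, an integer (OK)
# GBS / (MBS x DP) = 128 / (1 x 1) = 128, an integer (OK)
```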