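Preparing Model Weights and Training Data
Preparing the Model Weight Files
This document uses the Qwen series models as an example to describe the training process. Table 1 (supported large language models and where to obtain their weights) lists the download addresses for the Qwen series weights.
RLHF (Reinforcement Learning from Human Feedback) incorporates human feedback into the training process, which gives the machine a natural, human-centered way to learn through interaction.
Hugging Face, the site that hosts the weight files, can only be reached through a proxy. Search online for a proxy setup that suits your network.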
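As a rough illustration of the proxy step, the sketch below exports the standard proxy environment variables, which common Python HTTP clients usually honor, and then wraps the download in a helper. The proxy address, the helper names `configure_proxy` and `download_weights`, and the local target directory are assumptions made for this example and are not part of the original guide. The download helper relies on `snapshot_download` from the `huggingface_hub` package, which must be installed separately.

```python
import os


def configure_proxy(proxy_url: str) -> None:
    """Export the proxy so that HTTP(S) clients reading these variables use it."""
    for name in ("HTTP_PROXY", "HTTPS_PROXY", "http_proxy", "https_proxy"):
        os.environ[name] = proxy_url


def download_weights(repo_id: str, local_dir: str) -> str:
    """Download a model repository from Hugging Face into local_dir."""
    # Imported lazily so the sketch can be loaded without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    return snapshot_download(repo_id=repo_id, local_dir=local_dir)


# Hypothetical proxy address; replace it with the one for your network.
configure_proxy("http://127.0.0.1:7890")

# Example call (commented out because it needs network access):
# download_weights("Qwen/Qwen3-8B", "./models/Qwen3-8B")
```

Whether the variables take effect depends on the HTTP client and on your network policy, so treat this as a starting point rather than a guaranteed fix.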
Upload the downloaded model weight files to an OBS bucket. Following the OBS plan, an example directory layout in the OBS bucket is:
obs://verl/verl-a2/models/Qwen3-8B
obs://verl/verl-a2/models/Qwen2.5-VL-32b-Instruct
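Preparing Training Data and the Data Preprocessing Script (Qwen3-8b)
The Qwen3-8b model is trained on the gsm8k dataset.
- Download the gsm8k dataset from https://huggingface.co/datasets/openai/gsm8k/tree/main
- Prepare the data preprocessing script gsm8k.py based on the file example provided below. When the training job runs, the training script run_train_8b.sh calls gsm8k.py to preprocess the data.
- Upload the gsm8k dataset and the gsm8k.py preprocessing script to the OBS bucket. An example directory layout in the OBS bucket is:
obs://verl/verl-a2/dataset/gsm8k
obs://verl/verl-a2/gsm8k.py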
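The preprocessing script reads the dataset from a path inside the training container, not from the OBS URL. The following minimal sketch shows how an OBS path is assumed to map onto a container path: the bucket name is dropped and the remaining path is appended to the container code directory /home/ma-user/work used in this guide. The helper name `obs_to_container_path` is hypothetical, and the mapping should be checked against the local code directory configured for your training job.

```python
from pathlib import PurePosixPath

# Assumption: the container code directory used throughout this guide.
CODE_DIR = "/home/ma-user/work"


def obs_to_container_path(obs_url: str, code_dir: str = CODE_DIR) -> str:
    """Map obs://<bucket>/<path> to <code_dir>/<path> (hypothetical helper)."""
    prefix = "obs://"
    if not obs_url.startswith(prefix):
        raise ValueError(f"not an OBS URL: {obs_url!r}")
    # Split off the bucket name; what remains is the path inside the bucket.
    bucket, _, path_in_bucket = obs_url[len(prefix):].partition("/")
    if not bucket or not path_in_bucket:
        raise ValueError(f"OBS URL has no bucket or path: {obs_url!r}")
    return str(PurePosixPath(code_dir) / path_in_bucket)


print(obs_to_container_path("obs://verl/verl-a2/dataset/gsm8k"))
# prints: /home/ma-user/work/verl-a2/dataset/gsm8k
```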
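During training, the data is stored in the training container at "/home/ma-user/work/verl-a2/dataset/gsm8k". You can change this path. Here, /home/ma-user/work is the code directory in the training container and must match the local code directory set when the training job is created; verl-a2/dataset/gsm8k is the storage path in the OBS bucket and must match your actual layout.
The content of gsm8k.py is as follows: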
import argparse
import os
import re

import datasets

from verl.utils.hdfs_io import copy, makedirs


def extract_solution(solution_str):
    # A gsm8k reference answer ends with "#### <number>"; match that marker and number.
    solution = re.search("#### (\\-?[0-9\\.\\,]+)", solution_str)
    assert solution is not None
    final_solution = solution.group(0)
    # Drop the "#### " prefix and remove thousands separators.
    final_solution = final_solution.split("#### ")[1].replace(",", "")
    return final_solution


if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("--local_dir", default="~/data/gsm8k")
    parser.add_argument("--hdfs_dir", default=None)

    args = parser.parse_args()

    data_source = "openai/gsm8k"

    # "/home/ma-user/work/verl-a2/dataset/gsm8k" is where the dataset is stored; change it as needed.
    dataset = datasets.load_dataset("/home/ma-user/work/verl-a2/dataset/gsm8k", "main")

    train_dataset = dataset["train"]
    test_dataset = dataset["test"]

    instruction_following = 'Let\'s think step by step and output the final answer after "####".'

    # add a row to each data item that represents a unique id
    def make_map_fn(split):
        def process_fn(example, idx):
            question_raw = example.pop("question")

            question = question_raw + " " + instruction_following

            answer_raw = example.pop("answer")
            solution = extract_solution(answer_raw)
            data = {
                "data_source": data_source,
                "prompt": [
                    {
                        "role": "user",
                        "content": question,
                    }
                ],
                "ability": "math",
                "reward_model": {"style": "rule", "ground_truth": solution},
                "extra_info": {
                    "split": split,
                    "index": idx,
                    "answer": answer_raw,
                    "question": question_raw,
                },
            }
            return data

        return process_fn

    train_dataset = train_dataset.map(function=make_map_fn("train"), with_indices=True)
    test_dataset = test_dataset.map(function=make_map_fn("test"), with_indices=True)

    # Expand "~" in the default path and make sure the output directory exists.
    local_dir = os.path.expanduser(args.local_dir)
    os.makedirs(local_dir, exist_ok=True)
    hdfs_dir = args.hdfs_dir

    train_dataset.to_parquet(os.path.join(local_dir, "train.parquet"))
    test_dataset.to_parquet(os.path.join(local_dir, "test.parquet"))

    # Optionally copy the generated files to HDFS.
    if hdfs_dir is not None:
        makedirs(hdfs_dir)

        copy(src=local_dir, dst=hdfs_dir)
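To make the answer extraction in gsm8k.py easier to follow, the sketch below runs the same regular expression on a made-up answer string written in the gsm8k style, where the final result follows a "#### " marker. The sample text is an assumption for illustration only and is not taken from the dataset.

```python
import re


def extract_solution(solution_str):
    """Return the number after "#### " with thousands separators removed."""
    solution = re.search("#### (\\-?[0-9\\.\\,]+)", solution_str)
    assert solution is not None
    return solution.group(0).split("#### ")[1].replace(",", "")


# Hypothetical sample in the gsm8k answer format (not a real dataset row).
sample_answer = (
    "Each box holds 12 eggs, so 100 boxes hold 12 * 100 = 1,200 eggs.\n"
    "#### 1,200"
)

print(extract_solution(sample_answer))
# prints: 1200
```

The extracted string is what gsm8k.py stores as `ground_truth` in the `reward_model` field, so a rule-based reward can compare the model's final answer against it.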
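Preparing Training Data and the Data Preprocessing Script (Qwen2.5-VL-32b-Instruct)
The Qwen2.5-VL-32b-Instruct model is trained on the geometry3k dataset.
- Download the geometry3k dataset from https://huggingface.co/datasets/hiyouga/geometry3k/tree/main
- Prepare the data preprocessing script geometry3k.py based on the file example provided below. When the training job runs, the training script run_train_32b.sh calls geometry3k.py to preprocess the data.
- Upload the geometry3k dataset and the geometry3k.py preprocessing script to the OBS bucket. An example directory layout in the OBS bucket is:
obs://verl/verl-a2/dataset/geometry3k
obs://verl/verl-a2/geometry3k.py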
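During training, the data is stored in the training container at "/home/ma-user/work/verl-a2/dataset/geometry3k". You can change this path. Here, /home/ma-user/work is the code directory in the training container and must match the local code directory set when the training job is created; verl-a2/dataset/geometry3k is the storage path in the OBS bucket and must match your actual layout.
The content of geometry3k.py is as follows: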
import argparse
import os

import datasets

from verl.utils.hdfs_io import copy, makedirs

if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("--local_dir", default="~/data/geo3k")
    parser.add_argument("--hdfs_dir", default=None)

    args = parser.parse_args()

    data_source = "hiyouga/geometry3k"

    # "/home/ma-user/work/verl-a2/dataset/geometry3k" is the dataset path; change it as needed.
    dataset = datasets.load_dataset("/home/ma-user/work/verl-a2/dataset/geometry3k")

    train_dataset = dataset["train"]
    test_dataset = dataset["test"]

    instruction_following = (
        r"You FIRST think about the reasoning process as an internal monologue and then provide the final answer. "
        r"The reasoning process MUST BE enclosed within <think> </think> tags. The final answer MUST BE put in \boxed{}."
    )

    # add a row to each data item that represents a unique id
    def make_map_fn(split):
        def process_fn(example, idx):
            problem = example.pop("problem")
            prompt = problem + " " + instruction_following
            answer = example.pop("answer")
            images = example.pop("images")

            data = {
                "data_source": data_source,
                "prompt": [
                    {
                        "role": "user",
                        "content": prompt,
                    }
                ],
                "images": images,
                "ability": "math",
                "reward_model": {"style": "rule", "ground_truth": answer},
                "extra_info": {
                    "split": split,
                    "index": idx,
                    "answer": answer,
                    "question": problem,
                },
            }
            return data

        return process_fn

    train_dataset = train_dataset.map(function=make_map_fn("train"), with_indices=True, num_proc=8)
    test_dataset = test_dataset.map(function=make_map_fn("test"), with_indices=True, num_proc=8)

    # Expand "~" in the default path and make sure the output directory exists.
    local_dir = os.path.expanduser(args.local_dir)
    os.makedirs(local_dir, exist_ok=True)
    hdfs_dir = args.hdfs_dir

    train_dataset.to_parquet(os.path.join(local_dir, "train.parquet"))
    test_dataset.to_parquet(os.path.join(local_dir, "test.parquet"))

    # Optionally copy the generated files to HDFS.
    if hdfs_dir is not None:
        makedirs(hdfs_dir)
        copy(src=local_dir, dst=hdfs_dir)
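The record layout produced by geometry3k.py can be inspected without loading the dataset. The sketch below applies the same mapping logic to one made-up sample. The sample problem, answer, and image placeholder are assumptions for illustration; in the real dataset the `images` field holds image data rather than strings.

```python
data_source = "hiyouga/geometry3k"

instruction_following = (
    r"You FIRST think about the reasoning process as an internal monologue and then provide the final answer. "
    r"The reasoning process MUST BE enclosed within <think> </think> tags. The final answer MUST BE put in \boxed{}."
)


def make_map_fn(split):
    """Build the per-example mapping function, mirroring geometry3k.py."""

    def process_fn(example, idx):
        problem = example.pop("problem")
        prompt = problem + " " + instruction_following
        answer = example.pop("answer")
        images = example.pop("images")
        return {
            "data_source": data_source,
            "prompt": [{"role": "user", "content": prompt}],
            "images": images,
            "ability": "math",
            "reward_model": {"style": "rule", "ground_truth": answer},
            "extra_info": {"split": split, "index": idx, "answer": answer, "question": problem},
        }

    return process_fn


# Hypothetical sample; the string in "images" stands in for real image data.
sample = {"problem": "Find the value of x.", "answer": "3", "images": ["<image placeholder>"]}

# Pass a copy because process_fn pops keys from its input.
record = make_map_fn("train")(dict(sample), 0)

print(record["prompt"][0]["content"])
print(record["reward_model"])
```

Each record keeps the original problem and answer under `extra_info`, while the prompt sent to the model is the problem followed by the formatting instruction.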