Document |
txt, mobi, epub, docx, and pdf |
Import from OBS: The size of a single file cannot exceed 50 GB, and the number of files is not limited.
Local upload: The size of a single file cannot exceed 10 MB, and the number of files cannot exceed 100. |
Web page |
html |
Import from OBS: The size of a single file cannot exceed 50 GB, and the number of files is not limited.
Local upload: The size of a single file cannot exceed 10 MB, and the number of files cannot exceed 100. |
Pre-trained text |
jsonl |
- JSONL format: text indicates the text data used for pre-training. The following is an example:
{"text":"Pangu Models are Pangu series AI models launched by Huawei, including the NLP model, multimodal model, CV model, scientific computing model, and prediction model."}
- Import from OBS: The size of a single file cannot exceed 50 GB, and the number of files is not limited.
Local upload: The size of a single file cannot exceed 10 MB, and the number of files cannot exceed 100.
|
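As a rough pre-upload check for the pre-trained text format above, the sketch below flags lines that are not JSON objects with a string `text` field. The function name `check_pretrain_lines` is hypothetical, and skipping blank lines is an assumption rather than documented platform behavior.

```python
import json


def check_pretrain_lines(lines):
    """Return the 1-based numbers of lines that are not {"text": <str>} objects."""
    bad = []
    for number, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # assumption: blank lines are ignored
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            bad.append(number)
            continue
        if not isinstance(record, dict) or not isinstance(record.get("text"), str):
            bad.append(number)
    return bad


sample = [
    '{"text": "Pangu Models are Pangu series AI models launched by Huawei."}',
    '{"content": "wrong key"}',
    "not json",
]
print(check_pretrain_lines(sample))
```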
Single-turn Q&A |
jsonl and csv |
- JSONL format: The data consists of Q&A pairs. context and target indicate the question and answer, respectively. The following is an example:
{"context": "Hello, please introduce yourself.","target": "I am a Pangu model."}
- CSV format: The first column in the CSV file corresponds to context, and the second column corresponds to target. The following is an example:
"Hello, please introduce yourself.","I am a Pangu model."
- Import from OBS: The size of a single file cannot exceed 50 GB, and the number of files is not limited.
Local upload: The size of a single file cannot exceed 10 MB, and the number of files cannot exceed 100.
|
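The two single-turn layouts above carry the same information, so one can be derived from the other. A minimal sketch, assuming a header-less CSV with exactly two columns; `csv_to_jsonl` is a hypothetical helper, not part of any platform tooling.

```python
import csv
import io
import json


def csv_to_jsonl(csv_text):
    """Convert two-column CSV rows (context, target) into JSONL text."""
    lines = []
    for row in csv.reader(io.StringIO(csv_text)):
        if not row:
            continue  # skip blank lines
        if len(row) != 2:
            raise ValueError(f"expected 2 columns, got {len(row)}: {row!r}")
        context, target = row
        lines.append(json.dumps({"context": context, "target": target}, ensure_ascii=False))
    return "\n".join(lines)


sample_csv = '"Hello, please introduce yourself.","I am a Pangu model."\n'
print(csv_to_jsonl(sample_csv))
```

Using the `csv` module instead of splitting on commas matters here, because the quoted fields themselves contain commas.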
Single-turn Q&A (with a system persona) |
jsonl and csv |
- JSONL format: system indicates the persona, context indicates the question, and target indicates the answer.
{"system": "You're a smart and humorous Q&A assistant.","context": "Hello, please introduce yourself.","target":"Hello. I'm your smart assistant."}
- CSV format: In the CSV file, the first column corresponds to system, and the second and third columns correspond to context and target, respectively.
"You're a smart and humorous Q&A assistant.","Hello, please introduce yourself.","Hello. I'm your smart assistant."
- Import from OBS: The size of a single file cannot exceed 50 GB, and the number of files is not limited.
Local upload: The size of a single file cannot exceed 10 MB, and the number of files cannot exceed 100.
|
Multi-turn Q&A |
jsonl |
- JSONL format: an array consisting of at least one Q&A pair. The format is [{"context":"context content 1","target":"target content 1"},{"context":"context content 2","target":"target content 2"}]. context and target indicate the question and answer, respectively.
[{"context":"Hello","target":"Hello, what can I do for you?"},{"context":"Please introduce Huawei Cloud products.","target":"Huawei Cloud products include but are not limited to compute, storage, and network products."}]
- Import from OBS: The size of a single file cannot exceed 50 GB, and the number of files is not limited.
Local upload: The size of a single file cannot exceed 10 MB, and the number of files cannot exceed 100.
|
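The multi-turn layout above can be checked line by line before import. The sketch below assumes that both `context` and `target` must be strings in every pair; `is_valid_multi_turn` is a hypothetical name.

```python
import json


def is_valid_multi_turn(line):
    """True if the line is a non-empty JSON array of {"context", "target"} pairs."""
    try:
        turns = json.loads(line)
    except json.JSONDecodeError:
        return False
    if not isinstance(turns, list) or not turns:
        return False  # at least one Q&A pair is required
    return all(
        isinstance(turn, dict)
        and isinstance(turn.get("context"), str)
        and isinstance(turn.get("target"), str)
        for turn in turns
    )


good = '[{"context":"Hello","target":"Hello, what can I do for you?"}]'
print(is_valid_multi_turn(good), is_valid_multi_turn("[]"))
```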
Multi-turn Q&A (with a system persona) |
jsonl |
- JSONL format: an array consisting of at least one Q&A pair. system indicates the persona, context indicates the question, and target indicates the answer.
[{"system": "You are a book recommendation expert."},{"context":"Hi","target":"Hi. What can I do for you?"},{"context":"Can you recommend some books to me?","target":"Of course. Based on your interests, I recommend The Future of Autonomous Driving."}]
- Import from OBS: The size of a single file cannot exceed 50 GB, and the number of files is not limited.
Local upload: The size of a single file cannot exceed 10 MB, and the number of files cannot exceed 100.
|
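In the persona variant above, the example places the `system` object first, followed by the Q&A pairs. A minimal sketch that separates the two, assuming the persona is always the first array element (inferred from the example, not stated explicitly); `split_persona` is a hypothetical helper.

```python
import json


def split_persona(line):
    """Return (system, [(context, target), ...]) for one JSONL line."""
    record = json.loads(line)
    if not isinstance(record, list) or len(record) < 2:
        raise ValueError("expected a persona element plus at least one Q&A pair")
    head, turns = record[0], record[1:]
    if not isinstance(head, dict) or set(head) != {"system"}:
        raise ValueError("first element must be an object with only a 'system' key")
    pairs = []
    for turn in turns:
        if not isinstance(turn, dict) or not {"context", "target"} <= set(turn):
            raise ValueError(f"not a Q&A pair: {turn!r}")
        pairs.append((turn["context"], turn["target"]))
    return head["system"], pairs


line = '[{"system": "You are a book recommendation expert."},{"context":"Hi","target":"Hi. What can I do for you?"}]'
system, pairs = split_persona(line)
print(system, len(pairs))
```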
Q&A ranking |
jsonl and csv |
- JSONL format: context indicates the question, and targets lists the answers in order of human preference, with the most preferred answer first. The following is an example:
{"context":"context content","targets":["Answer 1","Answer 2","Answer 3"]}
- CSV format: The first column in the CSV file corresponds to context, and the other columns are answers.
"Question","Answer 1","Answer 2","Answer 3"
- Import from OBS: The size of a single file cannot exceed 50 GB, and the number of files is not limited.
Local upload: The size of a single file cannot exceed 10 MB, and the number of files cannot exceed 100.
|
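Because rank is encoded purely by position, it helps to build ranking records from an already ordered list. The sketch below produces both layouts described above; `ranking_jsonl` and `ranking_csv` are hypothetical helpers, and quoting every CSV field is a choice made to match the example, not a documented requirement.

```python
import csv
import io
import json


def ranking_jsonl(context, ranked_answers):
    """One JSONL line; ranked_answers must be ordered best-first."""
    return json.dumps({"context": context, "targets": list(ranked_answers)}, ensure_ascii=False)


def ranking_csv(context, ranked_answers):
    """One CSV row: the question first, then the answers best-first."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, quoting=csv.QUOTE_ALL, lineterminator="\n")
    writer.writerow([context, *ranked_answers])
    return buffer.getvalue().rstrip("\n")


answers = ["Answer 1", "Answer 2", "Answer 3"]
print(ranking_jsonl("Question", answers))
print(ranking_csv("Question", answers))
```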
Direct Preference Optimization (DPO) |
jsonl |
- JSONL format: context indicates the question, target indicates the expected correct answer, and bad_target indicates an incorrect or unexpected answer.
Single-turn Q&A
{"context": ["Hello, please introduce yourself."],"target":"I'm a Pangu model.","bad_target":"Sorry, I can't assist with that."}
Multi-turn Q&A
{"context": ["Hello, please introduce yourself.", "I'm a Pangu model.", "Please introduce Huawei Cloud products."],"target":"Huawei Cloud products include but are not limited to compute, storage, and network products.","bad_target":"Sorry, I can't assist with that."}
- Import from OBS: The size of a single file cannot exceed 50 GB, and the number of files is not limited.
Local upload: The size of a single file cannot exceed 10 MB, and the number of files cannot exceed 100.
|
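A DPO record as described above can be sanity-checked before import. This sketch rests on two assumptions inferred from the examples rather than stated in the text: `context` alternates question, answer, question and therefore has an odd number of entries, and both `target` and `bad_target` are required strings. `dpo_problems` is a hypothetical name.

```python
import json


def dpo_problems(line):
    """Return a list of human-readable problems found in one DPO JSONL line."""
    record = json.loads(line)
    if not isinstance(record, dict):
        return ["record must be a JSON object"]
    problems = []
    context = record.get("context")
    if not (isinstance(context, list) and context and all(isinstance(c, str) for c in context)):
        problems.append("context must be a non-empty list of strings")
    elif len(context) % 2 == 0:
        # assumption: turns alternate Q, A, Q, so the list ends on a question
        problems.append("context should end with a question (odd length)")
    for key in ("target", "bad_target"):
        if not isinstance(record.get(key), str):
            problems.append(f"{key} must be a string")
    return problems


single = json.dumps({
    "context": ["Hello, please introduce yourself."],
    "target": "I'm a Pangu model.",
    "bad_target": "Sorry, I can't assist with that.",
})
print(dpo_problems(single))
```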
DPO (with a system persona) |
jsonl |
- JSONL format: system indicates the persona, context indicates the question, target indicates the expected correct answer, and bad_target indicates an incorrect or unexpected answer.
Single-turn Q&A (with a system persona)
{"system":"You are a humorous Q&A assistant.","context": ["Hello, please introduce yourself."],"target":"Hello. I am your smart assistant. How can I help you?","bad_target":"Sorry, I can't assist with that."}
Multi-turn Q&A (with a system persona)
{"system":"You are a humorous Q&A assistant.","context": ["Hello, please introduce yourself.", "Hello, I am your smart assistant. How can I help you?", "Please introduce Huawei Cloud products."], "target":"Huawei Cloud provides compute, storage, and network products, as well as many other products.","bad_target":"Sorry, I can't assist with that."}
- Import from OBS: The size of a single file cannot exceed 50 GB, and the number of files is not limited.
Local upload: The size of a single file cannot exceed 10 MB, and the number of files cannot exceed 100.
|
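The local upload limits repeated for each data type (10 MB per file, at most 100 files) can be checked before uploading. A minimal sketch, assuming 1 MB means 1024 * 1024 bytes, which the text does not specify; `local_upload_problems` and `sizes_in` are hypothetical helpers.

```python
from pathlib import Path

MAX_FILE_BYTES = 10 * 1024 * 1024  # assumption: 1 MB = 1024 * 1024 bytes
MAX_FILE_COUNT = 100


def local_upload_problems(sizes):
    """sizes maps file name -> size in bytes; returns a list of problems."""
    problems = []
    if len(sizes) > MAX_FILE_COUNT:
        problems.append(f"{len(sizes)} files exceeds the limit of {MAX_FILE_COUNT}")
    for name, size in sizes.items():
        if size > MAX_FILE_BYTES:
            problems.append(f"{name}: {size} bytes exceeds {MAX_FILE_BYTES}")
    return problems


def sizes_in(directory):
    """Collect the sizes of the regular files directly inside a directory."""
    return {p.name: p.stat().st_size for p in Path(directory).iterdir() if p.is_file()}


print(local_upload_problems({"qa.jsonl": 2048, "big.jsonl": 11 * 1024 * 1024}))
```

For a real folder, `local_upload_problems(sizes_in("path/to/dataset"))` would apply the same check; the path here is only a placeholder.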