Updated on 2025-11-04 GMT+08:00

Using Data Engineering to Build a DeepSeek Model Dataset

Process of Building a DeepSeek Model Dataset

Table 1 describes how to use data engineering to build a third-party model dataset on ModelArts Studio.

Table 1 Process of building a third-party model dataset

| Process | Sub-process | Description | Operation Guide |
| --- | --- | --- | --- |
| Importing data to the Pangu platform | Creating an import task | Import data stored in OBS or local data into the platform for centralized management, facilitating subsequent processing or publishing. NOTE: When importing a dataset, set the dataset type to Single Round QA. | Importing Data to the Pangu Platform |
| Processing other datasets | Processing other datasets | Use custom processing operators to preprocess data, ensuring it meets the model training standards and service requirements. | Processing Other Datasets |
| Publishing other datasets | Publishing other datasets | Data publishing refers to publishing a single dataset in a specific format as a published dataset for subsequent model training operations. | Publishing Other Datasets |
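
Before creating an import task, it can help to check the source file locally. The following is a minimal sketch, not a platform tool. It assumes the Single Round QA data is stored as JSON Lines with one question-answer pair per line, and that the fields are named `context` (question) and `target` (answer). Both the field names and the helper `validate_jsonl` are assumptions made for illustration; confirm the actual format requirements in the Importing Data to the Pangu Platform guide.

```python
import json

# Assumed field names for a single-round QA record. These are NOT taken from
# this page; confirm them against the platform's dataset format requirements.
REQUIRED_FIELDS = ("context", "target")


def validate_jsonl(lines):
    """Return (valid_records, errors) for an iterable of JSONL lines.

    Hypothetical helper for a local pre-import check.
    """
    valid, errors = [], []
    for number, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        try:
            record = json.loads(line)
        except json.JSONDecodeError as exc:
            errors.append((number, f"invalid JSON: {exc.msg}"))
            continue
        if not isinstance(record, dict):
            errors.append((number, "record is not a JSON object"))
            continue
        # A field counts as missing if it is absent, not a string, or blank.
        missing = [
            field for field in REQUIRED_FIELDS
            if not isinstance(record.get(field), str) or not record[field].strip()
        ]
        if missing:
            errors.append((number, "missing or empty: " + ", ".join(missing)))
            continue
        valid.append(record)
    return valid, errors


# Illustrative sample data: one good record, one incomplete, one malformed.
sample = [
    '{"context": "What is OBS?", "target": "An object storage service."}',
    '{"context": "No answer here"}',
    'not json',
]
records, problems = validate_jsonl(sample)
for line_number, reason in problems:
    print(f"line {line_number}: {reason}")
```

In practice the same function could be applied to an open file object (for example, `validate_jsonl(open(path, encoding="utf-8"))`) so that problem lines are fixed before the data is uploaded to OBS or imported from local storage.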