Using Data Engineering to Build a DeepSeek Model Dataset

Table 1 describes how to use data engineering to build a third-party model dataset on ModelArts Studio.

**Table 1** Process of building a third-party model dataset
Process	Sub-process	Description	Operation Guide
Importing data to the Pangu platform	Creating an import task	Import data from OBS or from a local source into the platform for centralized management and for subsequent processing or publishing. NOTE: When importing a dataset, set the dataset type to Single Round QA.	Importing Data to the Pangu Platform
Processing other datasets	Processing other datasets	Use custom processing operators to preprocess the data so that it meets model training standards and service requirements.	Processing Other Datasets
Publishing other datasets	Publishing other datasets	Publish a single dataset in a specific format so that it can be used for subsequent model training.	Publishing Other Datasets
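
Before creating the import task, it can help to check a single-round QA file locally. The following is a minimal sketch, not the platform's own tooling. It assumes the dataset is stored as JSON Lines with one question-answer pair per line, and the field names `context` and `target` are assumptions for illustration; confirm the required file format and field names in the import guide before use. The helper names `to_jsonl` and `find_invalid_lines` are hypothetical.

```python
import json

# Assumed field names for a single-round QA record (not confirmed by this guide).
QUESTION_KEY = "context"
ANSWER_KEY = "target"


def to_jsonl(pairs):
    """Serialize (question, answer) pairs as one JSON object per line."""
    return "\n".join(
        json.dumps({QUESTION_KEY: q, ANSWER_KEY: a}, ensure_ascii=False)
        for q, a in pairs
    )


def find_invalid_lines(jsonl_text):
    """Return the 1-based numbers of lines that are not valid QA records."""
    bad = []
    for number, line in enumerate(jsonl_text.splitlines(), start=1):
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            # Blank lines and malformed JSON both end up here.
            bad.append(number)
            continue
        if not isinstance(record, dict):
            bad.append(number)
            continue
        question = record.get(QUESTION_KEY)
        answer = record.get(ANSWER_KEY)
        # Both fields must be non-empty strings.
        if not (isinstance(question, str) and question.strip()
                and isinstance(answer, str) and answer.strip()):
            bad.append(number)
    return bad
```

For example, `find_invalid_lines(to_jsonl([("What is OBS?", "Object Storage Service.")]))` returns an empty list, while a line with an empty question or malformed JSON is reported by its line number so it can be fixed before import.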