Updated on 2025-07-02 GMT+08:00

Importing Data to the Pangu Platform

A dataset is a set of related data samples used for processing and analysis.

After data stored in OBS or on a local machine is imported to ModelArts Studio, an original dataset is generated. ModelArts Studio manages this dataset for subsequent processing or publishing.

Creating an Import Job

Before creating an import job, prepare data based on Dataset Format Requirements.

You can use OBS to import data. For details, see Using OBS Console.

To create an import job, perform the following steps:

  1. Log in to ModelArts Studio Large Model Development Platform. In the My Spaces area, click the required workspace.
    Figure 1 My Spaces
  2. In the navigation pane, choose Data Engineering > Data Acquisition. On the Import Task page, click Create Import Job in the upper right corner.
  3. On the Create Import Job page, select the dataset type, file format, and import source.

  4. Select the data to be imported:
    • If Import Source is set to OBS, click the selection icon. In the Storage Location dialog box, select the data to be imported and click OK.
      Figure 2 Selecting data to be imported
    • If Import Source is set to Local File, click Add and select the file to be imported.
      Figure 3 Local File

  5. Enter the dataset name and description. Enter extended information if required.
    Extended Info includes Dataset Property and Dataset Copyright.
    • Dataset Property: You can add industry, language, and custom information to a dataset.
    • Dataset Copyright: Besides self-built datasets, model training may also use open-source datasets. Use this field to record the source and copyright authorization of a dataset. With this information, you can trace where the data comes from, specify the permissions and restrictions on its use, ensure that it is used in compliance with laws and regulations, and avoid copyright disputes.
  6. Click Create Now in the lower right corner. The Import Task page is displayed, where you can view the status of the import task. If the status is Succeeded, the data has been imported.
    If the status is Failed, the import has failed. Possible causes are as follows:
    • Incorrect file name extension: The extension must match the selected file format. For example, if you create a dataset in CSV format, the file name extension must be .csv.
    • File content verification failure: Check whether the content of the uploaded file is in the correct format. You can download data samples on the Create Import Job page for comparison.
  7. To view the imported dataset, choose Data Engineering > Data Management > Datasets and click the Original Dataset tab.
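Both failure causes can be caught before you upload a file. The following Python sketch is a hypothetical local pre-check, not part of ModelArts Studio: it verifies that the file name extension matches the selected format and runs basic structural checks (consistent column counts for CSV, one JSON object per line for JSONL). The function names and the format-to-extension mapping beyond the CSV rule are assumptions; the authoritative rules are in Dataset Format Requirements and the downloadable data samples.

```python
"""Hypothetical pre-import check for a local dataset file (not a ModelArts Studio API)."""
import csv
import json
from pathlib import Path

# Assumed mapping from the selected file format to the required extension.
# Only the CSV rule (.csv) is stated in this guide; the JSONL entry is an assumption.
EXPECTED_EXTENSIONS = {"csv": ".csv", "jsonl": ".jsonl"}


def check_extension(path: str, file_format: str) -> list[str]:
    """Return problems with the file name extension (empty list if none)."""
    expected = EXPECTED_EXTENSIONS.get(file_format.lower())
    if expected is None:
        return [f"unknown file format: {file_format}"]
    actual = Path(path).suffix.lower()
    if actual != expected:
        return [f"extension {actual or '(none)'} does not match {expected}"]
    return []


def check_content(path: str, file_format: str) -> list[str]:
    """Basic structural checks only; the platform's real validation may be stricter."""
    problems = []
    fmt = file_format.lower()
    with open(path, encoding="utf-8", newline="") as f:
        if fmt == "csv":
            rows = list(csv.reader(f))
            if not rows:
                problems.append("CSV file is empty")
            else:
                # Every row should have as many columns as the first (header) row.
                width = len(rows[0])
                for i, row in enumerate(rows[1:], start=2):
                    if len(row) != width:
                        problems.append(f"row {i}: expected {width} columns, got {len(row)}")
        elif fmt == "jsonl":
            # Each non-blank line should parse as a single JSON object.
            for i, line in enumerate(f, start=1):
                if not line.strip():
                    continue
                try:
                    if not isinstance(json.loads(line), dict):
                        problems.append(f"line {i}: not a JSON object")
                except json.JSONDecodeError as exc:
                    problems.append(f"line {i}: invalid JSON ({exc.msg})")
    return problems


def precheck(path: str, file_format: str) -> list[str]:
    """Check the extension first; inspect content only if the extension is correct."""
    problems = check_extension(path, file_format)
    if not problems:
        problems += check_content(path, file_format)
    return problems
```

For example, `precheck("train.csv", "csv")` returns an empty list when the file passes both checks; otherwise each entry describes one problem to fix before you create the import job.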

Managing Original Datasets

After data is imported, you can manage original datasets in a unified manner: view the basic information, data lineage, and operation records of a dataset, and download or delete it.

  1. Log in to ModelArts Studio and access a workspace.
  2. In the navigation pane, choose Data Engineering > Data Management > Datasets. On the Original Dataset tab page, click the name of the dataset to be viewed.
    • View basic information. On the Basic Information tab page, you can view the data details, data source, and extended information.
    • Set dataset properties. In the Extended Info area, you can set dataset properties as required, including the property name, industry, language, and custom tags.
    • Download the original dataset. On the Data Preview tab page, you can view the data content and click Download in the upper right corner to download the original dataset.
    • View data lineage. On the Data Lineage tab page, you can view all operations performed on the current dataset, such as processing and labeling.
    • View operation records. On the Operation Record tab page, you can view the operation records of the current dataset, such as the creation time, status, and operator of the dataset.
  3. Delete an original dataset. On the Original Dataset tab page, click Delete in the Operation column of the dataset. In the displayed dialog box, click OK.

    Deleting an original dataset is a high-risk operation. Before deleting a dataset, ensure that it is no longer in use.