Updated on 2025-11-19 GMT+08:00

Importing Data

Before using ModelArts Studio, prepare OBS buckets and resource pools. They support subsequent model tuning, compression, and deployment tasks, and the OBS buckets store the files produced by model tuning as well as task logs.

  1. Prepare ModelArts Studio resources. For details, see Preparations.
  2. Prepare a training dataset.

Before importing an NLP pre-training dataset to the platform, preprocess it into the data format described in Preprocessing Data.

In addition, when you import a dataset from OBS to ModelArts Studio, a single file cannot exceed 50 GB. The number of files is not limited. For details, see Format Requirements for Text Datasets.
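The single-file size limit can be checked locally before the files are uploaded to OBS. The following is a minimal sketch, not part of ModelArts Studio: the function name `find_oversized_files` and the constant `MAX_FILE_BYTES` are hypothetical, and it assumes that "50 GB" means 50 × 1024³ bytes, which the platform may count differently.

```python
from pathlib import Path

# Assumption: "50 GB" is interpreted as 50 * 1024**3 bytes.
# The platform may use a different definition; adjust if needed.
MAX_FILE_BYTES = 50 * 1024**3


def find_oversized_files(root, limit=MAX_FILE_BYTES):
    """Return (path, size_in_bytes) pairs for files under root larger than limit."""
    oversized = []
    # rglob("*") walks the directory tree recursively; sorted for stable output.
    for path in sorted(Path(root).rglob("*")):
        if path.is_file():
            size = path.stat().st_size
            if size > limit:
                oversized.append((path, size))
    return oversized
```

For example, calling `find_oversized_files("./dataset")` (a hypothetical local directory) returns an empty list when every file is within the limit. Any file that is returned needs to be split before upload.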

  1. Log in to ModelArts Studio and access the desired workspace.
  2. In the navigation pane, choose Data Engineering > Data Acquisition. On the Import Task page, click Create Import Job in the upper right corner.
  3. On the Create Import Job page, select the dataset type and file format, and set the import source to OBS.
    Figure 1 Create Import Job
  4. Enter the dataset name and description. Enter extended information if required.
    Extended Info includes Dataset Property and Dataset Copyright.
    • Dataset Property: You can add industry, language, and custom information to a dataset.
    • Dataset Copyright: In addition to self-built datasets, open-source datasets may be used for model training. The dataset copyright function records and manages the copyright information of datasets so that data is used in compliance with laws and regulations and the dataset sources and copyright authorization are clear. By filling in this information, you can trace the source of the data and specify the restrictions and permissions for using it, which protects data copyright and helps avoid copyright disputes.
  5. Click Create Now in the lower right corner of the page. The Import Task page is displayed, where you can view the task status of the dataset. If the task status is Succeeded, the data has been imported.
  6. To view the imported dataset, choose Data Engineering > Data Management > Datasets, and click the Original Dataset tab.