Updated on 2024-04-30 GMT+08:00

Importing Data from an OBS Path

Prerequisites

  • A dataset is available.
  • The data to be imported is stored in OBS. If you import data using a manifest file, the manifest file is also stored in OBS.
  • The OBS bucket and ModelArts are in the same region, and you have permission to operate the bucket.

Importing File Data from an OBS Path

The parameters on the GUI for data import vary according to the dataset type. The following uses a dataset of the image classification type as an example.

  1. Log in to the ModelArts management console. In the navigation pane, choose Data Management > Datasets.
  2. Locate the row that contains the desired dataset and click Import in the Operation column. Alternatively, click the dataset name to go to the Dashboard tab page of the dataset, and click Import in the upper right corner.
  3. In the Import dialog box, configure parameters as follows and click OK.
    • Data Source: OBS
    • Import Mode: Path
    • Import Path: OBS path for storing data
    • Labeling Status: Labeled
    • Advanced Feature Settings: disabled by default

      Import by Tag enables the system to automatically obtain the labels of the current dataset. Click Add Label to add a label. This parameter is optional. If Import by Tag is disabled, you can add or delete labels for imported data when labeling data.

    Figure 1 Importing data from an OBS path

    After the data is imported, it will be automatically synchronized to the dataset. On the Datasets page, click the dataset name to view its details and create a labeling job to label the data.

Labeling Status of File Data

The labeling status can be Unlabeled or Labeled.

  • Unlabeled: Only the labeling object (such as unlabeled images or texts) is imported.
  • Labeled: Both the labeling object and the labeling content are imported. Importing labeling content is not supported for free-format datasets.

    To ensure that the labeling content can be correctly read, you must store data in strict accordance with the specifications.

    If Import Mode is set to Path, store the data to be imported according to the labeling file specifications. For details, see Specifications for Importing Data from an OBS Directory.

    If Import Mode is set to manifest, the manifest file must comply with the manifest file specifications.

    • If the labeling status is set to Labeled, ensure that the folder or manifest file complies with the format specifications. Otherwise, the import may fail.
    • After the import of labeled data, check whether the imported data is in the labeled state.
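
The rules above can be checked before opening the Import dialog box. The following is a minimal sketch only. The ImportSettings class, its field names, and the example path are hypothetical and are not part of any ModelArts API. The sketch only encodes the constraints stated in this section: the import mode is Path or manifest, the labeling status is Unlabeled or Labeled, and labeling content cannot be imported into a free-format dataset.

```python
from dataclasses import dataclass

# Allowed values as they appear on the GUI described in this section.
IMPORT_MODES = {"Path", "manifest"}
LABELING_STATUSES = {"Unlabeled", "Labeled"}


@dataclass
class ImportSettings:
    """Hypothetical local record of the Import dialog box settings."""

    import_path: str                    # OBS path that stores the data
    import_mode: str = "Path"
    labeling_status: str = "Unlabeled"
    free_format_dataset: bool = False   # True if the target dataset is in free format

    def problems(self):
        """Return a list of problems found; the list is empty if none are found."""
        found = []
        if not self.import_path:
            found.append("Import Path is empty")
        if self.import_mode not in IMPORT_MODES:
            found.append(f"unknown Import Mode: {self.import_mode!r}")
        if self.labeling_status not in LABELING_STATUSES:
            found.append(f"unknown Labeling Status: {self.labeling_status!r}")
        if self.labeling_status == "Labeled" and self.free_format_dataset:
            found.append("labeling content cannot be imported into a free-format dataset")
        return found
```

For example, ImportSettings("/example-bucket/data/", labeling_status="Labeled", free_format_dataset=True).problems() reports one problem, because the Labeled status is combined with a free-format dataset. The check does not verify that the folder or manifest file complies with the format specifications.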

Importing a Table Dataset from OBS

ModelArts allows you to import table data (CSV files) from OBS.

Import description:

  • The import succeeds only if the schema of the data source is the same as the schema specified during dataset creation. The schema defines the column names and types of a table. Once specified during dataset creation, the schema cannot be changed.
  • When a CSV file is imported from OBS, the data type is not validated, but the number of columns must be the same as that in the schema of the dataset. If the data format is invalid, the data is set to null. For details, see Table 3.
  • You must select the directory where the CSV files are stored. The number of columns in each CSV file must be the same as that in the dataset schema. The schema of the CSV files can be automatically obtained. An example directory is as follows:
├─dataset-import-example 
│      table_import_1.csv 
│      table_import_2.csv
│      table_import_3.csv
│      table_import_4.csv
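
Because the data type is not validated but the column count is, it can help to check the column count locally before uploading the files to OBS. The following is a minimal sketch, assuming that the CSV files are available in a local directory, are UTF-8 encoded, and use commas as separators. The function name check_csv_columns and its parameters are illustrative and are not part of ModelArts.

```python
import csv
from pathlib import Path


def check_csv_columns(directory, expected_columns):
    """Map each CSV file name to the row numbers whose column count is wrong.

    Files in which every non-empty row has the expected number of columns
    are left out of the result. Row numbers start at 1.
    """
    problems = {}
    for path in sorted(Path(directory).glob("*.csv")):
        with path.open(newline="", encoding="utf-8") as handle:
            bad_rows = [
                row_no
                for row_no, row in enumerate(csv.reader(handle), start=1)
                # Blank rows are skipped; any other row must match the schema width.
                if row and len(row) != expected_columns
            ]
        if bad_rows:
            problems[path.name] = bad_rows
    return problems
```

For a dataset whose schema has three columns, check_csv_columns("dataset-import-example", 3) returns an empty dictionary if every row in every CSV file has three columns. The check covers the column count only. It does not predict which values would be set to null because of an invalid format.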