Updated on 2024-06-12 GMT+08:00

Creating a Dataset Import Task

You can import new data from OBS through an OBS path or a manifest file.

dataset.import_data(path=None, annotation_config=None, **kwargs)

Table 1 lists the import modes supported by datasets.

Table 1 Import modes supported by datasets

Dataset Type                 From an OBS Path   From a Manifest File   Remarks
Image classification         Supported          Supported              None
Object detection             Supported          Supported              None
Image segmentation           Supported          Supported              None
Text classification          Supported          Supported              None
Named entity recognition     Not supported      Supported              None
Text triplet                 Not supported      Supported              None
Sound classification         Supported          Supported              None
Speech labeling              Not supported      Supported              None
Speech paragraph labeling    Not supported      Supported              None
Table dataset                Supported          Not supported          The schema of the newly imported table data is the same as that of the dataset.
Video labeling               Not supported      Supported              None
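
The support matrix in Table 1 can be checked in code before an import task is submitted. The following is a minimal sketch only: the dictionary, the mode names "obs_path" and "manifest", and the helper supports_import are illustrative assumptions and are not part of the ModelArts SDK.

```python
# Import modes per dataset type, transcribed from Table 1.
# The keys, the mode names, and the helper below are illustrative
# assumptions; they are not identifiers defined by the ModelArts SDK.
SUPPORTED_IMPORT_MODES = {
    "image classification": {"obs_path", "manifest"},
    "object detection": {"obs_path", "manifest"},
    "image segmentation": {"obs_path", "manifest"},
    "text classification": {"obs_path", "manifest"},
    "named entity recognition": {"manifest"},
    "text triplet": {"manifest"},
    "sound classification": {"obs_path", "manifest"},
    "speech labeling": {"manifest"},
    "speech paragraph labeling": {"manifest"},
    "table dataset": {"obs_path"},
    "video labeling": {"manifest"},
}


def supports_import(dataset_type: str, mode: str) -> bool:
    """Return True if Table 1 lists `mode` as supported for `dataset_type`."""
    modes = SUPPORTED_IMPORT_MODES.get(dataset_type.strip().lower())
    if modes is None:
        raise ValueError(f"Unknown dataset type: {dataset_type!r}")
    return mode in modes
```

For example, supports_import("Table dataset", "manifest") returns False, so a table dataset would need to be imported from an OBS path instead.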

Sample Code

  • Example 1: Import an object detection dataset from an OBS path.
    from modelarts.session import Session
    from modelarts.dataset import Dataset

    session = Session()

    # dataset_id must hold the ID of an existing dataset.
    dataset = Dataset(session, dataset_id)

    # Describe the labeling scene and the label format of the data to import.
    annotation_config = {
        "scene": "object_detection",
        "format_name": "ModelArts PASCAL VOC 1.0",
    }
    import_resp = dataset.import_data(
        path="/obs-gaia-test/data/image/image-detection/",
        annotation_config=annotation_config,
    )
  • Example 2: Import an object detection dataset from a manifest file.
    # Pass an empty dict so that the labels in the manifest file are imported.
    annotation_config = dict()
    import_resp = dataset.import_data(
        path="/obs-gaia-test/data/output/work_path/dataset-5932-Qdd1RUZ3wqBQrwrTr3v/annotation/V001/V001.manifest",
        annotation_config=annotation_config,
    )
  • Example 3: Import a table dataset from an OBS path.
    import_resp = dataset.import_data(
        path="/obs-gaia-test/data/table/table1/",
        with_column_header=True,  # The first row of the table is the header.
    )

Parameters

Table 2 Request parameters

Parameter: path
Mandatory: Yes
Type: String
Description: OBS path or manifest file path from which data is imported.

  • To import data from a manifest file, set path to the path of the manifest file itself.
  • Importing from an OBS path is supported only for image classification, object detection, image segmentation, text classification, sound classification, and table datasets.
  • The path must not contain newline characters (\n), carriage return characters (\r), or tab characters (\t).

Parameter: annotation_config
Mandatory: No
Type: Table 4
Description: Data labeling format. If this parameter is set to None, no labels are imported. When importing data from a manifest file, pass an empty dict object so that labels are imported. The following labeling formats are supported:

  • Image classification
  • Object detection
  • Sound classification
  • Text classification

Parameter: with_column_header
Mandatory: No
Type: Boolean
Description: Whether the first row of a table is the table header. This parameter is mandatory for table datasets.

  • True: The first row of the table is used as the table header.
  • False: The first row of the table is not used as the table header and is treated as sample data.
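
The parameter constraints above can be checked locally before import_data is called. The following is a minimal sketch under stated assumptions: the helper build_import_kwargs and its is_table_dataset flag are hypothetical and are not part of the ModelArts SDK. The helper only enforces the rules described in Table 2 (no \n, \r, or \t in path; with_column_header required for table datasets) and returns keyword arguments for the call.

```python
from typing import Any, Dict, Optional

# Characters that Table 2 says must not appear in `path`.
FORBIDDEN_PATH_CHARS = ("\n", "\r", "\t")


def build_import_kwargs(
    path: str,
    annotation_config: Optional[Dict[str, Any]] = None,
    is_table_dataset: bool = False,  # hypothetical flag, not an SDK parameter
    with_column_header: Optional[bool] = None,
) -> Dict[str, Any]:
    """Validate import arguments and return them as keyword arguments.

    This is an illustrative helper, not an SDK function.
    """
    if not path:
        raise ValueError("path is mandatory")
    if any(ch in path for ch in FORBIDDEN_PATH_CHARS):
        raise ValueError(
            "path must not contain newline, carriage return, or tab characters"
        )

    kwargs: Dict[str, Any] = {"path": path}

    # None means that no labels are imported, so the key is left out.
    # An empty dict is kept, because it requests label import from a manifest.
    if annotation_config is not None:
        kwargs["annotation_config"] = annotation_config

    if is_table_dataset:
        if with_column_header is None:
            raise ValueError("with_column_header is mandatory for table datasets")
        kwargs["with_column_header"] = bool(with_column_header)

    return kwargs
```

The result could then be passed on as dataset.import_data(**kwargs), where dataset is a Dataset object created as in the sample code.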