Updated on 2024-05-30 GMT+08:00

Creating an Import Task

Function

This API creates a dataset import task, which imports samples and labels from a storage system into the dataset.

Debugging

You can debug this API through automatic authentication in API Explorer or use the SDK sample code generated by API Explorer.

URI

POST /v2/{project_id}/datasets/{dataset_id}/import-tasks

Table 1 Path Parameters

| Parameter  | Mandatory | Type   | Description |
|------------|-----------|--------|-------------|
| dataset_id | Yes       | String | Dataset ID. |
| project_id | Yes       | String | Project ID. For details about how to obtain a project ID, see Obtaining a Project ID and Name. |
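As a minimal sketch of how the two path parameters fit into the URI above, the following Python helper fills in the template. The function name `import_task_path` and the sample IDs are hypothetical; only the path template comes from this page.

```python
from urllib.parse import quote


def import_task_path(project_id: str, dataset_id: str) -> str:
    """Return the request path for creating an import task."""
    if not project_id or not dataset_id:
        raise ValueError("project_id and dataset_id are both mandatory")
    # Percent-encode each segment so unexpected characters cannot alter the path.
    return "/v2/{}/datasets/{}/import-tasks".format(
        quote(project_id, safe=""), quote(dataset_id, safe="")
    )


# Placeholder IDs, for illustration only.
print(import_task_path("my-project-id", "my-dataset-id"))
```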

Request Parameters

Table 2 Request body parameters

Parameter

Mandatory

Type

Description

data_source

No

DataSource object

Data source.

difficult_only

No

Boolean

Whether to import only difficult samples (hard examples). Options:

  • true: Only difficult samples are imported.

  • false: All samples are imported. (Default value)

excluded_labels

No

Array of Label objects

Do not import samples containing the specified label.

final_annotation

No

Boolean

Whether to import data to the final state. Options:

  • true: Import data to the final state. (Default value)

  • false: Do not import data to the final state.

import_annotations

No

Boolean

Whether to import labels. Options:

  • true: Import labels. (Default value)

  • false: Do not import labels.

import_folder

No

String

Name of the subdirectory, within the dataset storage directory, that holds the imported data. You can specify the same subdirectory for multiple import tasks to avoid importing the same samples repeatedly. This field does not apply to table datasets.

import_origin

No

String

Data source. Options:

  • obs: OBS bucket (default value)

  • dws: GaussDB(DWS)

  • dli: DLI

  • rds: RDS

  • mrs: MRS

  • inference: Inference service

import_path

Yes

String

OBS path or manifest path to be imported.

  • When importing a manifest file, specify the full path to the manifest file itself.

  • When importing a directory, the dataset type must be image classification, object detection, text classification, or sound classification.

import_samples

No

Boolean

Whether to import samples. Options:

  • true: Import samples. (Default value)

  • false: Do not import samples.

import_type

No

String

Import mode. Options:

  • dir: Import datasets through an OBS path.

  • manifest: Import datasets through a manifest file.

included_labels

No

Array of Label objects

Import samples containing the specified label.

label_format

No

LabelFormat object

Label format. This parameter is used only for text datasets.

with_column_header

No

Boolean

Whether the first row in the file contains column names. This field applies to table datasets. Options:

  • true: The first row in the file contains column names.

  • false: The first row in the file does not contain column names. (Default value)
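The request body parameters in Table 2 could be assembled as in the sketch below. The helper `build_import_body` is hypothetical. It checks only the rules stated in the table (`import_path` is mandatory, and `import_type` and `import_origin` accept a fixed set of values), and the service may apply further validation that is not modeled here.

```python
VALID_IMPORT_TYPES = {"dir", "manifest"}
VALID_IMPORT_ORIGINS = {"obs", "dws", "dli", "rds", "mrs", "inference"}


def build_import_body(import_path, import_type=None, import_origin=None, **optional):
    """Assemble a request body, checking only the rules stated in Table 2."""
    if not import_path:
        raise ValueError("import_path is mandatory")
    if import_type is not None and import_type not in VALID_IMPORT_TYPES:
        raise ValueError(f"unsupported import_type: {import_type!r}")
    if import_origin is not None and import_origin not in VALID_IMPORT_ORIGINS:
        raise ValueError(f"unsupported import_origin: {import_origin!r}")

    body = {"import_path": import_path}
    if import_type is not None:
        body["import_type"] = import_type
    if import_origin is not None:
        body["import_origin"] = import_origin
    # Optional fields left as None are omitted so the service applies its defaults.
    body.update({key: value for key, value in optional.items() if value is not None})
    return body


# Mirrors the OBS directory example request on this page.
body = build_import_body(
    "s3://test-obs/daoLu_images/animals/",
    import_type="dir",
    import_annotations=False,
    difficult_only=False,
)
print(body)
```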

Table 3 DataSource

Parameter

Mandatory

Type

Description

data_path

No

String

Data source path.

data_type

No

Integer

Data type. Options:

  • 0: OBS bucket (default value)

  • 1: GaussDB(DWS)

  • 2: DLI

  • 3: RDS

  • 4: MRS

  • 5: AI Gallery

  • 6: Inference service

schema_maps

No

Array of SchemaMap objects

Schema mapping information corresponding to the table data.

source_info

No

SourceInfo object

Information required for importing a table data source.

with_column_header

No

Boolean

Whether the first row in the file contains column names. This field applies to table datasets. Options:

  • true: The first row in the file contains column names.

  • false: The first row in the file does not contain column names.
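Because `data_type` is a bare integer code, a named enumeration can make client code easier to read. The sketch below is one possible way to build a DataSource object. The names `DataType` and `build_data_source` are hypothetical, and the example path is a placeholder.

```python
from enum import IntEnum


class DataType(IntEnum):
    """data_type codes as listed in Table 3."""
    OBS = 0
    GAUSSDB_DWS = 1
    DLI = 2
    RDS = 3
    MRS = 4
    AI_GALLERY = 5
    INFERENCE = 6


def build_data_source(data_path, data_type=DataType.OBS, schema_maps=None,
                      source_info=None, with_column_header=None):
    """Return a DataSource dict, leaving out fields that were not supplied."""
    # DataType(...) raises ValueError for a code that Table 3 does not list.
    source = {"data_path": data_path, "data_type": int(DataType(data_type))}
    optional = {
        "schema_maps": schema_maps,
        "source_info": source_info,
        "with_column_header": with_column_header,
    }
    source.update({k: v for k, v in optional.items() if v is not None})
    return source


# Placeholder path, for illustration only.
print(build_data_source("/datasets/demo", DataType.MRS, with_column_header=True))
```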

Table 4 SchemaMap

| Parameter | Mandatory | Type   | Description |
|-----------|-----------|--------|-------------|
| dest_name | No        | String | Name of the destination column. |
| src_name  | No        | String | Name of the source column. |
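A `schema_maps` array pairs each source column with a destination column. As a small sketch, the hypothetical helper below turns a plain source-to-destination mapping into that array. The column names are invented for illustration.

```python
def build_schema_maps(column_mapping):
    """Convert {source column: destination column} into a list of SchemaMap dicts."""
    return [
        {"src_name": src, "dest_name": dest}
        for src, dest in column_mapping.items()
    ]


# Hypothetical column names.
schema_maps = build_schema_maps({"user_age": "age", "user_city": "city"})
print(schema_maps)
```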

Table 5 SourceInfo

Parameter

Mandatory

Type

Description

cluster_id

No

String

MRS cluster ID. You can log in to the MRS console to view the information.

cluster_mode

No

String

Running mode of an MRS cluster. Options:

  • 0: normal cluster

  • 1: security cluster

cluster_name

No

String

MRS cluster name. You can log in to the MRS console to view the information.

database_name

No

String

Name of the database to which the table dataset is imported.

input

No

String

HDFS path of the table dataset. For example, /datasets/demo.

ip

No

String

IP address of your GaussDB(DWS) cluster.

port

No

String

Port number of your GaussDB(DWS) cluster.

queue_name

No

String

DLI queue name of a table dataset.

subnet_id

No

String

Subnet ID of an MRS cluster.

table_name

No

String

Name of the table to which a table dataset is imported.

user_name

No

String

Username, which is mandatory for GaussDB(DWS) data.

user_password

No

String

User password, which is mandatory for GaussDB(DWS) data.

vpc_id

No

String

ID of the VPC where an MRS cluster resides.
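Table 5 marks every field as optional, yet the descriptions state that `user_name` and `user_password` are mandatory for GaussDB(DWS) data. The sketch below enforces that rule for a GaussDB(DWS) source. The helper name `build_dws_source_info` and all sample values are hypothetical, and which of the other fields a GaussDB(DWS) import actually needs is an assumption.

```python
def build_dws_source_info(ip, port, database_name, table_name, user_name, user_password):
    """Return a SourceInfo dict for a GaussDB(DWS) table source."""
    if not user_name or not user_password:
        raise ValueError("user_name and user_password are mandatory for GaussDB(DWS) data")
    return {
        "ip": ip,
        "port": str(port),  # Table 5 types port as String, not Integer
        "database_name": database_name,
        "table_name": table_name,
        "user_name": user_name,
        "user_password": user_password,
    }


# Placeholder connection details; never hard-code real credentials.
info = build_dws_source_info("192.0.2.10", 8000, "demo_db", "demo_table", "demo_user", "***")
print({k: v for k, v in info.items() if k != "user_password"})
```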

Table 6 Label

Parameter

Mandatory

Type

Description

attributes

No

Array of LabelAttribute objects

Multi-dimensional attribute of a label. For example, if the label is music, attributes such as style and artist may be included.

name

No

String

Label name.

property

No

LabelProperty object

Basic attributes of a label as key-value pairs, such as the color and shortcut key.

type

No

Integer

Label type. Options:

  • 0: image classification

  • 1: object detection

  • 3: image segmentation

  • 100: text classification

  • 101: named entity recognition

  • 102: text triplet relationship

  • 103: text triplet entity

  • 200: sound classification

  • 201: speech content

  • 202: speech paragraph labeling

  • 600: video labeling
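The label type codes are not contiguous, so an enumeration helps avoid typos. The sketch below defines the codes from Table 6 and builds Label objects as Table 2 types `included_labels` and `excluded_labels`. Note that the manifest example request on this page passes plain strings instead, so this sketch follows the table, not the example. The names `LabelType` and `make_label` are hypothetical.

```python
from enum import IntEnum


class LabelType(IntEnum):
    """Label type codes as listed in Table 6."""
    IMAGE_CLASSIFICATION = 0
    OBJECT_DETECTION = 1
    IMAGE_SEGMENTATION = 3
    TEXT_CLASSIFICATION = 100
    NAMED_ENTITY_RECOGNITION = 101
    TEXT_TRIPLET_RELATIONSHIP = 102
    TEXT_TRIPLET_ENTITY = 103
    SOUND_CLASSIFICATION = 200
    SPEECH_CONTENT = 201
    SPEECH_PARAGRAPH_LABELING = 202
    VIDEO_LABELING = 600


def make_label(name, label_type=None):
    """Return a Label dict; the type is included only when supplied."""
    label = {"name": name}
    if label_type is not None:
        # LabelType(...) raises ValueError for a code that Table 6 does not list.
        label["type"] = int(LabelType(label_type))
    return label


included_labels = [make_label(n, LabelType.IMAGE_CLASSIFICATION) for n in ("rabbits", "bees")]
print(included_labels)
```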

Table 7 LabelAttribute

Parameter

Mandatory

Type

Description

default_value

No

String

Default value of a label attribute.

id

No

String

Label attribute ID. You can obtain the ID by querying the label list.

name

No

String

Label attribute name. The value contains a maximum of 64 characters and cannot contain any of the following characters: <>=&"'

type

No

String

Label attribute type. Options:

  • text: text

  • select: single-choice drop-down list

values

No

Array of LabelAttributeValue objects

List of label attribute values.
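The naming rule for label attributes (at most 64 characters, none of the characters `<>=&"'`) is easy to check on the client before sending a request. The following is a minimal sketch with a hypothetical function name. It encodes only the two constraints stated in Table 7.

```python
FORBIDDEN_CHARS = set("<>=&\"'")
MAX_NAME_LENGTH = 64


def is_valid_attribute_name(name: str) -> bool:
    """Check the length limit and forbidden characters stated in Table 7."""
    return len(name) <= MAX_NAME_LENGTH and not (set(name) & FORBIDDEN_CHARS)


print(is_valid_attribute_name("style"))
print(is_valid_attribute_name("a<b"))
```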

Table 8 LabelAttributeValue

| Parameter | Mandatory | Type   | Description |
|-----------|-----------|--------|-------------|
| id        | No        | String | Label attribute value ID. |
| value     | No        | String | Label attribute value. |

Table 9 LabelProperty

Parameter

Mandatory

Type

Description

@modelarts:color

No

String

Default attribute: Label color, specified as a hexadecimal color code. By default, this parameter is left blank. Example: #FFFFF0.

@modelarts:default_shape

No

String

Default attribute: Default shape of an object detection label (dedicated attribute). By default, this parameter is left blank. Options:

  • bndbox: rectangle

  • polygon: polygon

  • circle: circle

  • line: straight line

  • dashed: dashed line

  • point: point

  • polyline: polyline

@modelarts:from_type

No

String

Default attribute: Type of the head entity in the triplet relationship label. This attribute must be specified when a relationship label is created. This parameter is used only for the text triplet dataset.

@modelarts:rename_to

No

String

Default attribute: The new name of the label.

@modelarts:shortcut

No

String

Default attribute: Label shortcut key. By default, this parameter is left blank. For example: D.

@modelarts:to_type

No

String

Default attribute: Type of the tail entity in the triplet relationship label. This attribute must be specified when a relationship label is created. This parameter is used only for the text triplet dataset.
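LabelProperty keys carry an `@modelarts:` prefix, which is easy to mistype by hand. The sketch below builds such an object and checks the values against the options in Table 9. The helper name is hypothetical, and the six-digit hexadecimal pattern is an assumption inferred from the single example `#FFFFF0`; the service may accept other forms.

```python
import re

VALID_SHAPES = {"bndbox", "polygon", "circle", "line", "dashed", "point", "polyline"}


def build_label_property(color=None, default_shape=None, shortcut=None):
    """Return a LabelProperty dict containing only the supplied attributes."""
    prop = {}
    if color is not None:
        # Assumption: colors use the #RRGGBB form shown in the example.
        if not re.fullmatch(r"#[0-9A-Fa-f]{6}", color):
            raise ValueError(f"expected a #RRGGBB color, got {color!r}")
        prop["@modelarts:color"] = color
    if default_shape is not None:
        if default_shape not in VALID_SHAPES:
            raise ValueError(f"unsupported default_shape: {default_shape!r}")
        prop["@modelarts:default_shape"] = default_shape
    if shortcut is not None:
        prop["@modelarts:shortcut"] = shortcut
    return prop


print(build_label_property(color="#FFFFF0", default_shape="bndbox", shortcut="D"))
```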

Table 10 LabelFormat

Parameter

Mandatory

Type

Description

label_type

No

String

Label type of text classification. Options:

  • 0: The label is separated from the text, and they are distinguished by the fixed suffix _result. For example, the text file is abc.txt, and the label file is abc_result.txt.

  • 1: Default value. Labels and texts are stored in the same file and separated by separators. You can use text_sample_separator to specify the separator between the text and label and text_label_separator to specify the separator between labels.

text_label_separator

No

String

Separator between labels. By default, a comma (,) is used as the separator. The separator needs to be escaped. The separator can contain only one character, such as a letter, a digit, or any of the following special characters: !@#$%^&*_=|?/':.;,

text_sample_separator

No

String

Separator between the text and label. By default, the Tab key is used as the separator. The separator needs to be escaped. The separator can contain only one character, such as a letter, a digit, or any of the following special characters: !@#$%^&*_=|?/':.;,
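To illustrate the two layouts that `label_type` describes, the sketch below derives the label file name for layout 0 and splits one line for layout 1 using the default separators (Tab between text and labels, comma between labels). Both function names are hypothetical. The sketch assumes the text comes first on the line and splits at the first text-label separator; it does not model separator escaping.

```python
from pathlib import PurePosixPath


def label_file_for(text_file: str) -> str:
    """label_type 0: the label file shares the text file's name plus the _result suffix."""
    path = PurePosixPath(text_file)
    return str(path.with_name(path.stem + "_result" + path.suffix))


def parse_labeled_line(line, text_sample_separator="\t", text_label_separator=","):
    """label_type 1: split one line into (text, [labels])."""
    # Assumption: the text precedes the labels and the first separator ends the text.
    text, found, label_part = line.rstrip("\n").partition(text_sample_separator)
    labels = [l for l in label_part.split(text_label_separator) if l] if found else []
    return text, labels


print(label_file_for("abc.txt"))
print(parse_labeled_line("great movie\tpositive,recommended\n"))
```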

Response Parameters

Status code: 200

Table 11 Response body parameters

| Parameter | Type   | Description |
|-----------|--------|-------------|
| task_id   | String | ID of an import task. |

Example Requests

  • Creating an Import Task (Importing Data from OBS)

    {
      "import_type" : "dir",
      "import_path" : "s3://test-obs/daoLu_images/animals/",
      "included_labels" : [ ],
      "import_annotations" : false,
      "difficult_only" : false
    }

  • Creating an Import Task (Importing Data from Manifest)

    {
      "import_type" : "manifest",
      "import_path" : "s3://test-obs/classify/output/dataset-f9e8-gfghHSokody6AJigS5A/annotation/V002/V002.manifest",
      "included_labels" : [ "rabbits", "bees", "Rabbits", "Bees" ],
      "import_annotations" : true,
      "difficult_only" : false
    }
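A request like the ones above could be sent from Python with the standard library, as sketched below. Several details are assumptions that this page does not state: the endpoint host is a placeholder, authentication is shown as an `X-Auth-Token` header, and the function names are hypothetical. The sketch only constructs the request; `send` is defined but not called.

```python
import json
import urllib.request


def build_request(endpoint, project_id, dataset_id, body, token):
    """Construct (but do not send) the POST request for creating an import task."""
    url = f"{endpoint}/v2/{project_id}/datasets/{dataset_id}/import-tasks"
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "X-Auth-Token": token,  # assumption: token-based authentication
        },
        method="POST",
    )


def send(request, timeout=30):
    """Send the request and return the decoded JSON response body."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


# Placeholder endpoint, IDs, and token, for illustration only.
req = build_request(
    "https://modelarts.example.com",
    "my-project-id",
    "my-dataset-id",
    {"import_type": "dir", "import_path": "s3://test-obs/daoLu_images/animals/"},
    "my-token",
)
print(req.get_method(), req.full_url)
```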

Example Responses

Status code: 200

OK

{
  "task_id" : "gfghHSokody6AJigS5A_m1dYqOw8vWCAznw1V28"
}

Status Codes

| Status Code | Description  |
|-------------|--------------|
| 200         | OK           |
| 401         | Unauthorized |
| 403         | Forbidden    |
| 404         | Not Found    |
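A client would typically branch on the status code before reading `task_id` from the response body. The sketch below shows one way to do that, using only the four status codes listed above. The function name `extract_task_id` is hypothetical, and the format of error response bodies is not covered here.

```python
import json

STATUS_DESCRIPTIONS = {200: "OK", 401: "Unauthorized", 403: "Forbidden", 404: "Not Found"}


def extract_task_id(status_code, body_text):
    """Return task_id from a 200 response, or raise for any other status."""
    if status_code != 200:
        reason = STATUS_DESCRIPTIONS.get(status_code, "unexpected status")
        raise RuntimeError(f"import task was not created: {status_code} {reason}")
    return json.loads(body_text)["task_id"]


# Uses the example response body from this page.
print(extract_task_id(200, '{"task_id": "gfghHSokody6AJigS5A_m1dYqOw8vWCAznw1V28"}'))
```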

Error Codes

See Error Codes.