Updated on 2024-05-30 GMT+08:00

Adding Samples in Batches

Function

This API is used to add samples in batches.

Debugging

You can debug this API through automatic authentication in API Explorer or use the SDK sample code generated by API Explorer.

URI

POST /v2/{project_id}/datasets/{dataset_id}/data-annotations/samples

Table 1 Path Parameters

Parameter | Mandatory | Type | Description

dataset_id | Yes | String | Dataset ID.

project_id | Yes | String | Project ID. For details about how to obtain a project ID, see Obtaining a Project ID and Name.
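
As a minimal sketch, the request URL can be assembled from the two path parameters. The endpoint host and the IDs below are placeholders assumed for illustration, not values from this page; use the ModelArts endpoint for your region.

```python
from urllib.parse import quote


def build_samples_url(endpoint: str, project_id: str, dataset_id: str) -> str:
    """Return the URL for POST /v2/{project_id}/datasets/{dataset_id}/data-annotations/samples."""
    # Percent-encode the IDs so unexpected characters cannot alter the path.
    return (
        f"{endpoint.rstrip('/')}/v2/{quote(project_id, safe='')}"
        f"/datasets/{quote(dataset_id, safe='')}/data-annotations/samples"
    )


# Hypothetical endpoint and IDs, for illustration only.
url = build_samples_url("https://modelarts.example.com", "my-project-id", "my-dataset-id")
print(url)
```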

Request Parameters

Table 2 Request body parameters

Parameter

Mandatory

Type

Description

final_annotation

No

Boolean

Whether to directly import to the final result. Options:

  • true: Import labels to the labeled dataset (default value).

  • false: Import labels to the to-be-confirmed dataset. Currently, to-be-confirmed datasets support only the image classification and object detection categories.

label_format

No

LabelFormat object

Label format. This parameter is used only for text datasets.

samples

No

Array of Sample objects

Sample list.

Table 3 LabelFormat

Parameter

Mandatory

Type

Description

label_type

No

String

Label type of text classification. Options:

  • 0: The label is separated from the text, and they are distinguished by the fixed suffix _result. For example, the text file is abc.txt, and the label file is abc_result.txt.

  • 1: Default value. Labels and texts are stored in the same file and separated by separators. You can use text_sample_separator to specify the separator between the text and label and text_label_separator to specify the separator between labels.

text_label_separator

No

String

Separator between labels. By default, a comma (,) is used as the separator. The separator needs to be escaped. The separator can contain only one character, such as a letter, a digit, or any of the following special characters: !@#$%^&*_=|?/':.;,

text_sample_separator

No

String

Separator between the text and label. By default, the Tab key is used as the separator. The separator needs to be escaped. The separator can contain only one character, such as a letter, a digit, or any of the following special characters: !@#$%^&*_=|?/':.;,
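
To illustrate label_type 1, the following sketch splits one line of a text file into its text and labels using the two separators from Table 3. It assumes that the text comes before the labels on each line, which this page does not state explicitly; the function name is hypothetical.

```python
from typing import List, Tuple


def split_text_sample(line: str, text_sample_separator: str = "\t",
                      text_label_separator: str = ",") -> Tuple[str, List[str]]:
    """Split one line into (text, labels) for label_type 1."""
    # Assumption: the text precedes the labels, so split at the last separator.
    text, sep, label_part = line.rstrip("\n").rpartition(text_sample_separator)
    if not sep:
        # No separator found: treat the whole line as unlabeled text.
        return label_part, []
    labels = [label for label in label_part.split(text_label_separator) if label]
    return text, labels


print(split_text_sample("The service was quick\tpositive,service"))
```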

Table 4 Sample

Parameter

Mandatory

Type

Description

data

No

Object

Byte data of the sample file. The type is java.nio.ByteBuffer. When this parameter is used, the byte data is converted to a string and the string is uploaded.

data_source

No

DataSource object

Data source.

encoding

No

String

Encoding type of sample files, which is used to upload .txt or .csv files. The value can be UTF-8, GBK, or GB2312. The default value is UTF-8.

labels

No

Array of SampleLabel objects

Sample label list.

metadata

No

SampleMetadata object

Key-value pair of the sample metadata attribute.

name

No

String

Name of sample files. The value contains 0 to 1,024 characters and cannot contain special characters (!<>=&"').

sample_type

No

Integer

Sample type. Options:

  • 0: image

  • 1: text

  • 2: speech

  • 4: table

  • 6: video

  • 9: custom format
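
The data string in the example request on this page starts with the Base64 form of a JPEG header, so the sketch below assumes that byte data is sent as a Base64 string. The helper name is hypothetical, and the name check only mirrors the length and character rules listed in Table 4.

```python
import base64

FORBIDDEN_NAME_CHARS = set("!<>=&\"'")  # special characters listed for `name`


def make_sample(name: str, raw: bytes, sample_type: int = 0) -> dict:
    """Build one entry of the `samples` array from raw file bytes."""
    if len(name) > 1024 or FORBIDDEN_NAME_CHARS & set(name):
        raise ValueError(f"invalid sample name: {name!r}")
    return {
        "name": name,
        # Assumption: byte data is sent as a Base64 string, as the example request suggests.
        "data": base64.b64encode(raw).decode("ascii"),
        "sample_type": sample_type,  # 0: image
    }


# The four bytes below are only a JPEG header, not a complete image.
sample = make_sample("2.jpg", b"\xff\xd8\xff\xe0")
print(sample["data"])
```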

Table 5 DataSource

Parameter

Mandatory

Type

Description

data_path

No

String

Data source path.

data_type

No

Integer

Data type. Options:

  • 0: OBS bucket (default value)

  • 1: GaussDB(DWS)

  • 2: DLI

  • 3: RDS

  • 4: MRS

  • 5: AI Gallery

  • 6: Inference service

schema_maps

No

Array of SchemaMap objects

Schema mapping information corresponding to the table data.

source_info

No

SourceInfo object

Information required for importing a table data source.

with_column_header

No

Boolean

Whether the first row in the file contains column names. This field is valid for table datasets. Options:

  • true: The first row in the file contains column names.

  • false: The first row in the file does not contain column names.

Table 6 SchemaMap

Parameter | Mandatory | Type | Description

dest_name | No | String | Name of the destination column.

src_name | No | String | Name of the source column.

Table 7 SourceInfo

Parameter

Mandatory

Type

Description

cluster_id

No

String

MRS cluster ID. You can log in to the MRS console to view the information.

cluster_mode

No

String

Running mode of an MRS cluster. Options:

  • 0: normal cluster

  • 1: security cluster

cluster_name

No

String

MRS cluster name. You can log in to the MRS console to view the information.

database_name

No

String

Name of the database to which the table dataset is imported.

input

No

String

HDFS path of the table dataset. For example, /datasets/demo.

ip

No

String

IP address of your GaussDB(DWS) cluster.

port

No

String

Port number of your GaussDB(DWS) cluster.

queue_name

No

String

DLI queue name of a table dataset.

subnet_id

No

String

Subnet ID of an MRS cluster.

table_name

No

String

Name of the table to which a table dataset is imported.

user_name

No

String

Username, which is mandatory for GaussDB(DWS) data.

user_password

No

String

User password, which is mandatory for GaussDB(DWS) data.

vpc_id

No

String

ID of the VPC where an MRS cluster resides.
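
The DataSource, SchemaMap, and SourceInfo objects nest as shown in the sketch below for a GaussDB(DWS) table. Which SourceInfo fields the service needs together is an assumption here; only user_name and user_password are described as mandatory for GaussDB(DWS) data. The helper name and all values are hypothetical.

```python
def make_dws_data_source(ip: str, port: str, database_name: str, table_name: str,
                         user_name: str, user_password: str,
                         column_map: dict) -> dict:
    """Build a `data_source` object that points at a GaussDB(DWS) table."""
    # user_name and user_password are described as mandatory for GaussDB(DWS) data.
    if not user_name or not user_password:
        raise ValueError("user_name and user_password are required for GaussDB(DWS)")
    return {
        "data_type": 1,  # 1: GaussDB(DWS)
        "source_info": {
            "ip": ip,
            "port": port,  # typed as String in Table 7
            "database_name": database_name,
            "table_name": table_name,
            "user_name": user_name,
            "user_password": user_password,
        },
        # One SchemaMap entry per source column.
        "schema_maps": [
            {"src_name": src, "dest_name": dest} for src, dest in column_map.items()
        ],
    }


# Placeholder connection values, for illustration only.
data_source = make_dws_data_source(
    "192.0.2.10", "8000", "demo_db", "demo_table",
    "demo_user", "demo_password", {"c1": "feature", "c2": "label"},
)
print(data_source["schema_maps"])
```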

Table 8 SampleLabel

Parameter

Mandatory

Type

Description

annotated_by

No

String

Video labeling method, which is used to distinguish whether a video is labeled manually or automatically. Options:

  • human: manual labeling

  • auto: automatic labeling

id

No

String

Label ID.

name

No

String

Label name.

property

No

SampleLabelProperty object

Attribute key-value pair of the sample label, such as the object shape and shape feature.

score

No

Float

Confidence. The value range is [0,1].

type

No

Integer

Label type. Options:

  • 0: image classification

  • 1: object detection

  • 3: image segmentation

  • 100: text classification

  • 101: named entity recognition

  • 102: text triplet relationship

  • 103: text triplet entity

  • 200: sound classification

  • 201: speech content

  • 202: speech paragraph labeling

  • 600: video labeling

Table 9 SampleLabelProperty

Parameter

Mandatory

Type

Description

@modelarts:content

No

String

Speech text content, which is a default attribute dedicated to the speech label (including the speech content and speech start and end points).

@modelarts:end_index

No

Integer

End position of the text, which is a default attribute dedicated to the named entity label. The end position does not include the character corresponding to the value of end_index. Example:

  • If the text is "Barack Hussein Obama II (born August 4, 1961) is an attorney and politician.", start_index and end_index of Barack Hussein Obama II are 0 and 23, respectively.

  • If the text is "Hope is the thing with feathers", start_index and end_index of Hope are 0 and 4, respectively.

@modelarts:end_time

No

String

Speech end time, which is a default attribute dedicated to the speech start/end point label, in the format of hh:mm:ss.SSS. (hh indicates hour; mm indicates minute; ss indicates second; and SSS indicates millisecond.)

@modelarts:feature

No

Object

Shape feature, which is a default attribute dedicated to the object detection label, with type of List. The upper left corner of the image is used as the coordinate origin [0, 0]. Each coordinate point is represented by [x, y], where x indicates the horizontal coordinate and y indicates the vertical coordinate (both x and y are >=0). The format of each shape is as follows:

  • bndbox: consists of two points, for example, [[0,10],[50,95]]. The upper left vertex of the rectangle is the first point, and the lower right vertex is the second point. That is, the x-coordinate of the first point must be less than the x-coordinate of the second point, and the y-coordinate of the first point must be less than the y-coordinate of the second point.

  • polygon: consists of multiple points that are connected in sequence to form a polygon, for example, [[0,100],[50,95],[10,60],[500,400]].

  • circle: consists of the center and radius, for example, [[100,100],[50]].

  • line: consists of two points, for example, [[0,100],[50,95]]. The first point is the start point, and the second point is the end point.

  • dashed: consists of two points, for example, [[0,100],[50,95]]. The first point is the start point, and the second point is the end point.

  • point: consists of one point, for example, [[0,100]].

  • polyline: consists of multiple points, for example, [[0,100],[50,95],[10,60],[500,400]].

@modelarts:from

No

String

ID of the head entity in the triplet relationship label, which is a default attribute dedicated to the triplet relationship label.

@modelarts:hard

No

String

Whether the sample is labeled as a hard sample, which is a default attribute. Options:

  • 0/false: not a hard sample

  • 1/true: hard sample

@modelarts:hard_coefficient

No

String

Difficulty coefficient at the label level, which is a default attribute. The value range is [0,1].

@modelarts:hard_reasons

No

String

Reasons why the sample is a hard sample, which is a default attribute. Separate multiple hard sample reason IDs with hyphens (-), for example, 3-20-21-19. Options:

  • 0: No target objects are identified.

  • 1: The confidence is low.

  • 2: The clustering result based on the training dataset is inconsistent with the prediction result.

  • 3: The prediction result is greatly different from the data of the same type in the training dataset.

  • 4: The prediction results of multiple consecutive similar images are inconsistent.

  • 5: There is a large offset between the image resolution and the feature distribution of the training dataset.

  • 6: There is a large offset between the aspect ratio of the image and the feature distribution of the training dataset.

  • 7: There is a large offset between the brightness of the image and the feature distribution of the training dataset.

  • 8: There is a large offset between the saturation of the image and the feature distribution of the training dataset.

  • 9: There is a large offset between the color richness of the image and the feature distribution of the training dataset.

  • 10: There is a large offset between the definition of the image and the feature distribution of the training dataset.

  • 11: There is a large offset between the number of frames of the image and the feature distribution of the training dataset.

  • 12: There is a large offset between the standard deviation of area of image frames and the feature distribution of the training dataset.

  • 13: There is a large offset between the aspect ratio of image frames and the feature distribution of the training dataset.

  • 14: There is a large offset between the area portion of image frames and the feature distribution of the training dataset.

  • 15: There is a large offset between the edge of image frames and the feature distribution of the training dataset.

  • 16: There is a large offset between the brightness of image frames and the feature distribution of the training dataset.

  • 17: There is a large offset between the definition of image frames and the feature distribution of the training dataset.

  • 18: There is a large offset between the stack of image frames and the feature distribution of the training dataset.

  • 19: The data enhancement result based on GaussianBlur is inconsistent with the prediction result of the original image.

  • 20: The data enhancement result based on fliplr is inconsistent with the prediction result of the original image.

  • 21: The data enhancement result based on Crop is inconsistent with the prediction result of the original image.

  • 22: The data enhancement result based on flipud is inconsistent with the prediction result of the original image.

  • 23: The data enhancement result based on scale is inconsistent with the prediction result of the original image.

  • 24: The data enhancement result based on translate is inconsistent with the prediction result of the original image.

  • 25: The data enhancement result based on shear is inconsistent with the prediction result of the original image.

  • 26: The data enhancement result based on superpixels is inconsistent with the prediction result of the original image.

  • 27: The data enhancement result based on sharpen is inconsistent with the prediction result of the original image.

  • 28: The data enhancement result based on add is inconsistent with the prediction result of the original image.

  • 29: The data enhancement result based on invert is inconsistent with the prediction result of the original image.

  • 30: The data is predicted to be abnormal.

@modelarts:shape

No

String

Object shape, which is a default attribute dedicated to the object detection label and is left empty by default. Options:

  • bndbox: rectangle

  • polygon: polygon

  • circle: circle

  • line: straight line

  • dashed: dotted line

  • point: point

  • polyline: polyline

@modelarts:source

No

String

Speech source, which is a default attribute dedicated to the speech start/end point label and can be set to a speaker or narrator.

@modelarts:start_index

No

Integer

Start position of the text, which is a default attribute dedicated to the named entity label. The start value begins from 0, including the character corresponding to the value of start_index.

@modelarts:start_time

No

String

Speech start time, which is a default attribute dedicated to the speech start/end point label, in the format of hh:mm:ss.SSS. (hh indicates hour; mm indicates minute; ss indicates second; and SSS indicates millisecond.)

@modelarts:to

No

String

ID of the tail entity in the triplet relationship label, which is a default attribute dedicated to the triplet relationship label.
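
The start_index and end_index attributes in Table 9 describe a half-open range: the start position is included and the end position is excluded. The sketch below reproduces the two examples given for @modelarts:end_index. It assumes that positions count characters the way Python string indices do, which this page does not specify; the helper name is hypothetical.

```python
def entity_span(text: str, entity: str) -> dict:
    """Return start/end index properties for the first occurrence of `entity`."""
    start = text.find(entity)
    if start < 0:
        raise ValueError(f"{entity!r} not found in text")
    end = start + len(entity)  # exclusive, so text[start:end] == entity
    return {"@modelarts:start_index": start, "@modelarts:end_index": end}


text = "Barack Hussein Obama II (born August 4, 1961) is an attorney and politician."
span = entity_span(text, "Barack Hussein Obama II")
print(span)
```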

Table 10 SampleMetadata

Parameter

Mandatory

Type

Description

@modelarts:import_origin

No

Integer

Sample source, which is a built-in attribute.

@modelarts:hard

No

Double

Whether the sample is labeled as a hard sample, which is a default attribute. Options:

  • 0: non-hard sample

  • 1: hard sample

@modelarts:hard_coefficient

No

Double

Difficulty coefficient at the sample level, which is a default attribute. The value range is [0,1].

@modelarts:hard_reasons

No

Array of integers

IDs of the hard sample reasons, which is a default attribute. Options:

  • 0: No object is identified.

  • 1: The confidence is low.

  • 2: The clustering result based on the training dataset is inconsistent with the prediction result.

  • 3: The prediction result is greatly different from the data of the same type in the training dataset.

  • 4: The prediction results of multiple consecutive similar images are inconsistent.

  • 5: There is a large offset between the image resolution and the feature distribution of the training dataset.

  • 6: There is a large offset between the aspect ratio of the image and the feature distribution of the training dataset.

  • 7: There is a large offset between the brightness of the image and the feature distribution of the training dataset.

  • 8: There is a large offset between the saturation of the image and the feature distribution of the training dataset.

  • 9: There is a large offset between the color richness of the image and the feature distribution of the training dataset.

  • 10: There is a large offset between the definition of the image and the feature distribution of the training dataset.

  • 11: There is a large offset between the number of frames of the image and the feature distribution of the training dataset.

  • 12: There is a large offset between the standard deviation of area of image frames and the feature distribution of the training dataset.

  • 13: There is a large offset between the aspect ratio of image frames and the feature distribution of the training dataset.

  • 14: There is a large offset between the area portion of image frames and the feature distribution of the training dataset.

  • 15: There is a large offset between the edge of image frames and the feature distribution of the training dataset.

  • 16: There is a large offset between the brightness of image frames and the feature distribution of the training dataset.

  • 17: There is a large offset between the definition of image frames and the feature distribution of the training dataset.

  • 18: There is a large offset between the stack of image frames and the feature distribution of the training dataset.

  • 19: The data enhancement result based on GaussianBlur is inconsistent with the prediction result of the original image.

  • 20: The data enhancement result based on fliplr is inconsistent with the prediction result of the original image.

  • 21: The data enhancement result based on Crop is inconsistent with the prediction result of the original image.

  • 22: The data enhancement result based on flipud is inconsistent with the prediction result of the original image.

  • 23: The data enhancement result based on scale is inconsistent with the prediction result of the original image.

  • 24: The data enhancement result based on translate is inconsistent with the prediction result of the original image.

  • 25: The data enhancement result based on shear is inconsistent with the prediction result of the original image.

  • 26: The data enhancement result based on superpixels is inconsistent with the prediction result of the original image.

  • 27: The data enhancement result based on sharpen is inconsistent with the prediction result of the original image.

  • 28: The data enhancement result based on add is inconsistent with the prediction result of the original image.

  • 29: The data enhancement result based on invert is inconsistent with the prediction result of the original image.

  • 30: The data is predicted to be abnormal.

@modelarts:size

No

Array of objects

Image size (width, height, and depth of the image), which is a default attribute, with type of List<Integer>. In the list, the first number indicates the width (pixels), the second number indicates the height (pixels), and the third number indicates the depth (the depth can be left blank and the default value is 3). For example, [100,200,3] and [100,200] are both valid. Note: This parameter is mandatory only when the sample label list contains the object detection label.
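
Hard sample reasons appear in two forms on this page: a hyphen-separated string in SampleLabelProperty (Table 9) and an array of integers in SampleMetadata (Table 10). The hypothetical helpers below convert between the two and reject IDs outside the listed range of 0 to 30.

```python
from typing import Iterable, List


def reasons_to_label_string(reason_ids: Iterable[int]) -> str:
    """Join reason IDs into the hyphen-separated form used by label properties."""
    ids = list(reason_ids)
    for reason_id in ids:
        if not 0 <= reason_id <= 30:
            raise ValueError(f"unknown hard sample reason ID: {reason_id}")
    return "-".join(str(reason_id) for reason_id in ids)


def reasons_from_label_string(value: str) -> List[int]:
    """Split the hyphen-separated form into the integer array used by sample metadata."""
    return [int(part) for part in value.split("-") if part]


print(reasons_from_label_string("3-20-21-19"))
```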

Response Parameters

Status code: 200

Table 11 Response body parameters

Parameter

Type

Description

error_code

String

Error code.

error_msg

String

Error message.

results

Array of UploadSampleResp objects

Response list for adding samples in batches.

success

Boolean

Whether the operation is successful. Options:

  • true: successful

  • false: failed

Table 12 UploadSampleResp

Parameter

Type

Description

error_code

String

Error code.

error_msg

String

Error message.

info

String

Description.

name

String

Name of a sample file.

success

Boolean

Whether the operation is successful. Options:

  • true: successful

  • false: failed

Example Requests

Adding Samples in Batches

{
  "samples" : [ {
    "name" : "2.jpg",
    "data" : "/9j/4AAQSkZJRgABAQAAAQABAAD/2wBDAAgGBgcGBQgHBwcJCQgKDBQNDAsLDBkSEw8UHRofHh0aHBwgJC4nICIsIxwcKDcpLDAxNDQ0Hyc5PTgyPC4zNDL/2wBDAQkJCQwLDBgNDRgyIRwhMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjL/wAARCAA1AJUDASIAAhEBAxEB/8QAHwAAAQUBAQEBAQEAAAAAAAAAAAECAwQFBgcICQoL/8QAtRAAAgEDAwIEAwUFBAQAAAF9AQIDAAQRBRIhMUEGE1FhByJxFDKBkaEII0KxwRVS0fAkM2JyggkKFhcYGRolJicoKSo0NTY3ODk6Q0RFRkdISUpTVFVWV1hZWmNkZWZnaGlqc3R1dnd4eXqDhIWGh4iJipKTlJWWl5iZmqKjpKWmp6ipqrKztLW2t7i5usLDxMXGx8jJytLT1NXW19jZ2uHi4+Tl5ufo6erx8vP09fb3+Pn6/8QAHwEAAwEBAQEBAQEBAQAAAAAAAAECAwQFBgcICQoL"
  } ]
}
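
The example above uploads an image without labels. As a hedged sketch, a request body that also carries one object detection label could be assembled from the fields in Tables 4, 8, 9, and 10 as follows. The label name, coordinates, and image size are made up, the data string is a truncated placeholder, and this exact combination has not been verified against the service.

```python
import json


def bndbox_label(name: str, x1: int, y1: int, x2: int, y2: int) -> dict:
    """Build an object detection label with a rectangular shape."""
    # The first point is the upper left vertex, the second the lower right one.
    if not (0 <= x1 < x2 and 0 <= y1 < y2):
        raise ValueError("bndbox needs 0 <= x1 < x2 and 0 <= y1 < y2")
    return {
        "name": name,
        "type": 1,  # 1: object detection
        "property": {
            "@modelarts:shape": "bndbox",
            "@modelarts:feature": [[x1, y1], [x2, y2]],
        },
    }


body = {
    "final_annotation": True,
    "samples": [{
        "name": "2.jpg",
        "data": "/9j/4A==",  # placeholder string, not a complete image
        "sample_type": 0,  # 0: image
        "labels": [bndbox_label("cat", 0, 10, 50, 95)],
        # Size is mandatory when the label list contains an object detection label.
        "metadata": {"@modelarts:size": [100, 200, 3]},
    }],
}
print(json.dumps(body, indent=2))
```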

Example Responses

Status code: 200

OK

{
  "success" : true,
  "results" : [ {
    "success" : true,
    "name" : "/test-obs/classify/input/animals/2.jpg",
    "info" : "960585877c92d63911ba555ab3129d36"
  } ]
}
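
Because each entry in results has its own success flag (Table 12), a caller may want to inspect the per-sample results in addition to the top-level flag. The hypothetical helper below collects the entries that were not added; it relies only on the fields listed in Tables 11 and 12.

```python
def failed_samples(response: dict) -> list:
    """Return (name, error_code, error_msg) for each sample that was not added."""
    return [
        (item.get("name"), item.get("error_code"), item.get("error_msg"))
        for item in response.get("results", [])
        if not item.get("success")
    ]


# The example response from this page, written as a Python dict.
response = {
    "success": True,
    "results": [{
        "success": True,
        "name": "/test-obs/classify/input/animals/2.jpg",
        "info": "960585877c92d63911ba555ab3129d36",
    }],
}
print(failed_samples(response))
```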

Status Codes

Status Code | Description

200 | OK

401 | Unauthorized

403 | Forbidden

404 | Not Found

Error Codes

See Error Codes.