Updated on 2023-12-14 GMT+08:00

Starting Intelligent Tasks

Function

This API is used to start an intelligent task. Two types of intelligent tasks are supported: intelligent labeling and automatic grouping. Set the task_type parameter in the request body to select the type of task to start. For datasets whose data path or working path is in a KMS-encrypted bucket, active learning and automatic grouping tasks cannot be started, but pre-labeling tasks are supported.

  • Intelligent labeling: You select an existing model in the system to quickly label the remaining images, based on the labels and the image training from the current labeling phase. Intelligent labeling (auto labeling) includes active learning and pre-labeling.

      • Active learning: The system uses semi-supervised learning and hard example filtering to perform auto labeling, reducing manual labeling workload and helping you find hard examples.

      • Pre-labeling: You select a model on the Model Management page for auto labeling.

  • Auto grouping: Unlabeled images are clustered using a clustering algorithm and then processed based on the clustering result. Images can be labeled by group or cleaned.

Debugging

You can debug this API through automatic authentication in API Explorer or use the SDK sample code generated by API Explorer.

URI

POST /v2/{project_id}/datasets/{dataset_id}/tasks

Table 1 Path Parameters

Parameter

Mandatory

Type

Description

dataset_id

Yes

String

Dataset ID.

project_id

Yes

String

Project ID. For details about how to obtain a project ID, see Obtaining a Project ID and Name.
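
The URI template and the two mandatory path parameters can be combined programmatically. The following is a minimal Python sketch rather than official SDK code; the helper name build_task_uri and the sample IDs are hypothetical.

```python
from urllib.parse import quote


def build_task_uri(project_id: str, dataset_id: str) -> str:
    """Return the request path for starting an intelligent task on a dataset."""
    # Both path parameters are marked mandatory in Table 1.
    if not project_id or not dataset_id:
        raise ValueError("project_id and dataset_id are both mandatory")
    # Percent-encode each segment so unexpected characters cannot alter the path.
    return "/v2/{}/datasets/{}/tasks".format(
        quote(project_id, safe=""), quote(dataset_id, safe="")
    )


# Hypothetical IDs, for illustration only.
print(build_task_uri("my-project-id", "my-dataset-id"))
```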

Request Parameters

Table 2 Request body parameters

Parameter

Mandatory

Type

Description

collect_key_sample

No

Boolean

Whether to collect key samples. Options:

  • true: Collect key samples.

  • false: Do not collect key samples. (Default value)

config

No

SmartTaskConfig object

Task configuration.

model_id

No

String

Model ID.

task_type

No

String

Task type. Options:

  • auto-label: active learning

  • pre-label: pre-labeling

  • auto-grouping: auto grouping
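
The request body fields in Table 2 can be assembled and checked on the client before the call is made. The sketch below is one possible approach in Python. The helper name build_task_body is hypothetical, and requiring task_type is a choice made by this sketch (Table 2 marks the parameter as optional).

```python
import json

# Task types listed in Table 2.
TASK_TYPES = {"auto-label", "pre-label", "auto-grouping"}


def build_task_body(task_type, model_id=None, collect_key_sample=None, config=None):
    """Build the JSON-serializable request body for starting an intelligent task."""
    if task_type not in TASK_TYPES:
        raise ValueError(f"task_type must be one of {sorted(TASK_TYPES)}")
    body = {"task_type": task_type}
    # Optional fields are added only when the caller supplies them,
    # so the service-side defaults apply otherwise.
    if model_id is not None:
        body["model_id"] = model_id
    if collect_key_sample is not None:
        body["collect_key_sample"] = bool(collect_key_sample)
    if config is not None:
        body["config"] = dict(config)
    return body


body = build_task_body(
    "auto-label", collect_key_sample=True, config={"algorithm_type": "fast"}
)
print(json.dumps(body, indent=2))
```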

Table 3 SmartTaskConfig

Parameter

Mandatory

Type

Description

algorithm_type

No

String

Algorithm type for auto labeling. Options:

  • fast: Only labeled samples are used for training. This type of algorithm achieves faster labeling.

  • accurate: In addition to labeled samples, unlabeled samples are used for semi-supervised training. This type of algorithm achieves more accurate labeling.

ambiguity

No

Boolean

Whether to perform clustering based on the image blurring degree.

annotation_output

No

String

Output path of the active learning labeling result.

collect_rule

No

String

Sample collection rule. The default value is all, indicating full collection. Currently, only the value all is supported.

collect_sample

No

Boolean

Whether to enable sample collection. Options:

  • true: Enable sample collection. (Default value)

  • false: Do not enable sample collection.

confidence_scope

No

String

Confidence range of key samples. The minimum and maximum values are separated by a hyphen (-). Example: 0.10-0.90.

description

No

String

Task description.

engine_name

No

String

Engine name.

export_format

No

Integer

Format of the exported directory. Options:

  • 1: tree structure. Example: rabbits/1.jpg, bees/2.jpg.

  • 2: tile structure. Example: 1.jpg, 1.txt; 2.jpg, 2.txt.

export_params

No

ExportParams object

Parameters of a dataset export task.

flavor

No

Flavor object

Training resource flavor.

image_brightness

No

Boolean

Whether to perform clustering based on the image brightness.

image_colorfulness

No

Boolean

Whether to perform clustering based on the image color.

inf_cluster_id

No

String

ID of a dedicated cluster. This parameter is left blank by default, indicating that a dedicated cluster is not used. When using the dedicated cluster to deploy services, ensure that the cluster status is normal. After this parameter is set, the network configuration of the cluster is used, and the vpc_id parameter does not take effect.

inf_config_list

No

Array of InfConfig objects

Configuration list required for running an inference task, which is optional and left blank by default.

inf_output

No

String

Output path of inference in active learning.

infer_result_output_dir

No

String

OBS directory for storing sample prediction results. This parameter is optional. The {service_id}-infer-result subdirectory in the output_dir directory is used by default.

key_sample_output

No

String

Output path of hard examples in active learning.

log_url

No

String

OBS URL of the logs of a training job. By default, this parameter is left blank.

manifest_path

No

String

Path of the manifest file, which is used as the input for training and inference.

model_id

No

String

Model ID.

model_name

No

String

Model name.

model_parameter

No

String

Model parameter.

model_version

No

String

Model version.

n_clusters

No

Integer

Number of clusters.

name

No

String

Task name.

output_dir

No

String

Sample output path. The format is as follows: Dataset output path/Dataset name-Dataset ID/annotation/auto-deploy/. Example: /test/work_1608083108676/dataset123-g6IO9qSu6hoxwCAirfm/annotation/auto-deploy/.

parameters

No

Array of TrainingParameter objects

Runtime parameters of a training job.

pool_id

No

String

ID of a resource pool.

property

No

String

Attribute name.

req_uri

No

String

Inference path of a batch job.

result_type

No

Integer

Processing mode of auto grouping results. Options:

  • 0: Save to OBS.

  • 1: Save to samples.

samples

No

Array of SampleLabels objects

List of labeling information for samples to be auto labeled.

stop_time

No

Integer

Timeout interval, in minutes. The default value is 15 minutes. This parameter is used only in the scenario of auto labeling for videos.

time

No

String

Timestamp in active learning.

train_data_path

No

String

Path for storing existing training datasets.

train_url

No

String

URL of the OBS path to which the output files of a training job are written. By default, this parameter is left blank.

version_format

No

String

Format of a dataset version. Options:

  • Default: default format

  • CarbonData: CarbonData (supported only by table datasets)

  • CSV: CSV

worker_server_num

No

Integer

Number of workers in a training job.
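
The confidence_scope value in Table 3 packs two numbers into one string, so it can be useful to check it before sending the request. The following Python sketch parses the documented min-max format. The helper name parse_confidence_scope is hypothetical, and the check that both values lie in [0, 1] is an assumption based on the confidence range given for label scores, not a rule stated for this parameter.

```python
from typing import Tuple


def parse_confidence_scope(scope: str) -> Tuple[float, float]:
    """Split a confidence_scope string such as '0.10-0.90' into (min, max)."""
    low_text, sep, high_text = scope.partition("-")
    if not sep:
        raise ValueError("expected the form <min>-<max>, for example 0.10-0.90")
    low, high = float(low_text), float(high_text)
    # Assumption: confidence values lie in [0, 1] and min does not exceed max.
    if not 0.0 <= low <= high <= 1.0:
        raise ValueError(f"invalid confidence range: {scope!r}")
    return low, high


print(parse_confidence_scope("0.10-0.90"))
```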

Table 4 ExportParams

Parameter

Mandatory

Type

Description

clear_hard_property

No

Boolean

Whether to clear hard example attributes. Options:

  • true: Clear hard example attributes. (Default value)

  • false: Do not clear hard example attributes.

export_dataset_version_format

No

String

Format of the dataset version to which data is exported.

export_dataset_version_name

No

String

Name of the dataset version to which data is exported.

export_dest

No

String

Dataset export type. The options are as follows:

  • DIR: Data is exported to OBS (default value).

  • NEW_DATASET: Data is exported to a new dataset.

export_new_dataset_name

No

String

Name of the new dataset to which data is exported.

export_new_dataset_work_path

No

String

Working directory of the new dataset to which data is exported.

ratio_sample_usage

No

Boolean

Whether to randomly allocate the training set and validation set based on the specified ratio. Options:

  • true: Allocate the training set and validation set.

  • false: Do not allocate the training set and validation set. (Default value)

sample_state

No

String

Sample status. The options are as follows:

  • __ALL__: labeled

  • __NONE__: unlabeled

  • __UNCHECK__: to be accepted

  • __ACCEPTED__: accepted

  • __REJECTED__: rejected

  • __UNREVIEWED__: to be reviewed

  • __REVIEWED__: approved

  • __WORKFORCE_SAMPLED__: sampled

  • __WORKFORCE_SAMPLED_UNCHECK__: sampled, to be accepted

  • __WORKFORCE_SAMPLED_CHECKED__: sampled, acceptance check completed

  • __WORKFORCE_SAMPLED_ACCEPTED__: sampled and accepted

  • __WORKFORCE_SAMPLED_REJECTED__: sampled and rejected

  • __AUTO_ANNOTATION__: to be confirmed

samples

No

Array of strings

ID list of exported samples.

search_conditions

No

Array of SearchCondition objects

Exported search conditions. The relationship between multiple search conditions is OR.

train_sample_ratio

No

String

Ratio for splitting samples between the training set and the validation set when a version is released. The default value is 1.00, indicating that all samples in the released version are used as the training set.

Table 5 SearchCondition

Parameter

Mandatory

Type

Description

coefficient

No

String

Filter by coefficient of difficulty.

frame_in_video

No

Integer

A frame in the video.

hard

No

String

Whether a sample is a hard sample. Options:

  • 0: non-hard sample

  • 1: hard sample

import_origin

No

String

Filter by data source.

kvp

No

String

CT dosage. Samples are filtered by dosage.

label_list

No

SearchLabels object

Label search criteria.

labeler

No

String

Labeler.

metadata

No

SearchProp object

Search by sample attribute.

parent_sample_id

No

String

Parent sample ID.

sample_dir

No

String

Directory where data samples are stored (the directory must end with a slash (/)). Only samples in the specified directory are searched for. Recursive search of directories is not supported.

sample_name

No

String

Search by sample name, including the file name extension.

sample_time

No

String

When a sample is added to the dataset, an index is created based on the last modification time (accurate to day) of the sample on OBS. You can search for the sample based on the time. Options:

  • month: Search for samples added from 30 days ago to the current day.

  • day: Search for samples added from yesterday (one day ago) to the current day.

  • yyyyMMdd-yyyyMMdd: Search for samples added in a specified period (at most 30 days), in the format of Start date-End date. For example, 20190901-20190915 indicates that samples generated from September 1 to September 15, 2019 are searched.

score

No

String

Search by confidence.

slice_thickness

No

String

DICOM layer thickness. Samples are filtered by layer thickness.

study_date

No

String

DICOM scanning time.

time_in_video

No

String

A time point in the video.
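
The three forms accepted by sample_time in Table 5 can be resolved to a concrete date range on the client side. The sketch below is an illustration in Python. The helper names are hypothetical, and reading "at most 30 days" as a difference of no more than 30 days between the two dates is an interpretation, not documented behavior.

```python
from datetime import date, datetime, timedelta
from typing import Tuple


def _parse_day(text: str) -> date:
    # Require exactly eight digits so malformed values are rejected early.
    if len(text) != 8 or not text.isdigit():
        raise ValueError(f"expected yyyyMMdd, got {text!r}")
    return datetime.strptime(text, "%Y%m%d").date()


def sample_time_range(value: str, today: date) -> Tuple[date, date]:
    """Resolve a sample_time value to an inclusive (start, end) date pair."""
    if value == "month":
        return today - timedelta(days=30), today
    if value == "day":
        return today - timedelta(days=1), today
    start_text, sep, end_text = value.partition("-")
    if not sep:
        raise ValueError("expected month, day, or yyyyMMdd-yyyyMMdd")
    start, end = _parse_day(start_text), _parse_day(end_text)
    if end < start:
        raise ValueError("the end date is earlier than the start date")
    # Interpretation of "at most 30 days": the two dates differ by 30 days or fewer.
    if (end - start).days > 30:
        raise ValueError("the period must not exceed 30 days")
    return start, end


print(sample_time_range("20190901-20190915", date(2019, 10, 1)))
```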

Table 6 SearchLabels

Parameter

Mandatory

Type

Description

labels

No

Array of SearchLabel objects

List of label search criteria.

op

No

String

If you want to search for multiple labels, op must be specified. If you search for only one label, op can be left blank. Options:

  • OR: OR operation

  • AND: AND operation

Table 7 SearchLabel

Parameter

Mandatory

Type

Description

name

No

String

Label name.

op

No

String

Operation type between multiple attributes. Options:

  • OR: OR operation

  • AND: AND operation

property

No

Map<String,Array<String>>

Label attribute, which is in the Object format and stores any key-value pairs. key indicates the attribute name, and value indicates the value list. If value is null, the search is not performed by value. Otherwise, the search value can be any value in the list.

type

No

Integer

Label type. Options:

  • 0: image classification

  • 1: object detection

  • 3: image segmentation

  • 100: text classification

  • 101: named entity recognition

  • 102: text triplet relationship

  • 103: text triplet entity

  • 200: sound classification

  • 201: speech content

  • 202: speech paragraph labeling

  • 600: video labeling

Table 8 SearchProp

Parameter

Mandatory

Type

Description

op

No

String

Relationship between attribute values. Options:

  • AND: AND relationship

  • OR: OR relationship

props

No

Map<String,Array<String>>

Search criteria of an attribute. Multiple search criteria can be set.

Table 9 Flavor

Parameter

Mandatory

Type

Description

code

No

String

Attribute code of a resource specification, which is used for creating tasks.

Table 10 InfConfig

Parameter

Mandatory

Type

Description

envs

No

Map<String,String>

(Optional) Environment variable key-value pair required for running a model. By default, this parameter is left blank. To ensure data security, do not enter sensitive information in environment variables.

instance_count

No

Integer

Number of instances for model deployment, that is, the number of compute nodes.

model_id

No

String

Model ID.

specification

No

String

Resource specifications of real-time services. For details, see Deploying Services.

weight

No

Integer

Traffic weight allocated to a model. This parameter is mandatory only when infer_type is set to real-time. The sum of the weights must be 100.
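
Because the weights in inf_config_list must add up to 100 when they are used, a quick client-side check can catch mistakes before the request is sent. The following Python sketch shows one way to do this. The helper name validate_weights and the model IDs are hypothetical, and skipping the check when no entry carries a weight is an assumption of this sketch.

```python
def validate_weights(inf_config_list):
    """Raise ValueError if the supplied weights do not sum to 100."""
    weights = [item["weight"] for item in inf_config_list if "weight" in item]
    if not weights:
        # Assumption: weight is only required in some scenarios, so there is
        # nothing to check when it is absent from every entry.
        return
    total = sum(weights)
    if total != 100:
        raise ValueError(f"weights sum to {total}, expected 100")


# Hypothetical model IDs, for illustration only.
validate_weights([
    {"model_id": "model-a", "instance_count": 1, "weight": 70},
    {"model_id": "model-b", "instance_count": 1, "weight": 30},
])
```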

Table 11 TrainingParameter

Parameter

Mandatory

Type

Description

label

No

String

Parameter name.

value

No

String

Parameter value.

Table 12 SampleLabels

Parameter

Mandatory

Type

Description

labels

No

Array of SampleLabel objects

Sample label list. If this parameter is left blank, all sample labels are deleted.

metadata

No

SampleMetadata object

Key-value pair of the sample metadata attribute.

sample_id

No

String

Sample ID.

sample_type

No

Integer

Sample type. Options:

  • 0: image

  • 1: text

  • 2: speech

  • 4: table

  • 6: video

  • 9: custom format

sample_usage

No

String

Sample usage. Options:

  • TRAIN: training

  • EVAL: evaluation

  • TEST: test

  • INFERENCE: inference

source

No

String

Source address of sample data, which is obtained by invoking the sample list interface.

worker_id

No

String

ID of a labeling team member.

Table 13 SampleLabel

Parameter

Mandatory

Type

Description

annotated_by

No

String

Video labeling method, which is used to distinguish whether a video is labeled manually or automatically. Options:

  • human: manual labeling

  • auto: automatic labeling

id

No

String

Label ID.

name

No

String

Label name.

property

No

SampleLabelProperty object

Attribute key-value pair of the sample label, such as the object shape and shape feature.

score

No

Float

Confidence. The value range is [0,1].

type

No

Integer

Label type. Options:

  • 0: image classification

  • 1: object detection

  • 3: image segmentation

  • 100: text classification

  • 101: named entity recognition

  • 102: text triplet relationship

  • 103: text triplet entity

  • 200: sound classification

  • 201: speech content

  • 202: speech paragraph labeling

  • 600: video labeling

Table 14 SampleLabelProperty

Parameter

Mandatory

Type

Description

@modelarts:content

No

String

Speech text content, which is a default attribute dedicated to the speech label (including the speech content and speech start and end points).

@modelarts:end_index

No

Integer

End position of the text, which is a default attribute dedicated to the named entity label. The end position does not include the character corresponding to the value of end_index. Example:

  • If the text is "Barack Hussein Obama II (born August 4, 1961) is an attorney and politician.", start_index and end_index of Barack Hussein Obama II are 0 and 23, respectively.

  • If the text is "Hope is the thing with feathers", start_index and end_index of Hope are 0 and 4, respectively.

@modelarts:end_time

No

String

Speech end time, which is a default attribute dedicated to the speech start/end point label, in the format of hh:mm:ss.SSS. (hh indicates hour; mm indicates minute; ss indicates second; and SSS indicates millisecond.)

@modelarts:feature

No

Object

Shape feature, which is a default attribute dedicated to the object detection label, with type of List. The upper left corner of the image is used as the coordinate origin [0, 0]. Each coordinate point is represented by [x, y], where x indicates the horizontal coordinate and y indicates the vertical coordinate (both x and y are >=0). The format of each shape is as follows:

  • bndbox: consists of two points, for example, [[0,10],[50,95]]. The upper left vertex of the rectangle is the first point, and the lower right vertex is the second point. That is, the x-coordinate of the first point must be less than the x-coordinate of the second point, and the y-coordinate of the first point must be less than the y-coordinate of the second point.

  • polygon: consists of multiple points that are connected in sequence to form a polygon, for example, [[0,100],[50,95],[10,60],[500,400]].

  • circle: consists of the center and radius, for example, [[100,100],[50]].

  • line: consists of two points, for example, [[0,100],[50,95]]. The first point is the start point, and the second point is the end point.

  • dashed: consists of two points, for example, [[0,100],[50,95]]. The first point is the start point, and the second point is the end point.

  • point: consists of one point, for example, [[0,100]].

  • polyline: consists of multiple points, for example, [[0,100],[50,95],[10,60],[500,400]].

@modelarts:from

No

String

ID of the head entity in the triplet relationship label, which is a default attribute dedicated to the triplet relationship label.

@modelarts:hard

No

String

Whether the sample is labeled as a hard sample, which is a default attribute. Options:

  • 0/false: not a hard example

  • 1/true: hard example

@modelarts:hard_coefficient

No

String

Coefficient of difficulty of each label level, which is a default attribute. The value range is [0,1].

@modelarts:hard_reasons

No

String

Reasons why the sample is a hard sample, which is a default attribute. Separate hard sample reason IDs with hyphens (-), for example, 3-20-21-19. Options:

  • 0: No target objects are identified.

  • 1: The confidence is low.

  • 2: The clustering result based on the training dataset is inconsistent with the prediction result.

  • 3: The prediction result is greatly different from the data of the same type in the training dataset.

  • 4: The prediction results of multiple consecutive similar images are inconsistent.

  • 5: There is a large offset between the image resolution and the feature distribution of the training dataset.

  • 6: There is a large offset between the aspect ratio of the image and the feature distribution of the training dataset.

  • 7: There is a large offset between the brightness of the image and the feature distribution of the training dataset.

  • 8: There is a large offset between the saturation of the image and the feature distribution of the training dataset.

  • 9: There is a large offset between the color richness of the image and the feature distribution of the training dataset.

  • 10: There is a large offset between the definition of the image and the feature distribution of the training dataset.

  • 11: There is a large offset between the number of frames of the image and the feature distribution of the training dataset.

  • 12: There is a large offset between the standard deviation of area of image frames and the feature distribution of the training dataset.

  • 13: There is a large offset between the aspect ratio of image frames and the feature distribution of the training dataset.

  • 14: There is a large offset between the area portion of image frames and the feature distribution of the training dataset.

  • 15: There is a large offset between the edge of image frames and the feature distribution of the training dataset.

  • 16: There is a large offset between the brightness of image frames and the feature distribution of the training dataset.

  • 17: There is a large offset between the definition of image frames and the feature distribution of the training dataset.

  • 18: There is a large offset between the stack of image frames and the feature distribution of the training dataset.

  • 19: The data enhancement result based on GaussianBlur is inconsistent with the prediction result of the original image.

  • 20: The data enhancement result based on fliplr is inconsistent with the prediction result of the original image.

  • 21: The data enhancement result based on Crop is inconsistent with the prediction result of the original image.

  • 22: The data enhancement result based on flipud is inconsistent with the prediction result of the original image.

  • 23: The data enhancement result based on scale is inconsistent with the prediction result of the original image.

  • 24: The data enhancement result based on translate is inconsistent with the prediction result of the original image.

  • 25: The data enhancement result based on shear is inconsistent with the prediction result of the original image.

  • 26: The data enhancement result based on superpixels is inconsistent with the prediction result of the original image.

  • 27: The data enhancement result based on sharpen is inconsistent with the prediction result of the original image.

  • 28: The data enhancement result based on add is inconsistent with the prediction result of the original image.

  • 29: The data enhancement result based on invert is inconsistent with the prediction result of the original image.

  • 30: The data is predicted to be abnormal.

@modelarts:shape

No

String

Object shape, which is a default attribute dedicated to the object detection label and is left empty by default. Options:

  • bndbox: rectangle

  • polygon: polygon

  • circle: circle

  • line: straight line

  • dashed: dotted line

  • point: point

  • polyline: polyline

@modelarts:source

No

String

Speech source, which is a default attribute dedicated to the speech start/end point label and can be set to a speaker or narrator.

@modelarts:start_index

No

Integer

Start position of the text, which is a default attribute dedicated to the named entity label. The start value begins from 0, including the character corresponding to the value of start_index.

@modelarts:start_time

No

String

Speech start time, which is a default attribute dedicated to the speech start/end point label, in the format of hh:mm:ss.SSS. (hh indicates hour; mm indicates minute; ss indicates second; and SSS indicates millisecond.)

@modelarts:to

No

String

ID of the tail entity in the triplet relationship label, which is a default attribute dedicated to the triplet relationship label.
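
Several attributes in Table 14 follow small conventions that are easy to get wrong: end_index is exclusive, hard_reasons is a hyphen-separated list of IDs, and a bndbox needs its upper left point first. The Python sketch below illustrates these three rules using the examples from the table. All helper names are hypothetical and this is not official client code.

```python
def entity_text(text, start_index, end_index):
    """Return the labeled span: start_index is inclusive, end_index is exclusive."""
    if not 0 <= start_index <= end_index <= len(text):
        raise ValueError("indexes fall outside the text")
    # This is the same half-open convention as Python slicing.
    return text[start_index:end_index]


def parse_hard_reasons(value):
    """Split a label-level hard_reasons string such as '3-20-21-19' into IDs."""
    ids = [int(part) for part in value.split("-")]
    for reason_id in ids:
        # The table lists reason IDs 0 through 30.
        if not 0 <= reason_id <= 30:
            raise ValueError(f"unknown hard sample reason ID: {reason_id}")
    return ids


def is_valid_bndbox(feature):
    """Check that a bndbox is [[x1, y1], [x2, y2]] with the upper left point first."""
    if len(feature) != 2 or any(len(point) != 2 for point in feature):
        return False
    (x1, y1), (x2, y2) = feature
    return 0 <= x1 < x2 and 0 <= y1 < y2


print(entity_text("Hope is the thing with feathers", 0, 4))
print(parse_hard_reasons("3-20-21-19"))
print(is_valid_bndbox([[0, 10], [50, 95]]))
```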

Table 15 SampleMetadata

Parameter

Mandatory

Type

Description

@modelarts:import_origin

No

Integer

Sample source, which is a built-in attribute.

@modelarts:hard

No

Double

Whether the sample is labeled as a hard sample, which is a default attribute. Options:

  • 0: non-hard sample

  • 1: hard sample

@modelarts:hard_coefficient

No

Double

Coefficient of difficulty of each sample level, which is a default attribute. The value range is [0,1].

@modelarts:hard_reasons

No

Array of integers

ID of a hard sample reason, which is a default attribute. Options:

  • 0: No object is identified.

  • 1: The confidence is low.

  • 2: The clustering result based on the training dataset is inconsistent with the prediction result.

  • 3: The prediction result is greatly different from the data of the same type in the training dataset.

  • 4: The prediction results of multiple consecutive similar images are inconsistent.

  • 5: There is a large offset between the image resolution and the feature distribution of the training dataset.

  • 6: There is a large offset between the aspect ratio of the image and the feature distribution of the training dataset.

  • 7: There is a large offset between the brightness of the image and the feature distribution of the training dataset.

  • 8: There is a large offset between the saturation of the image and the feature distribution of the training dataset.

  • 9: There is a large offset between the color richness of the image and the feature distribution of the training dataset.

  • 10: There is a large offset between the definition of the image and the feature distribution of the training dataset.

  • 11: There is a large offset between the number of frames of the image and the feature distribution of the training dataset.

  • 12: There is a large offset between the standard deviation of area of image frames and the feature distribution of the training dataset.

  • 13: There is a large offset between the aspect ratio of image frames and the feature distribution of the training dataset.

  • 14: There is a large offset between the area portion of image frames and the feature distribution of the training dataset.

  • 15: There is a large offset between the edge of image frames and the feature distribution of the training dataset.

  • 16: There is a large offset between the brightness of image frames and the feature distribution of the training dataset.

  • 17: There is a large offset between the definition of image frames and the feature distribution of the training dataset.

  • 18: There is a large offset between the stack of image frames and the feature distribution of the training dataset.

  • 19: The data enhancement result based on GaussianBlur is inconsistent with the prediction result of the original image.

  • 20: The data enhancement result based on fliplr is inconsistent with the prediction result of the original image.

  • 21: The data enhancement result based on Crop is inconsistent with the prediction result of the original image.

  • 22: The data enhancement result based on flipud is inconsistent with the prediction result of the original image.

  • 23: The data enhancement result based on scale is inconsistent with the prediction result of the original image.

  • 24: The data enhancement result based on translate is inconsistent with the prediction result of the original image.

  • 25: The data enhancement result based on shear is inconsistent with the prediction result of the original image.

  • 26: The data enhancement result based on superpixels is inconsistent with the prediction result of the original image.

  • 27: The data enhancement result based on sharpen is inconsistent with the prediction result of the original image.

  • 28: The data enhancement result based on add is inconsistent with the prediction result of the original image.

  • 29: The data enhancement result based on invert is inconsistent with the prediction result of the original image.

  • 30: The data is predicted to be abnormal.

@modelarts:size

No

Array of objects

Image size (width, height, and depth of the image), which is a default attribute, with type of List<Integer>. In the list, the first number indicates the width (pixels), the second number indicates the height (pixels), and the third number indicates the depth (the depth can be left blank and the default value is 3). For example, [100,200,3] and [100,200] are both valid. Note: This parameter is mandatory only when the sample label list contains the object detection label.
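
Since the depth in @modelarts:size may be omitted, code that consumes the attribute may want to fill in the documented default. The following is a small Python sketch; the helper name normalize_size is hypothetical.

```python
def normalize_size(size):
    """Return [width, height, depth], using the documented default depth of 3."""
    if len(size) == 2:
        width, height = size
        depth = 3  # Default depth when the third number is left blank.
    elif len(size) == 3:
        width, height, depth = size
    else:
        raise ValueError("size must hold two or three integers")
    return [width, height, depth]


print(normalize_size([100, 200]))
print(normalize_size([100, 200, 3]))
```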

Response Parameters

Status code: 200

Table 16 Response body parameters

Parameter

Type

Description

task_id

String

Task ID.

Example Requests

  • The following is an example of how to start an auto labeling (active learning) task. The task type has been set to auto-label.

    {
      "task_type" : "auto-label",
      "collect_key_sample" : true,
      "config" : {
        "algorithm_type" : "fast"
      }
    }
  • The following is an example of how to start an auto labeling (pre-labeling) task. The task type has been set to pre-label.

    {
      "task_type" : "pre-label",
      "model_id" : "c4989033-7584-44ee-a180-1c476b810e46",
      "collect_key_sample" : true,
      "config" : {
        "inf_config_list" : [ {
          "specification" : "modelarts.vm.cpu.2u",
          "instance_count" : 1
        } ]
      }
    }
  • The following is an example of how to start an auto grouping task. The task type has been set to auto-grouping.

    {
      "task_type" : "auto-grouping",
      "config" : {
        "n_clusters" : 2,
        "ambiguity" : false,
        "image_brightness" : false,
        "image_colorfulness" : false,
        "property" : "size",
        "result_type" : 1
      }
    }
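
Any of the example bodies above can be wrapped in an HTTP request. The following Python sketch only constructs the request object with the standard library and does not send it. The endpoint host, the IDs, and the token value are placeholders, and passing the token in an X-Auth-Token header is an assumption about the authentication scheme that should be confirmed against the authentication guide for your environment.

```python
import json
import urllib.request


def make_start_task_request(endpoint, project_id, dataset_id, token, body):
    """Build (but do not send) the POST request that starts an intelligent task."""
    url = f"{endpoint.rstrip('/')}/v2/{project_id}/datasets/{dataset_id}/tasks"
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        method="POST",
        headers={
            "Content-Type": "application/json",
            # Assumption: token-based authentication through this header.
            "X-Auth-Token": token,
        },
    )


# Placeholder endpoint, IDs, and token, for illustration only.
request = make_start_task_request(
    "https://modelarts.example.com",
    "my-project-id",
    "my-dataset-id",
    "my-token",
    {"task_type": "auto-label", "collect_key_sample": True,
     "config": {"algorithm_type": "fast"}},
)
print(request.get_method(), request.full_url)
```

To send the request, pass the object to urllib.request.urlopen and read the JSON response.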

Example Responses

Status code: 200

OK

{
  "task_id" : "r0jT2zwxBDKf8KEnSuZ"
}
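
A client typically needs only the task_id from a successful response. The sketch below shows one way to extract it in Python; the helper name extract_task_id is hypothetical.

```python
import json


def extract_task_id(response_text):
    """Return the task_id from a 200 response body, or raise ValueError."""
    payload = json.loads(response_text)
    task_id = payload.get("task_id") if isinstance(payload, dict) else None
    if not isinstance(task_id, str) or not task_id:
        raise ValueError("the response does not contain a task_id")
    return task_id


print(extract_task_id('{"task_id": "r0jT2zwxBDKf8KEnSuZ"}'))
```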

Status Codes

Status Code

Description

200

OK

401

Unauthorized

403

Forbidden

404

Not Found

Error Codes

See Error Codes.