Starting Intelligent Tasks

Updated on 2024-05-30 GMT+08:00

Function

This API is used to start an intelligent task. Two types of intelligent tasks are supported: intelligent labeling and automatic grouping. Set the task_type parameter in the request body to select the type of task to start. For datasets whose data path or working path is in a KMS-encrypted bucket, active learning and automatic grouping tasks cannot be started, but pre-labeling tasks are supported.

- Intelligent labeling: The system quickly labels the remaining images, either by learning from the images already labeled in the current labeling phase or by using an existing model in the system. Intelligent labeling includes active learning and pre-labeling.
  * Active learning: The system uses semi-supervised learning and hard example filtering to perform auto labeling, reducing manual labeling workload and helping you find hard examples.
  * Pre-labeling: You select a model on the Model Management page for auto labeling.
- Auto grouping: Unlabeled images are clustered using a clustering algorithm and then processed based on the clustering result. Images can be labeled by group or cleaned.

Debugging

You can debug this API through automatic authentication in API Explorer or use the SDK sample code generated by API Explorer.

URI

POST /v2/{project_id}/datasets/{dataset_id}/tasks

**Table 1** Path Parameters
Parameter	Mandatory	Type	Description
dataset_id	Yes	String	Dataset ID.
project_id	Yes	String	Project ID. For details about how to obtain a project ID, see Obtaining a Project ID and Name.
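
The path parameters above can be filled in as in the following minimal sketch. The endpoint host `modelarts.example.com` and the helper name `build_task_url` are hypothetical and not part of this API reference; substitute the endpoint of your own region.

```python
from urllib.parse import quote

# Hypothetical endpoint (assumption): replace with the endpoint of your region.
ENDPOINT = "https://modelarts.example.com"


def build_task_url(project_id: str, dataset_id: str, endpoint: str = ENDPOINT) -> str:
    """Fill the path parameters of POST /v2/{project_id}/datasets/{dataset_id}/tasks."""
    if not project_id or not dataset_id:
        raise ValueError("project_id and dataset_id are both mandatory")
    # Percent-encode each segment so unexpected characters cannot alter the path.
    return "{}/v2/{}/datasets/{}/tasks".format(
        endpoint.rstrip("/"),
        quote(project_id, safe=""),
        quote(dataset_id, safe=""),
    )
```

Both path parameters are marked mandatory in Table 1, which is why the helper rejects empty values before building the URL.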

Request Parameters

**Table 2** Request body parameters
Parameter	Mandatory	Type	Description
collect_key_sample	No	Boolean	Whether to collect key samples. Options: true: Collect key samples. false: Do not collect key samples. (Default value)
config	No	SmartTaskConfig object	Task configuration.
model_id	No	String	Model ID.
task_type	No	String	Task type. The options are as follows: - auto-label: active learning - pre-label: pre-labeling - auto-grouping: auto grouping

**Table 3** SmartTaskConfig
Parameter	Mandatory	Type	Description
algorithm_type	No	String	Algorithm type for auto labeling. Options: fast: Only labeled samples are used for training. This type of algorithm achieves faster labeling. accurate: In addition to labeled samples, unlabeled samples are used for semi-supervised training. This type of algorithm achieves more accurate labeling.
ambiguity	No	Boolean	Whether to perform clustering based on the image blurring degree.
annotation_output	No	String	Output path of the active learning labeling result.
collect_rule	No	String	Sample collection rule. The default value is all, indicating full collection. Currently, only value all is available.
collect_sample	No	Boolean	Whether to enable sample collection. Options: true: Enable sample collection. (Default value) false: Do not enable sample collection.
confidence_scope	No	String	Confidence range of key samples. The minimum and maximum values are separated by hyphens (-). Example: 0.10-0.90.
description	No	String	Task description.
engine_name	No	String	Engine name.
export_format	No	Integer	Format of the exported directory. Options: 1: tree structure. Example: rabbits/1.jpg,bees/2.jpg. 2: tile structure. Example: 1.jpg, 1.txt; 2.jpg,2.txt.
export_params	No	ExportParams object	Parameters of a dataset export task.
flavor	No	Flavor object	Training resource flavor.
image_brightness	No	Boolean	Whether to perform clustering based on the image brightness.
image_colorfulness	No	Boolean	Whether to perform clustering based on the image color.
inf_cluster_id	No	String	ID of a dedicated cluster. This parameter is left blank by default, indicating that a dedicated cluster is not used. When using the dedicated cluster to deploy services, ensure that the cluster status is normal. After this parameter is set, the network configuration of the cluster is used, and the vpc_id parameter does not take effect.
inf_config_list	No	Array of InfConfig objects	Configuration list required for running an inference task, which is optional and left blank by default.
inf_output	No	String	Output path of inference in active learning.
infer_result_output_dir	No	String	OBS directory for storing sample prediction results. This parameter is optional. The {service_id}-infer-result subdirectory in the output_dir directory is used by default.
key_sample_output	No	String	Output path of hard examples in active learning.
log_url	No	String	OBS URL of the logs of a training job. By default, this parameter is left blank.
manifest_path	No	String	Path of the manifest file, which is used as the input for training and inference.
model_id	No	String	Model ID.
model_name	No	String	Model name.
model_parameter	No	String	Model parameter.
model_version	No	String	Model version.
n_clusters	No	Integer	Number of clusters.
name	No	String	Task name.
output_dir	No	String	Sample output path. The format is as follows: Dataset output path/Dataset name-Dataset ID/annotation/auto-deploy/. Example: /test/work_1608083108676/dataset123-g6IO9qSu6hoxwCAirfm/annotation/auto-deploy/.
parameters	No	Array of TrainingParameter objects	Runtime parameters of a training job.
pool_id	No	String	ID of a resource pool.
property	No	String	Attribute name.
req_uri	No	String	Inference path of a batch job.
result_type	No	Integer	Processing mode of auto grouping results. Options: 0: Save to OBS. 1: Save to samples.
samples	No	Array of SampleLabels objects	List of labeling information for samples to be auto labeled.
stop_time	No	Integer	Timeout interval, in minutes. The default value is 15 minutes. This parameter is used only in the scenario of auto labeling for videos.
time	No	String	Timestamp in active learning.
train_data_path	No	String	Path for storing existing training datasets.
train_url	No	String	URL of the OBS path where the file of a training job is outputted. By default, this parameter is left blank.
version_format	No	String	Format of a dataset version. Options: Default: default format CarbonData: CarbonData (supported only by table datasets) CSV: CSV
worker_server_num	No	Integer	Number of workers in a training job.
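
The confidence_scope value in Table 3 is a string with the minimum and maximum joined by a hyphen, for example 0.10-0.90. The following sketch shows one way to build and read such a string. The helper names are hypothetical, and two points are assumptions rather than documented behavior: that both bounds lie in [0,1], and that two decimal places are used as in the example.

```python
from typing import Tuple


def format_confidence_scope(low: float, high: float) -> str:
    """Join the minimum and maximum confidence with a hyphen, e.g. 0.10-0.90."""
    # Assumption: confidences lie in [0, 1] and the minimum does not exceed the maximum.
    if not (0.0 <= low <= high <= 1.0):
        raise ValueError("expected 0 <= low <= high <= 1")
    # Assumption: two decimal places, mirroring the documented example.
    return f"{low:.2f}-{high:.2f}"


def parse_confidence_scope(scope: str) -> Tuple[float, float]:
    """Split a confidence_scope string back into its (minimum, maximum) pair."""
    parts = scope.split("-")
    if len(parts) != 2:
        raise ValueError("expected the form <min>-<max>")
    low, high = float(parts[0]), float(parts[1])
    if not (0.0 <= low <= high <= 1.0):
        raise ValueError("expected 0 <= low <= high <= 1")
    return low, high
```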

**Table 4** ExportParams
Parameter	Mandatory	Type	Description
clear_hard_property	No	Boolean	Whether to clear hard example attributes. Options: true: Clear hard example attributes. (Default value) false: Do not clear hard example attributes.
export_dataset_version_format	No	String	Format of the dataset version to which data is exported.
export_dataset_version_name	No	String	Name of the dataset version to which data is exported.
export_dest	No	String	Dataset export type. The options are as follows: DIR: Data is exported to OBS (default value). NEW_DATASET: Export data to a new dataset.
export_new_dataset_name	No	String	Name of the new dataset to which data is exported.
export_new_dataset_work_path	No	String	Working directory of the new dataset to which data is exported.
ratio_sample_usage	No	Boolean	Whether to randomly allocate the training set and validation set based on the specified ratio. Options: true: Allocate the training set and validation set. false: Do not allocate the training set and validation set. (Default value)
sample_state	No	String	Sample status. The options are as follows: __ALL__: labeled __NONE__: unlabeled __UNCHECK__: to be accepted __ACCEPTED__: The acceptance is passed. __REJECTED__: rejected __UNREVIEWED__: to be reviewed __REVIEWED__: approved __WORKFORCE_SAMPLED__: sampled __WORKFORCE_SAMPLED_UNCHECK__: Sampling is to be accepted. __WORKFORCE_SAMPLED_CHECKED__: Sampling has been accepted. __WORKFORCE_SAMPLED_ACCEPTED__: The sampling is passed. __WORKFORCE_SAMPLED_REJECTED__: The sampling has been rejected. __AUTO_ANNOTATION__: to be confirmed
samples	No	Array of strings	ID list of exported samples.
search_conditions	No	Array of SearchCondition objects	Exported search conditions. The relationship between multiple search conditions is OR.
train_sample_ratio	No	String	Split ratio of the training set to the validation set when a version is released. The default value is 1.00, indicating that all samples in the released version are used as the training set.

**Table 5** SearchCondition
Parameter	Mandatory	Type	Description
coefficient	No	String	Filter by coefficient of difficulty.
frame_in_video	No	Integer	A frame in the video.
hard	No	String	Whether a sample is a hard sample. Options: 0: non-hard sample 1: hard sample
import_origin	No	String	Filter by data source.
kvp	No	String	CT dosage, filtered by dosage.
label_list	No	SearchLabels object	Label search criteria.
labeler	No	String	Labeler.
metadata	No	SearchProp object	Search by sample attribute.
parent_sample_id	No	String	Parent sample ID.
sample_dir	No	String	Directory where data samples are stored (the directory must end with a slash (/)). Only samples in the specified directory are searched for. Recursive search of directories is not supported.
sample_name	No	String	Search by sample name, including the file name extension.
sample_time	No	String	When a sample is added to the dataset, an index is created based on the last modification time (accurate to day) of the sample on OBS. You can search for the sample based on the time. Options: month: Search for samples added from 30 days ago to the current day. day: Search for samples added from yesterday (one day ago) to the current day. yyyyMMdd-yyyyMMdd: Search for samples added in a specified period (at most 30 days), in the format of Start date-End date. For example, 20190901-20190915 indicates that samples generated from September 1 to September 15, 2019 are searched.
score	No	String	Search by confidence.
slice_thickness	No	String	DICOM layer thickness. Samples are filtered by layer thickness.
study_date	No	String	DICOM scanning time.
time_in_video	No	String	A time point in the video.
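
The sample_time rules in Table 5 (the keywords month and day, or a yyyyMMdd-yyyyMMdd period of at most 30 days) can be checked on the client side before a request is sent. The following is a minimal sketch; the function name is hypothetical, and reading "at most 30 days" as an end date no more than 30 days after the start date is an assumption.

```python
from datetime import datetime

MAX_SPAN_DAYS = 30  # documented limit for a custom period


def validate_sample_time(value: str) -> str:
    """Return value unchanged if it is a valid sample_time, else raise ValueError."""
    if value in ("month", "day"):
        return value
    start_text, sep, end_text = value.partition("-")
    # Each date must be exactly eight digits (yyyyMMdd).
    for text in (start_text, end_text):
        if not sep or len(text) != 8 or not text.isdigit():
            raise ValueError("expected month, day, or yyyyMMdd-yyyyMMdd")
    # strptime rejects impossible dates such as 20190931.
    start = datetime.strptime(start_text, "%Y%m%d")
    end = datetime.strptime(end_text, "%Y%m%d")
    if end < start:
        raise ValueError("end date precedes start date")
    # Assumption: the span is measured as end date minus start date.
    if (end - start).days > MAX_SPAN_DAYS:
        raise ValueError("period exceeds 30 days")
    return value
```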

**Table 6** SearchLabels
Parameter	Mandatory	Type	Description
labels	No	Array of SearchLabel objects	List of label search criteria.
op	No	String	If you want to search for multiple labels, op must be specified. If you search for only one label, op can be left blank. Options: OR: OR operation AND: AND operation

**Table 7** SearchLabel
Parameter	Mandatory	Type	Description
name	No	String	Label name.
op	No	String	Operation type between multiple attributes. Options: OR: OR operation AND: AND operation
property	No	Map<String,Array<String>>	Label attribute, which is in the Object format and stores any key-value pairs. key indicates the attribute name, and value indicates the value list. If value is null, the search is not performed by value. Otherwise, the search value can be any value in the list.
type	No	Integer	Label type. Options: 0: image classification 1: object detection 3: image segmentation 100: text classification 101: named entity recognition 102: text triplet relationship 103: text triplet entity 200: sound classification 201: speech content 202: speech paragraph labeling 600: video labeling

**Table 8** SearchProp
Parameter	Mandatory	Type	Description
op	No	String	Relationship between attribute values. Options: AND: AND relationship OR: OR relationship
props	No	Map<String,Array<String>>	Search criteria of an attribute. Multiple search criteria can be set.

**Table 9** Flavor
Parameter	Mandatory	Type	Description
code	No	String	Attribute code of a resource specification, which is used for creating tasks.

**Table 10** InfConfig
Parameter	Mandatory	Type	Description
envs	No	Map<String,String>	(Optional) Environment variable key-value pair required for running a model. By default, this parameter is left blank. To ensure data security, do not enter sensitive information in environment variables.
instance_count	No	Integer	Number of instances for model deployment, that is, the number of compute nodes.
model_id	No	String	Model ID.
specification	No	String	Resource specifications of real-time services. For details, see Deploying Services.
weight	No	Integer	Traffic weight allocated to a model. This parameter is mandatory only when infer_type is set to real-time. The sum of the weights must be 100.

**Table 11** TrainingParameter
Parameter	Mandatory	Type	Description
label	No	String	Parameter name.
value	No	String	Parameter value.

**Table 12** SampleLabels
Parameter	Mandatory	Type	Description
labels	No	Array of SampleLabel objects	Sample label list. If this parameter is left blank, all sample labels are deleted.
metadata	No	SampleMetadata object	Key-value pair of the sample metadata attribute.
sample_id	No	String	Sample ID.
sample_type	No	Integer	Sample type. Options: 0: image 1: text 2: speech 4: table 6: video 9: custom format
sample_usage	No	String	Sample usage. Options: TRAIN: training EVAL: evaluation TEST: test INFERENCE: inference
source	No	String	Source address of sample data, which is obtained by invoking the sample list interface.
worker_id	No	String	ID of a labeling team member.

**Table 13** SampleLabel
Parameter	Mandatory	Type	Description
annotated_by	No	String	Video labeling method, which is used to distinguish whether a video is labeled manually or automatically. Options: human: manual labeling auto: automatic labeling
id	No	String	Label ID.
name	No	String	Label name.
property	No	SampleLabelProperty object	Attribute key-value pair of the sample label, such as the object shape and shape feature.
score	No	Float	Confidence. The value range is [0,1].
type	No	Integer	Label type. Options: 0: image classification 1: object detection 3: image segmentation 100: text classification 101: named entity recognition 102: text triplet relationship 103: text triplet entity 200: sound classification 201: speech content 202: speech paragraph labeling 600: video labeling

**Table 14** SampleLabelProperty
Parameter	Mandatory	Type	Description
@modelarts:content	No	String	Speech text content, which is a default attribute dedicated to the speech label (including the speech content and speech start and end points).
@modelarts:end_index	No	Integer	End position of the text, which is a default attribute dedicated to the named entity label. The end position does not include the character corresponding to the value of end_index. Example: If the text is "Barack Hussein Obama II (born August 4, 1961) is an attorney and politician.", start_index and end_index of Barack Hussein Obama II are 0 and 23, respectively. If the text is "Hope is the thing with feathers", start_index and end_index of Hope are 0 and 4, respectively.
@modelarts:end_time	No	String	Speech end time, which is a default attribute dedicated to the speech start/end point label, in the format of hh:mm:ss.SSS. (hh indicates hour; mm indicates minute; ss indicates second; and SSS indicates millisecond.)
@modelarts:feature	No	Object	Shape feature, which is a default attribute dedicated to the object detection label, with type of List. The upper left corner of the image is used as the coordinate origin [0, 0]. Each coordinate point is represented by [x, y], where x indicates the horizontal coordinate and y indicates the vertical coordinate (both x and y are >=0). The format of each shape is as follows: bndbox: consists of two points, for example, [[0,10],[50,95]]. The upper left vertex of the rectangle is the first point, and the lower right vertex is the second point. That is, the x-coordinate of the first point must be less than the x-coordinate of the second point, and the y-coordinate of the first point must be less than the y-coordinate of the second point. polygon: consists of multiple points that are connected in sequence to form a polygon, for example, [[0,100],[50,95],[10,60],[500,400]]. circle: consists of the center and radius, for example, [[100,100],[50]]. line: consists of two points, for example, [[0,100],[50,95]]. The first point is the start point, and the second point is the end point. dashed: consists of two points, for example, [[0,100],[50,95]]. The first point is the start point, and the second point is the end point. point: consists of one point, for example, [[0,100]]. polyline: consists of multiple points, for example, [[0,100],[50,95],[10,60],[500,400]].
@modelarts:from	No	String	ID of the head entity in the triplet relationship label, which is a default attribute dedicated to the triplet relationship label.
@modelarts:hard	No	String	Sample labeled as a hard sample or not, which is a default attribute. Options: 0/false: not a hard example 1/true: hard example
@modelarts:hard_coefficient	No	String	Coefficient of difficulty of each label level, which is a default attribute. The value range is [0,1].
@modelarts:hard_reasons	No	String	Reasons that the sample is a hard sample, which is a default attribute. Use a hyphen (-) to separate every two hard sample reason IDs, for example, 3-20-21-19. Options: 0: No target objects are identified. 1: The confidence is low. 2: The clustering result based on the training dataset is inconsistent with the prediction result. 3: The prediction result is greatly different from the data of the same type in the training dataset. 4: The prediction results of multiple consecutive similar images are inconsistent. 5: There is a large offset between the image resolution and the feature distribution of the training dataset. 6: There is a large offset between the aspect ratio of the image and the feature distribution of the training dataset. 7: There is a large offset between the brightness of the image and the feature distribution of the training dataset. 8: There is a large offset between the saturation of the image and the feature distribution of the training dataset. 9: There is a large offset between the color richness of the image and the feature distribution of the training dataset. 10: There is a large offset between the definition of the image and the feature distribution of the training dataset. 11: There is a large offset between the number of frames of the image and the feature distribution of the training dataset. 12: There is a large offset between the standard deviation of area of image frames and the feature distribution of the training dataset. 13: There is a large offset between the aspect ratio of image frames and the feature distribution of the training dataset. 14: There is a large offset between the area portion of image frames and the feature distribution of the training dataset. 15: There is a large offset between the edge of image frames and the feature distribution of the training dataset. 16: There is a large offset between the brightness of image frames and the feature distribution of the training dataset. 17: There is a large offset between the definition of image frames and the feature distribution of the training dataset. 18: There is a large offset between the stack of image frames and the feature distribution of the training dataset. 19: The data enhancement result based on GaussianBlur is inconsistent with the prediction result of the original image. 20: The data enhancement result based on fliplr is inconsistent with the prediction result of the original image. 21: The data enhancement result based on Crop is inconsistent with the prediction result of the original image. 22: The data enhancement result based on flipud is inconsistent with the prediction result of the original image. 23: The data enhancement result based on scale is inconsistent with the prediction result of the original image. 24: The data enhancement result based on translate is inconsistent with the prediction result of the original image. 25: The data enhancement result based on shear is inconsistent with the prediction result of the original image. 26: The data enhancement result based on superpixels is inconsistent with the prediction result of the original image. 27: The data enhancement result based on sharpen is inconsistent with the prediction result of the original image. 28: The data enhancement result based on add is inconsistent with the prediction result of the original image. 29: The data enhancement result based on invert is inconsistent with the prediction result of the original image. 30: The data is predicted to be abnormal.
@modelarts:shape	No	String	Object shape, which is a default attribute dedicated to the object detection label and is left empty by default. Options: bndbox: rectangle polygon: polygon circle: circle line: straight line dashed: dotted line point: point polyline: polyline
@modelarts:source	No	String	Speech source, which is a default attribute dedicated to the speech start/end point label and can be set to a speaker or narrator.
@modelarts:start_index	No	Integer	Start position of the text, which is a default attribute dedicated to the named entity label. The start value begins from 0, including the character corresponding to the value of start_index.
@modelarts:start_time	No	String	Speech start time, which is a default attribute dedicated to the speech start/end point label, in the format of hh:mm:ss.SSS. (hh indicates hour; mm indicates minute; ss indicates second; and SSS indicates millisecond.)
@modelarts:to	No	String	ID of the tail entity in the triplet relationship label, which is a default attribute dedicated to the triplet relationship label.
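
Two rules in Table 14 are easy to get wrong: @modelarts:end_index is exclusive, and a bndbox lists the upper left vertex before the lower right vertex. The following sketch illustrates both with the examples from the table. The function names are hypothetical and this is only a client-side check, not the validation the service itself performs.

```python
from typing import List


def entity_text(text: str, start_index: int, end_index: int) -> str:
    """Return the span of a named entity label.

    start_index is inclusive and end_index is exclusive, which matches
    Python slicing.
    """
    if not (0 <= start_index < end_index <= len(text)):
        raise ValueError("indices out of range")
    return text[start_index:end_index]


def is_valid_bndbox(feature: List[List[float]]) -> bool:
    """Check the bndbox rule: [[x1, y1], [x2, y2]] with x1 < x2 and y1 < y2."""
    if len(feature) != 2 or any(len(point) != 2 for point in feature):
        return False
    (x1, y1), (x2, y2) = feature
    # Coordinates are non-negative; the origin is the upper left corner of the image.
    return 0 <= x1 < x2 and 0 <= y1 < y2
```

For example, `entity_text("Hope is the thing with feathers", 0, 4)` returns `"Hope"`, and `is_valid_bndbox([[0, 10], [50, 95]])` returns `True`.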

**Table 15** SampleMetadata
Parameter	Mandatory	Type	Description
@modelarts:import_origin	No	Integer	Sample source, which is a built-in attribute.
@modelarts:hard	No	Double	Whether the sample is labeled as a hard sample, which is a default attribute. Options: 0: non-hard sample 1: hard sample
@modelarts:hard_coefficient	No	Double	Coefficient of difficulty of each sample level, which is a default attribute. The value range is [0,1].
@modelarts:hard_reasons	No	Array of integers	ID of a hard sample reason, which is a default attribute. Options: 0: No object is identified. 1: The confidence is low. 2: The clustering result based on the training dataset is inconsistent with the prediction result. 3: The prediction result is greatly different from the data of the same type in the training dataset. 4: The prediction results of multiple consecutive similar images are inconsistent. 5: There is a large offset between the image resolution and the feature distribution of the training dataset. 6: There is a large offset between the aspect ratio of the image and the feature distribution of the training dataset. 7: There is a large offset between the brightness of the image and the feature distribution of the training dataset. 8: There is a large offset between the saturation of the image and the feature distribution of the training dataset. 9: There is a large offset between the color richness of the image and the feature distribution of the training dataset. 10: There is a large offset between the definition of the image and the feature distribution of the training dataset. 11: There is a large offset between the number of frames of the image and the feature distribution of the training dataset. 12: There is a large offset between the standard deviation of area of image frames and the feature distribution of the training dataset. 13: There is a large offset between the aspect ratio of image frames and the feature distribution of the training dataset. 14: There is a large offset between the area portion of image frames and the feature distribution of the training dataset. 15: There is a large offset between the edge of image frames and the feature distribution of the training dataset. 16: There is a large offset between the brightness of image frames and the feature distribution of the training dataset. 17: There is a large offset between the definition of image frames and the feature distribution of the training dataset. 18: There is a large offset between the stack of image frames and the feature distribution of the training dataset. 19: The data enhancement result based on GaussianBlur is inconsistent with the prediction result of the original image. 20: The data enhancement result based on fliplr is inconsistent with the prediction result of the original image. 21: The data enhancement result based on Crop is inconsistent with the prediction result of the original image. 22: The data enhancement result based on flipud is inconsistent with the prediction result of the original image. 23: The data enhancement result based on scale is inconsistent with the prediction result of the original image. 24: The data enhancement result based on translate is inconsistent with the prediction result of the original image. 25: The data enhancement result based on shear is inconsistent with the prediction result of the original image. 26: The data enhancement result based on superpixels is inconsistent with the prediction result of the original image. 27: The data enhancement result based on sharpen is inconsistent with the prediction result of the original image. 28: The data enhancement result based on add is inconsistent with the prediction result of the original image. 29: The data enhancement result based on invert is inconsistent with the prediction result of the original image. 30: The data is predicted to be abnormal.
@modelarts:size	No	Array of objects	Image size (width, height, and depth of the image), which is a default attribute, with type of List<Integer>. In the list, the first number indicates the width (pixels), the second number indicates the height (pixels), and the third number indicates the depth (the depth can be left blank and the default value is 3). For example, [100,200,3] and [100,200] are both valid. Note: This parameter is mandatory only when the sample label list contains the object detection label.

Response Parameters

Status code: 200

**Table 16** Response body parameters
Parameter	Type	Description
task_id	String	Task ID.
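
A successful call returns status code 200 and a body that contains task_id. The following sketch shows one way to read it defensively. The function name is hypothetical, and treating any other status code as an error is a simplifying assumption, because this section documents only the 200 response.

```python
import json


def extract_task_id(status_code: int, body: str) -> str:
    """Return task_id from a 200 response body, or raise if it is missing."""
    # Assumption: anything other than 200 is handled as a failure by the caller.
    if status_code != 200:
        raise RuntimeError(f"unexpected status code {status_code}")
    payload = json.loads(body)
    task_id = payload.get("task_id")
    if not isinstance(task_id, str) or not task_id:
        raise ValueError("response body has no task_id")
    return task_id
```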

Example Requests

The following is an example of how to start an auto labeling (active learning) task. The task type has been set to auto-label.

{
  "task_type" : "auto-label",
  "collect_key_sample" : true,
  "config" : {
    "algorithm_type" : "fast"
  }
}

The following is an example of how to start an auto labeling (pre-labeling) task. The task type has been set to pre-label.

{
  "task_type" : "pre-label",
  "model_id" : "c4989033-7584-44ee-a180-1c476b810e46",
  "collect_key_sample" : true,
  "config" : {
    "inf_config_list" : [ {
      "specification" : "modelarts.vm.cpu.2u",
      "instance_count" : 1
    } ]
  }
}

The following is an example of how to start an auto grouping task. The task type has been set to auto-grouping.

{
  "task_type" : "auto-grouping",
  "config" : {
    "n_clusters" : 2,
    "ambiguity" : false,
    "image_brightness" : false,
    "image_colorfulness" : false,
    "property" : "size",
    "result_type" : 1
  }
}
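
Any of the example bodies above can be wrapped in an HTTP request as in the following minimal sketch, which uses only the Python standard library. The function name is hypothetical. Token authentication through the X-Auth-Token header is an assumption not stated in this section, so adapt the headers to the authentication method you actually use.

```python
import json
from urllib.request import Request

# Task types listed in Table 2.
TASK_TYPES = {"auto-label", "pre-label", "auto-grouping"}


def build_start_task_request(url: str, token: str, body: dict) -> Request:
    """Build, but do not send, the POST request that starts an intelligent task."""
    task_type = body.get("task_type")
    # task_type is optional in Table 2, so only a present value is checked.
    if task_type is not None and task_type not in TASK_TYPES:
        raise ValueError(f"unsupported task_type: {task_type}")
    return Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "X-Auth-Token": token,  # assumption: token-based authentication
        },
        method="POST",
    )
```

The returned object can be passed to `urllib.request.urlopen` to send the request. Building the request separately from sending it keeps the body validation testable without network access.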