Updated on 2024-05-30 GMT+08:00

Creating a Dataset Export Task

Function

This API is used to create a dataset export task that exports a dataset to OBS or to a new dataset.

Debugging

You can debug this API through automatic authentication in API Explorer or use the SDK sample code generated by API Explorer.

URI

POST /v2/{project_id}/datasets/{dataset_id}/export-tasks

Table 1 Path Parameters

Parameter    Mandatory  Type    Description
dataset_id   Yes        String  Dataset ID.
project_id   Yes        String  Project ID. For details about how to obtain a project ID, see Obtaining a Project ID and Name.
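As a minimal sketch of how the two path parameters fit into the URI, the following Python helper builds the request URL. The host `modelarts.example.com` and the function name `export_task_url` are hypothetical placeholders, not values taken from this reference; substitute the endpoint for your region.

```python
from urllib.parse import quote

# Hypothetical endpoint used only for illustration.
ENDPOINT = "https://modelarts.example.com"


def export_task_url(project_id: str, dataset_id: str, endpoint: str = ENDPOINT) -> str:
    """Build the URL for POST /v2/{project_id}/datasets/{dataset_id}/export-tasks."""
    if not project_id or not dataset_id:
        raise ValueError("project_id and dataset_id are both mandatory")
    # Percent-encode each path segment so unexpected characters cannot alter the path.
    return (
        f"{endpoint.rstrip('/')}/v2/{quote(project_id, safe='')}"
        f"/datasets/{quote(dataset_id, safe='')}/export-tasks"
    )


print(export_task_url("my-project-id", "my-dataset-id"))
```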

Request Parameters

Table 2 Request body parameters

Parameter

Mandatory

Type

Description

annotation_format

No

String

Labeling format. Options:

  • VOC: VOC

  • COCO: COCO

export_format

No

Integer

Format of the exported directory. Options:

  • 1: tree structure. Example: rabbits/1.jpg, bees/2.jpg.

  • 2: tile structure. Example: 1.jpg, 1.txt; 2.jpg, 2.txt.

export_params

No

ExportParams object

Parameters of a dataset export task.

export_type

No

Integer

Export type. Options:

  • 0: labeled

  • 1: unlabeled

  • 2: all

  • 3: conditional search

path

No

String

Output path for exporting data to OBS. This parameter is mandatory when data is exported to OBS or a new dataset.

sample_state

No

String

Sample status. The options are as follows:

  • __ALL__: labeled

  • __NONE__: unlabeled

  • __UNCHECK__: to be accepted

  • __ACCEPTED__: accepted

  • __REJECTED__: rejected

  • __UNREVIEWED__: to be reviewed

  • __REVIEWED__: approved

  • __WORKFORCE_SAMPLED__: sampled

  • __WORKFORCE_SAMPLED_UNCHECK__: sampled, to be accepted

  • __WORKFORCE_SAMPLED_CHECKED__: sampled, acceptance check completed

  • __WORKFORCE_SAMPLED_ACCEPTED__: sampled, accepted

  • __WORKFORCE_SAMPLED_REJECTED__: sampled, rejected

  • __AUTO_ANNOTATION__: to be confirmed

source_type_header

No

String

Prefix of the OBS path in the exported labeling file. The default value is obs://. Image paths that start with obs:// cannot be parsed during training, so set this parameter to s3:// when the exported manifest file will be used for training.

status

No

Integer

Task status.

task_id

No

String

Task ID.

version_format

No

String

Format of a dataset version. Options:

  • Default: default format

  • CarbonData: CarbonData (supported only by table datasets)

  • CSV: CSV

version_id

No

String

Dataset version ID, which must be specified when data of a dataset version is exported.

with_column_header

No

Boolean

Whether to write the column name in the first line of the CSV file during export. This field is valid for the table dataset. Options:

  • true: Write the column name in the first line of the CSV file. (Default value)

  • false: Do not write the column name in the first line of the CSV file.
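The enumerated options in the request body table can be checked on the client before a request is sent. The sketch below is one possible approach, not part of the API: the helper name `build_export_body` is hypothetical, it covers only a subset of the parameters, and it drops every parameter left unset so that the service applies its own defaults.

```python
import json

# Allowed values, copied from the request body table above.
ANNOTATION_FORMATS = {"VOC", "COCO"}
EXPORT_FORMATS = {1, 2}        # 1: tree structure, 2: tile structure
EXPORT_TYPES = {0, 1, 2, 3}    # labeled, unlabeled, all, conditional search
VERSION_FORMATS = {"Default", "CarbonData", "CSV"}


def build_export_body(path, export_type=None, export_format=None,
                      annotation_format=None, version_id=None,
                      version_format=None, with_column_header=None,
                      export_params=None):
    """Return a request body dict, omitting every parameter left as None."""
    checks = [
        ("annotation_format", annotation_format, ANNOTATION_FORMATS),
        ("export_format", export_format, EXPORT_FORMATS),
        ("export_type", export_type, EXPORT_TYPES),
        ("version_format", version_format, VERSION_FORMATS),
    ]
    for name, value, allowed in checks:
        if value is not None and value not in allowed:
            raise ValueError(f"{name}={value!r} is not one of {sorted(allowed)}")

    body = {
        "path": path,
        "export_type": export_type,
        "export_format": export_format,
        "annotation_format": annotation_format,
        "version_id": version_id,
        "version_format": version_format,
        "with_column_header": with_column_header,
        "export_params": export_params,
    }
    return {key: value for key, value in body.items() if value is not None}


print(json.dumps(build_export_body("/test-obs/daoChu/", export_type=2, export_format=1), indent=2))
```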

Table 3 ExportParams

Parameter

Mandatory

Type

Description

clear_hard_property

No

Boolean

Whether to clear hard example attributes. Options:

  • true: Clear hard example attributes. (Default value)

  • false: Do not clear hard example attributes.

export_dataset_version_format

No

String

Format of the dataset version to which data is exported.

export_dataset_version_name

No

String

Name of the dataset version to which data is exported.

export_dest

No

String

Export destination. Options:

  • DIR: Export data to OBS. (Default value)

  • NEW_DATASET: Export data to a new dataset.

export_new_dataset_name

No

String

Name of the new dataset to which data is exported.

export_new_dataset_work_path

No

String

Working directory of the new dataset to which data is exported.

ratio_sample_usage

No

Boolean

Whether to randomly allocate the training set and validation set based on the specified ratio. Options:

  • true: Allocate the training set and validation set.

  • false: Do not allocate the training set and validation set. (Default value)

sample_state

No

String

Sample status. The options are as follows:

  • __ALL__: labeled

  • __NONE__: unlabeled

  • __UNCHECK__: to be accepted

  • __ACCEPTED__: accepted

  • __REJECTED__: rejected

  • __UNREVIEWED__: to be reviewed

  • __REVIEWED__: approved

  • __WORKFORCE_SAMPLED__: sampled

  • __WORKFORCE_SAMPLED_UNCHECK__: sampled, to be accepted

  • __WORKFORCE_SAMPLED_CHECKED__: sampled, acceptance check completed

  • __WORKFORCE_SAMPLED_ACCEPTED__: sampled, accepted

  • __WORKFORCE_SAMPLED_REJECTED__: sampled, rejected

  • __AUTO_ANNOTATION__: to be confirmed

samples

No

Array of strings

ID list of exported samples.

search_conditions

No

Array of SearchCondition objects

Search conditions for the export. Multiple search conditions are combined with OR.

train_sample_ratio

No

String

Ratio used to split samples between the training set and the validation set when a version is released. The default value is 1.00, indicating that all samples in the released version are placed in the training set.
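The ExportParams fields for exporting to a new dataset can be assembled as in the following sketch. The helper name `new_dataset_export_params` is hypothetical. Two assumptions are made that this reference does not confirm: the ratio is treated as a fraction in the range (0, 1], and it is serialized with two decimals to mirror the documented default of 1.00.

```python
def new_dataset_export_params(name, work_path, train_ratio=None):
    """Build an ExportParams dict that exports data to a new dataset."""
    params = {
        "export_dest": "NEW_DATASET",
        "export_new_dataset_name": name,
        "export_new_dataset_work_path": work_path,
    }
    if train_ratio is not None:
        # Assumption: the ratio is a fraction in (0, 1]; the table only gives the default.
        if not 0 < train_ratio <= 1:
            raise ValueError("train_ratio must be greater than 0 and at most 1")
        params["ratio_sample_usage"] = True
        # train_sample_ratio is typed as String in the table, so format the number.
        params["train_sample_ratio"] = f"{train_ratio:.2f}"
    return params


print(new_dataset_export_params("dataset-export-test", "/test-obs/classify/output/", 0.8))
```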

Table 4 SearchCondition

Parameter

Mandatory

Type

Description

coefficient

No

String

Filter by coefficient of difficulty.

frame_in_video

No

Integer

A frame in the video.

hard

No

String

Whether a sample is a hard sample. Options:

  • 0: non-hard sample

  • 1: hard sample

import_origin

No

String

Filter by data source.

kvp

No

String

CT dosage, filtered by dosage.

label_list

No

SearchLabels object

Label search criteria.

labeler

No

String

Labeler.

metadata

No

SearchProp object

Search by sample attribute.

parent_sample_id

No

String

Parent sample ID.

sample_dir

No

String

Directory where data samples are stored (the directory must end with a slash (/)). Only samples in the specified directory are searched for. Recursive search of directories is not supported.

sample_name

No

String

Search by sample name, including the file name extension.

sample_time

No

String

When a sample is added to the dataset, an index is created based on the last modification time (accurate to day) of the sample on OBS. You can search for the sample based on the time. Options:

  • month: Search for samples added from 30 days ago to the current day.

  • day: Search for samples added from yesterday (one day ago) to the current day.

  • yyyyMMdd-yyyyMMdd: Search for samples added in a specified period (at most 30 days), in the format of Start date-End date. For example, 20190901-20190915 indicates that samples added from September 1 to September 15, 2019 are searched.

score

No

String

Search by confidence.

slice_thickness

No

String

DICOM layer thickness. Samples are filtered by layer thickness.

study_date

No

String

DICOM scanning time.

time_in_video

No

String

A time point in the video.
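The sample_time formats described in the SearchCondition table can be validated locally before a request is sent. This is a minimal sketch: the function name `is_valid_sample_time` is hypothetical, and it assumes that "at most 30 days" means the end date is no more than 30 days after the start date, which the reference does not spell out.

```python
import re
from datetime import datetime


def is_valid_sample_time(value: str) -> bool:
    """Check a sample_time value: 'month', 'day', or yyyyMMdd-yyyyMMdd."""
    if value in ("month", "day"):
        return True
    match = re.fullmatch(r"(\d{8})-(\d{8})", value)
    if match is None:
        return False
    try:
        start = datetime.strptime(match.group(1), "%Y%m%d")
        end = datetime.strptime(match.group(2), "%Y%m%d")
    except ValueError:
        # The digits do not form a real calendar date.
        return False
    # Assumption: the period may span at most 30 days and must not run backwards.
    return 0 <= (end - start).days <= 30


for candidate in ("month", "20190901-20190915", "20190901-20191101"):
    print(candidate, is_valid_sample_time(candidate))
```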

Table 5 SearchLabels

Parameter

Mandatory

Type

Description

labels

No

Array of SearchLabel objects

List of label search criteria.

op

No

String

If you want to search for multiple labels, op must be specified. If you search for only one label, op can be left blank. Options:

  • OR: OR operation

  • AND: AND operation

Table 6 SearchLabel

Parameter

Mandatory

Type

Description

name

No

String

Label name.

op

No

String

Operation type between multiple attributes. Options:

  • OR: OR operation

  • AND: AND operation

property

No

Map<String,Array<String>>

Label attribute, which is in the Object format and stores any key-value pairs. key indicates the attribute name, and value indicates the value list. If value is null, the search is not performed by value. Otherwise, the search value can be any value in the list.

type

No

Integer

Label type. Options:

  • 0: image classification

  • 1: object detection

  • 3: image segmentation

  • 100: text classification

  • 101: named entity recognition

  • 102: text triplet relationship

  • 103: text triplet entity

  • 200: sound classification

  • 201: speech content

  • 202: speech paragraph labeling

  • 600: video labeling
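The SearchLabels and SearchLabel structures nest as shown in the sketch below. The helper names `search_label` and `search_labels` and the label names are hypothetical. The only rule enforced is the one stated in the SearchLabels table: op must be specified when more than one label is searched for.

```python
def search_label(name, label_type=None, properties=None, op=None):
    """Build one SearchLabel; properties maps an attribute name to a list of values."""
    label = {"name": name}
    if label_type is not None:
        label["type"] = label_type
    if properties is not None:
        label["property"] = properties
    if op is not None:
        label["op"] = op
    return label


def search_labels(labels, op=None):
    """Build a SearchLabels object, requiring op when several labels are given."""
    if len(labels) > 1 and op not in ("OR", "AND"):
        raise ValueError("op must be OR or AND when searching for multiple labels")
    result = {"labels": labels}
    if op is not None:
        result["op"] = op
    return result


# Match samples labeled as either of two image classification labels (type 0).
label_list = search_labels(
    [search_label("rabbits", label_type=0), search_label("bees", label_type=0)],
    op="OR",
)
print(label_list)
```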

Table 7 SearchProp

Parameter

Mandatory

Type

Description

op

No

String

Relationship between attribute values. Options:

  • AND: AND relationship

  • OR: OR relationship

props

No

Map<String,Array<String>>

Search criteria of an attribute. Multiple search criteria can be set.
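A conditional export (export_type 3) places its SearchCondition objects under export_params. The sketch below shows one way the pieces might fit together. The attribute name `weather`, its values, the directory, and the helper name `search_prop` are all hypothetical examples rather than values defined by this API.

```python
import json


def search_prop(props, op):
    """Build a SearchProp object; props maps an attribute name to accepted values."""
    if op not in ("AND", "OR"):
        raise ValueError("op must be AND or OR")
    return {"op": op, "props": props}


body = {
    "path": "/test-obs/daoChu/",
    "export_type": 3,  # conditional search
    "export_params": {
        "export_dest": "DIR",
        # Multiple search conditions are related by OR.
        "search_conditions": [
            {"metadata": search_prop({"weather": ["sunny", "cloudy"]}, op="OR")},
            # sample_dir must end with a slash; subdirectories are not searched.
            {"sample_dir": "/test-obs/classify/input/rabbits/"},
        ],
    },
}
print(json.dumps(body, indent=2))
```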

Response Parameters

Status code: 200

Table 8 Response body parameters

Parameter

Type

Description

create_time

Long

Time when a task is created.

error_code

String

Error code.

error_msg

String

Error message.

export_format

Integer

Format of the exported directory. Options:

  • 1: tree structure. Example: rabbits/1.jpg, bees/2.jpg.

  • 2: tile structure. Example: 1.jpg, 1.txt; 2.jpg, 2.txt.

export_params

ExportParams object

Parameters of a dataset export task.

export_type

Integer

Export type. Options:

  • 0: labeled

  • 1: unlabeled

  • 2: all

  • 3: conditional search

finished_sample_count

Integer

Number of completed samples.

path

String

Export output path.

progress

Float

Percentage of current task progress.

status

String

Task status. Options:

  • INIT: initialized

  • RUNNING: running

  • FAILED: failed

  • SUCCESSED: completed

task_id

String

Task ID.

total_sample_count

Integer

Total number of samples.

update_time

Long

Time when a task is updated.

version_format

String

Format of a dataset version. Options:

  • Default: default format

  • CarbonData: CarbonData (supported only by table datasets)

  • CSV: CSV

version_id

String

Dataset version ID.
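The status and count fields in the response body can be turned into a short progress summary, as in the sketch below. The helper names and the sample dictionary are hypothetical. Treating FAILED and SUCCESSED as the only final states is an assumption drawn from the four status options listed in the table.

```python
# Spelling of SUCCESSED follows the status option listed in the response table.
TERMINAL_STATUSES = {"FAILED", "SUCCESSED"}


def is_finished(task: dict) -> bool:
    """Return True once the task has reached a final state."""
    return task.get("status") in TERMINAL_STATUSES


def describe(task: dict) -> str:
    """Summarize a task response on one line."""
    status = task.get("status", "UNKNOWN")
    line = f"task {task.get('task_id')}: {status}"
    done = task.get("finished_sample_count")
    total = task.get("total_sample_count")
    if done is not None and total is not None:
        line += f" ({done}/{total} samples)"
    if status == "FAILED" and task.get("error_msg"):
        line += f" - {task['error_msg']}"
    return line


# Hypothetical response content, for illustration only.
sample = {"task_id": "rF9NNoB56k5rtYKg2Y7", "status": "RUNNING",
          "finished_sample_count": 3, "total_sample_count": 10}
print(describe(sample), is_finished(sample))
```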

Table 9 ExportParams

Parameter

Type

Description

clear_hard_property

Boolean

Whether to clear hard example attributes. Options:

  • true: Clear hard example attributes. (Default value)

  • false: Do not clear hard example attributes.

export_dataset_version_format

String

Format of the dataset version to which data is exported.

export_dataset_version_name

String

Name of the dataset version to which data is exported.

export_dest

String

Export destination. Options:

  • DIR: Export data to OBS. (Default value)

  • NEW_DATASET: Export data to a new dataset.

export_new_dataset_name

String

Name of the new dataset to which data is exported.

export_new_dataset_work_path

String

Working directory of the new dataset to which data is exported.

ratio_sample_usage

Boolean

Whether to randomly allocate the training set and validation set based on the specified ratio. Options:

  • true: Allocate the training set and validation set.

  • false: Do not allocate the training set and validation set. (Default value)

sample_state

String

Sample status. The options are as follows:

  • __ALL__: labeled

  • __NONE__: unlabeled

  • __UNCHECK__: to be accepted

  • __ACCEPTED__: accepted

  • __REJECTED__: rejected

  • __UNREVIEWED__: to be reviewed

  • __REVIEWED__: approved

  • __WORKFORCE_SAMPLED__: sampled

  • __WORKFORCE_SAMPLED_UNCHECK__: sampled, to be accepted

  • __WORKFORCE_SAMPLED_CHECKED__: sampled, acceptance check completed

  • __WORKFORCE_SAMPLED_ACCEPTED__: sampled, accepted

  • __WORKFORCE_SAMPLED_REJECTED__: sampled, rejected

  • __AUTO_ANNOTATION__: to be confirmed

samples

Array of strings

ID list of exported samples.

search_conditions

Array of SearchCondition objects

Search conditions for the export. Multiple search conditions are combined with OR.

train_sample_ratio

String

Ratio used to split samples between the training set and the validation set when a version is released. The default value is 1.00, indicating that all samples in the released version are placed in the training set.

Table 10 SearchCondition

Parameter

Type

Description

coefficient

String

Filter by coefficient of difficulty.

frame_in_video

Integer

A frame in the video.

hard

String

Whether a sample is a hard sample. Options:

  • 0: non-hard sample

  • 1: hard sample

import_origin

String

Filter by data source.

kvp

String

CT dosage, filtered by dosage.

label_list

SearchLabels object

Label search criteria.

labeler

String

Labeler.

metadata

SearchProp object

Search by sample attribute.

parent_sample_id

String

Parent sample ID.

sample_dir

String

Directory where data samples are stored (the directory must end with a slash (/)). Only samples in the specified directory are searched for. Recursive search of directories is not supported.

sample_name

String

Search by sample name, including the file name extension.

sample_time

String

When a sample is added to the dataset, an index is created based on the last modification time (accurate to day) of the sample on OBS. You can search for the sample based on the time. Options:

  • month: Search for samples added from 30 days ago to the current day.

  • day: Search for samples added from yesterday (one day ago) to the current day.

  • yyyyMMdd-yyyyMMdd: Search for samples added in a specified period (at most 30 days), in the format of Start date-End date. For example, 20190901-20190915 indicates that samples added from September 1 to September 15, 2019 are searched.

score

String

Search by confidence.

slice_thickness

String

DICOM layer thickness. Samples are filtered by layer thickness.

study_date

String

DICOM scanning time.

time_in_video

String

A time point in the video.

Table 11 SearchLabels

Parameter

Type

Description

labels

Array of SearchLabel objects

List of label search criteria.

op

String

If you want to search for multiple labels, op must be specified. If you search for only one label, op can be left blank. Options:

  • OR: OR operation

  • AND: AND operation

Table 12 SearchLabel

Parameter

Type

Description

name

String

Label name.

op

String

Operation type between multiple attributes. Options:

  • OR: OR operation

  • AND: AND operation

property

Map<String,Array<String>>

Label attribute, which is in the Object format and stores any key-value pairs. key indicates the attribute name, and value indicates the value list. If value is null, the search is not performed by value. Otherwise, the search value can be any value in the list.

type

Integer

Label type. Options:

  • 0: image classification

  • 1: object detection

  • 3: image segmentation

  • 100: text classification

  • 101: named entity recognition

  • 102: text triplet relationship

  • 103: text triplet entity

  • 200: sound classification

  • 201: speech content

  • 202: speech paragraph labeling

  • 600: video labeling

Table 13 SearchProp

Parameter

Type

Description

op

String

Relationship between attribute values. Options:

  • AND: AND relationship

  • OR: OR relationship

props

Map<String,Array<String>>

Search criteria of an attribute. Multiple search criteria can be set.

Example Requests

  • Creating an Export Task (Exporting Data to OBS)

    {
      "path" : "/test-obs/daoChu/",
      "export_type" : 3,
      "export_params" : {
        "sample_state" : "",
        "export_dest" : "DIR"
      }
    }

  • Creating an Export Task (Exporting Data to a New Dataset)

    {
      "path" : "/test-obs/classify/input/",
      "export_type" : 3,
      "export_params" : {
        "sample_state" : "",
        "export_dest" : "NEW_DATASET",
        "export_new_dataset_name" : "dataset-export-test",
        "export_new_dataset_work_path" : "/test-obs/classify/output/"
      }
    }
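The first example request could be sent from Python roughly as follows. This is a sketch under stated assumptions: the host, project ID, dataset ID, and token are placeholders, and passing the token in an `X-Auth-Token` header is an assumption about the authentication scheme that this page does not describe. The call that actually sends the request is left commented out so the snippet runs without network access.

```python
import json
import urllib.request


def make_export_request(url: str, token: str, body: dict) -> urllib.request.Request:
    """Prepare (but do not send) the POST request that creates an export task."""
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "X-Auth-Token": token,  # assumed authentication header
        },
        method="POST",
    )


body = {
    "path": "/test-obs/daoChu/",
    "export_type": 3,
    "export_params": {"sample_state": "", "export_dest": "DIR"},
}
req = make_export_request(
    "https://modelarts.example.com/v2/my-project-id/datasets/my-dataset-id/export-tasks",
    "my-token",
    body,
)
print(req.get_method(), req.full_url)

# To send the request:
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp))
```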

Example Responses

Status code: 200

OK

{
  "task_id" : "rF9NNoB56k5rtYKg2Y7"
}
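A caller will usually want only the task ID from this response. The sketch below parses the body and raises if an error code is present. The function name `extract_task_id` is hypothetical, and treating a non-empty error_code as a failure is an assumption based on the error_code and error_msg fields in the response table.

```python
import json


def extract_task_id(response_text: str) -> str:
    """Return task_id from a response body, raising if an error is reported."""
    payload = json.loads(response_text)
    if payload.get("error_code"):
        raise RuntimeError(f"{payload['error_code']}: {payload.get('error_msg', '')}")
    return payload["task_id"]


print(extract_task_id('{"task_id": "rF9NNoB56k5rtYKg2Y7"}'))
```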

Status Codes

Status Code  Description
200          OK
401          Unauthorized
403          Forbidden
404          Not Found

Error Codes

See Error Codes.