Updated on 2024-05-30 GMT+08:00

Querying the Dataset List

Function

This API is used to query the created datasets that meet the search criteria by page.

Debugging

You can debug this API through automatic authentication in API Explorer or use the SDK sample code generated by API Explorer.

URI

GET /v2/{project_id}/datasets

Table 1 Path Parameters

Parameter

Mandatory

Type

Description

project_id

Yes

String

Project ID. For details about how to obtain a project ID, see Obtaining a Project ID and Name.

Table 2 Query Parameters

Parameter

Mandatory

Type

Description

check_running_task

No

Boolean

Whether to detect tasks (including initialization tasks) that are running in a dataset. Options:

  • true: Detect tasks (including initialization tasks) that are running in the dataset.

  • false: Do not detect tasks (including initialization tasks) that are running in the dataset. (Default value)

contain_versions

No

Boolean

Whether the dataset contains a version.

dataset_type

No

Integer

Dataset type. Options:

  • 0: image classification

  • 1: object detection

  • 3: image segmentation

  • 100: text classification

  • 101: named entity recognition

  • 102: text triplet

  • 200: sound classification

  • 201: speech content

  • 202: speech paragraph labeling

  • 400: table dataset

  • 600: video labeling

  • 900: custom format

file_preview

No

Boolean

Whether a dataset supports preview when it is queried. Options:

  • true: Preview is supported and the list of four dataset files is returned.

  • false: Preview is not supported. (Default value)

limit

No

Integer

Maximum number of records returned on each page. The value ranges from 1 to 100. The default value is 10.

offset

No

Integer

Start page of the paging list. The default value is 0.

order

No

String

Sorting sequence of the query. Options:

  • asc: ascending order

  • desc: descending order (default value)

running_task_type

No

Integer

Type of the running tasks (including initialization tasks) to be detected. The options are as follows:

  • 0: auto labeling

  • 1: pre-labeling

  • 2: export

  • 3: version switch

  • 4: manifest file export

  • 5: manifest file import

  • 6: version publishing

  • 7: auto grouping

search_content

No

String

Fuzzy search keyword. By default, this parameter is left blank.

sort_by

No

String

Sorting mode of the query. Options:

  • create_time: Sort by creation time. (Default value)

  • dataset_name: Sort by dataset name.

support_export

No

Boolean

Whether to return only datasets that can be exported (datasets of the image classification, object detection, and custom format types). If this parameter is left blank or set to false, datasets are not filtered. Options:

  • true: Return only datasets that can be exported.

  • false: Do not filter datasets by whether they can be exported. (Default value)

train_evaluate_ratio

No

String

Version split ratio range used to filter datasets. The numbers before and after the comma are the minimum and maximum split ratios, and only versions whose split ratios fall within this range are matched. Example: 0.0,1.0. If this parameter is left blank or omitted, datasets are not filtered by version split ratio.

version_format

No

Integer

Dataset version format used to filter datasets. Options:

  • 0: default format

  • 1: CarbonData (supported only by table datasets)

  • 2: CSV

with_labels

No

Boolean

Whether to return dataset labels. Options:

  • true: Return label information.

  • false: Do not return label information. (Default value)

workspace_id

No

String

Workspace ID. If no workspace is created, the default value is 0. If a workspace is created and used, use the actual value.

dataset_version

No

String

Dataset version, which is used to distinguish a dataset before and after it is decoupled from labeling tasks. Options:

  • v1: dataset before it is decoupled from labeling tasks (default value)

  • v2: dataset after it is decoupled from labeling tasks (default value)

  • all: all datasets
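The query parameters in Table 2 can be assembled into a request URL with a small helper. The following is a minimal Python sketch, not an official client: the function name build_dataset_list_url and the endpoint value used in the usage note are assumptions for illustration, and only the limit, order, and sort_by constraints documented above are checked. Booleans are written as lowercase true/false, which is an assumption based on the format of the example request.

```python
from urllib.parse import urlencode


def build_dataset_list_url(endpoint, project_id, **params):
    """Return the GET URL for the dataset list (hypothetical helper)."""
    limit = params.get("limit")
    if limit is not None and not 1 <= limit <= 100:
        raise ValueError("limit must be between 1 and 100")
    if params.get("order", "desc") not in ("asc", "desc"):
        raise ValueError("order must be 'asc' or 'desc'")
    if params.get("sort_by", "create_time") not in ("create_time", "dataset_name"):
        raise ValueError("sort_by must be 'create_time' or 'dataset_name'")

    query = {}
    for key, value in params.items():
        if value is None:
            continue  # omitted parameters fall back to the server defaults
        # Assumption: Boolean values are sent as lowercase true/false.
        query[key] = str(value).lower() if isinstance(value, bool) else value

    url = f"https://{endpoint}/v2/{project_id}/datasets"
    return f"{url}?{urlencode(query)}" if query else url
```

For example, build_dataset_list_url("example.com", "p1", offset=0, limit=10, file_preview=True) produces a URL whose query string is offset=0&limit=10&file_preview=true.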

Request Parameters

None

Response Parameters

Status code: 200

Table 3 Response body parameters

Parameter

Type

Description

datasets

Array of DatasetAndFilePreview objects

Dataset list queried by page.

total_number

Integer

Total number of datasets. The value cannot exceed 100.
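Because the list is returned by page, a caller usually loops until total_number datasets have been read. The sketch below shows one way to do this in Python. It assumes that offset is a page index (following the "Start page of the paging list" description; if the service treats it as a record offset, the increment would need to be limit instead of 1). The fetch_page callable is a hypothetical, caller-supplied function that performs the HTTP request and returns the decoded response body.

```python
def iter_datasets(fetch_page, limit=10):
    """Yield every dataset by requesting pages until total_number is reached.

    fetch_page(offset, limit) is a hypothetical callable returning a dict
    with the "datasets" and "total_number" fields from Table 3.
    """
    offset = 0  # assumption: offset counts pages, not records
    seen = 0
    while True:
        body = fetch_page(offset, limit)
        page = body.get("datasets", [])
        yield from page
        seen += len(page)
        # Stop on an empty page or once all reported datasets were read.
        if not page or seen >= body.get("total_number", 0):
            break
        offset += 1
```

Passing the fetch function in as an argument keeps the paging logic independent of any HTTP library and makes it easy to exercise with canned data.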

Table 4 DatasetAndFilePreview

Parameter

Type

Description

annotated_sample_count

Integer

Number of labeled samples in a dataset.

annotated_sub_sample_count

Integer

Number of labeled subsamples.

content_labeling

Boolean

Whether to enable content labeling for the speech paragraph labeling dataset. This function is enabled by default.

create_time

Long

Time when a dataset is created.

current_version_id

String

Current version ID of a dataset.

current_version_name

String

Current version name of a dataset. The value is a string of 1 to 32 characters consisting of letters, digits, underscores (_), and hyphens (-).

data_format

String

Data format.

data_sources

Array of DataSource objects

Data source list.

data_statistics

Map<String,Object>

Sample statistics on a dataset, including the statistics on sample metadata in JSON format.

data_update_time

Long

Time when samples and labels were last updated.

data_url

String

Data path for training.

dataset_format

Integer

Dataset format. Options:

  • 0: file

  • 1: table

dataset_id

String

Dataset ID.

dataset_name

String

Dataset name.

dataset_tags

Array of strings

Key identifier list of a dataset, for example, ["Image","Object detection"].

dataset_type

Integer

Dataset type. Options:

  • 0: image classification

  • 1: object detection

  • 3: image segmentation

  • 100: text classification

  • 101: named entity recognition

  • 102: text triplet

  • 200: sound classification

  • 201: speech content

  • 202: speech paragraph labeling

  • 400: table dataset

  • 600: video labeling

  • 900: custom format

dataset_version_count

Integer

Number of versions of a dataset.

deleted_sample_count

Integer

Number of deleted samples.

deletion_stats

Map<String,Integer>

Deletion reason statistics.

description

String

Dataset description.

enterprise_project_id

String

Enterprise project ID.

exist_running_task

Boolean

Whether the dataset contains running (including initialization) tasks. Options:

  • true: The dataset contains running tasks.

  • false: The dataset does not contain running tasks.

exist_workforce_task

Boolean

Whether the dataset contains team labeling tasks. Options:

  • true: The dataset contains team labeling tasks.

  • false: The dataset does not contain team labeling tasks.

feature_supports

Array of strings

List of features supported by the dataset. Currently, only the value 0 is supported, indicating that the OBS file size is limited.

import_data

Boolean

Whether to import data. Options:

  • true: Import data.

  • false: Do not import data.

import_task_id

String

ID of an import task.

inner_annotation_path

String

Path for storing the labeling result of a dataset.

inner_data_path

String

Path for storing the internal data of a dataset.

inner_log_path

String

Path for storing internal logs of a dataset.

inner_task_path

String

Path for storing internal tasks of a dataset.

inner_temp_path

String

Path for storing internal temporary files of a dataset.

inner_work_path

String

Output directory of a dataset.

label_task_count

Integer

Number of labeling tasks.

labels

Array of Label objects

Dataset label list.

loading_sample_count

Integer

Number of loading samples.

managed

Boolean

Whether a dataset is hosted. Options:

  • true: The dataset is hosted.

  • false: The dataset is not hosted.

next_version_num

Integer

Number of the next version of a dataset.

running_tasks_id

Array of strings

ID list of running (including initialization) tasks.

samples

Array of AnnotationFile objects

Sample list.

schema

Array of Field objects

Schema list.

status

Integer

Dataset status. Options:

  • 0: creating dataset

  • 1: normal dataset

  • 2: deleting dataset

  • 3: deleted dataset

  • 4: abnormal dataset

  • 5: synchronizing dataset

  • 6: releasing dataset

  • 7: dataset in version switching

  • 8: importing dataset

third_path

String

Third-party path.

total_sample_count

Integer

Total number of dataset samples.

total_sub_sample_count

Integer

Total number of subsamples generated from the parent samples. For example, the key frame images extracted from a video labeling dataset are counted as subsamples.

unconfirmed_sample_count

Integer

Number of auto labeling samples to be confirmed.

update_time

Long

Time when a dataset is updated.

versions

Array of DatasetVersion objects

Dataset version information. Currently, only the current version information of a dataset is recorded.

work_path

String

Output dataset path, which is used to store output files such as label files. The path is an OBS path in the format of /Bucket name/File path. For example: /obs-bucket.

work_path_type

Integer

Type of the dataset output path. The default value is 0, indicating an OBS bucket.

workforce_descriptor

WorkforceDescriptor object

Team labeling information.

workforce_task_count

Integer

Number of team labeling tasks of a dataset.

workspace_id

String

Workspace ID. If no workspace is created, the default value is 0. If a workspace is created and used, use the actual value.
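The integer codes for dataset_type and status in Table 4 are easier to read once mapped to their documented meanings. The following Python sketch copies the code tables from this section; the names DATASET_TYPES, DATASET_STATUS, and describe_dataset are illustrative and not part of the API.

```python
# Code-to-name tables copied from the dataset_type and status descriptions.
DATASET_TYPES = {
    0: "image classification", 1: "object detection", 3: "image segmentation",
    100: "text classification", 101: "named entity recognition",
    102: "text triplet", 200: "sound classification", 201: "speech content",
    202: "speech paragraph labeling", 400: "table dataset",
    600: "video labeling", 900: "custom format",
}
DATASET_STATUS = {
    0: "creating", 1: "normal", 2: "deleting", 3: "deleted", 4: "abnormal",
    5: "synchronizing", 6: "releasing", 7: "version switching", 8: "importing",
}


def describe_dataset(dataset):
    """Return a one-line summary of a DatasetAndFilePreview dict."""
    type_name = DATASET_TYPES.get(dataset.get("dataset_type"), "unknown type")
    status_name = DATASET_STATUS.get(dataset.get("status"), "unknown status")
    return f"{dataset.get('dataset_name')}: {type_name}, {status_name}"
```

Unknown codes fall back to a placeholder string rather than raising, so the helper keeps working if the service adds new values.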

Table 5 DataSource

Parameter

Type

Description

data_path

String

Data source path.

data_type

Integer

Data type. Options:

  • 0: OBS bucket (default value)

  • 1: GaussDB(DWS)

  • 2: DLI

  • 3: RDS

  • 4: MRS

  • 5: AI Gallery

  • 6: Inference service

schema_maps

Array of SchemaMap objects

Schema mapping information corresponding to the table data.

source_info

SourceInfo object

Information required for importing a table data source.

with_column_header

Boolean

Whether the first row in the file is a column name. This field is valid for the table dataset. Options:

  • true: The first row in the file is the column name.

  • false: The first row in the file is not the column name.

Table 6 SchemaMap

Parameter

Type

Description

dest_name

String

Name of the destination column.

src_name

String

Name of the source column.

Table 7 SourceInfo

Parameter

Type

Description

cluster_id

String

MRS cluster ID. You can log in to the MRS console to view the information.

cluster_mode

String

Running mode of an MRS cluster. Options:

  • 0: normal cluster

  • 1: security cluster

cluster_name

String

MRS cluster name. You can log in to the MRS console to view the information.

database_name

String

Name of the database to which the table dataset is imported.

input

String

HDFS path of the table dataset. For example, /datasets/demo.

ip

String

IP address of your GaussDB(DWS) cluster.

port

String

Port number of your GaussDB(DWS) cluster.

queue_name

String

DLI queue name of a table dataset.

subnet_id

String

Subnet ID of an MRS cluster.

table_name

String

Name of the table to which a table dataset is imported.

user_name

String

Username, which is mandatory for GaussDB(DWS) data.

user_password

String

User password, which is mandatory for GaussDB(DWS) data.

vpc_id

String

ID of the VPC where an MRS cluster resides.

Table 8 Label

Parameter

Type

Description

attributes

Array of LabelAttribute objects

Multi-dimensional attribute of a label. For example, if the label is music, attributes such as style and artist may be included.

name

String

Label name.

property

LabelProperty object

Basic attribute key-value pair of a label, such as color and shortcut keys.

type

Integer

Label type. Options:

  • 0: image classification

  • 1: object detection

  • 3: image segmentation

  • 100: text classification

  • 101: named entity recognition

  • 102: text triplet relationship

  • 103: text triplet entity

  • 200: sound classification

  • 201: speech content

  • 202: speech paragraph labeling

  • 600: video labeling

Table 9 AnnotationFile

Parameter

Type

Description

create_time

Long

Time when a sample is created.

dataset_id

String

Dataset ID.

depth

Integer

Number of image sample channels.

file_Name

String

Sample name.

file_id

String

Sample ID.

file_type

String

File type.

height

Integer

Image sample height.

size

Long

Image sample size.

tags

Map<String,String>

Label information of a sample.

url

String

OBS address of the preview sample.

width

Integer

Image sample width.

Table 10 Field

Parameter

Type

Description

description

String

Schema description.

name

String

Schema name.

schema_id

Integer

Schema ID.

type

String

Schema value type.

Table 11 DatasetVersion

Parameter

Type

Description

add_sample_count

Integer

Number of added samples.

analysis_cache_path

String

Cache path for feature analysis.

analysis_status

Integer

Status of a feature analysis task. Options:

  • 0: initialized

  • 1: running

  • 2: completed

  • 3: failed

analysis_task_id

String

ID of a feature analysis task.

annotated_sample_count

Integer

Number of labeled samples in a version.

annotated_sub_sample_count

Integer

Number of labeled subsamples.

clear_hard_property

Boolean

Whether to clear hard example properties during release. Options:

  • true: Clear hard example properties. (Default value)

  • false: Do not clear hard example properties.

code

String

Status code of a preprocessing task such as rotation and cropping.

create_time

Long

Time when a version is created.

crop

Boolean

Whether to crop the image. This field is valid only for the object detection dataset whose labeling box is in the rectangle shape. Options:

  • true: Crop the image.

  • false: Do not crop the image. (Default value)

crop_path

String

Path for storing cropped files.

crop_rotate_cache_path

String

Temporary directory for executing the rotation and cropping task.

data_analysis

Map<String,Object>

Feature analysis result in JSON format.

data_path

String

Path for storing data.

data_statistics

Map<String,Object>

Sample statistics on a dataset, including the statistics on sample metadata in JSON format.

data_validate

Boolean

Whether data is validated by the validation algorithm before release. Options:

  • true: The data has been validated.

  • false: The data has not been validated.

deleted_sample_count

Integer

Number of deleted samples.

deletion_stats

Map<String,Integer>

Deletion reason statistics.

description

String

Description of a version.

export_images

Boolean

Whether to export images to the version output directory during release. Options:

  • true: Export images to the version output directory.

  • false: Do not export images to the version output directory. (Default value)

extract_serial_number

Boolean

Whether to parse the subsample number during release. The field is valid for the healthcare dataset. Options:

  • true: Parse the subsample number.

  • false: Do not parse the subsample number. (Default value)

include_dataset_data

Boolean

Whether to include the source data of a dataset during release. Options:

  • true: The source data of a dataset is included.

  • false: The source data of a dataset is not included.

is_current

Boolean

Whether this version is the current version of the dataset. Options:

  • true: This version is the current version.

  • false: This version is not the current version.

label_stats

Array of LabelStats objects

Label statistics list of a released version.

label_type

String

Label type of a released version. Options:

  • multi: Multi-label samples are included.

  • single: All samples are single-labeled.

manifest_cache_input_path

String

Input path for the manifest file cache during version release.

manifest_path

String

Path for storing the manifest file with the released version.

message

String

Task information recorded during release (for example, error information).

modified_sample_count

Integer

Number of modified samples.

previous_annotated_sample_count

Integer

Number of labeled samples in the parent version.

previous_total_sample_count

Integer

Total number of samples in the parent version.

previous_version_id

String

Parent version ID.

processor_task_id

String

ID of a preprocessing task such as rotation and cropping.

processor_task_status

Integer

Status of a preprocessing task such as rotation and cropping. The options are as follows:

  • 0: initialized

  • 1: running

  • 2: completed

  • 3: failed

  • 4: stopped

  • 5: timeout

  • 6: deletion failed

  • 7: stop failed

remove_sample_usage

Boolean

Whether to clear the existing usage information of a dataset during release. Options:

  • true: Clear the existing usage information of a dataset. (Default value)

  • false: Do not clear the existing usage information of a dataset.

rotate

Boolean

Whether to rotate the image. Options:

  • true: Rotate the image.

  • false: Do not rotate the image. (Default value)

rotate_path

String

Path for storing the rotated file.

sample_state

String

Sample status. The options are as follows:

  • __ALL__: labeled

  • __NONE__: unlabeled

  • __UNCHECK__: to be accepted

  • __ACCEPTED__: accepted

  • __REJECTED__: rejected

  • __UNREVIEWED__: to be reviewed

  • __REVIEWED__: approved

  • __WORKFORCE_SAMPLED__: sampled

  • __WORKFORCE_SAMPLED_UNCHECK__: sampled, to be accepted

  • __WORKFORCE_SAMPLED_CHECKED__: sampled, acceptance checked

  • __WORKFORCE_SAMPLED_ACCEPTED__: sampled, accepted

  • __WORKFORCE_SAMPLED_REJECTED__: sampled, rejected

  • __AUTO_ANNOTATION__: to be confirmed

start_processor_task

Boolean

Whether to start a data analysis task during release. Options:

  • true: Start a data analysis task during release.

  • false: Do not start a data analysis task during release. (Default value)

status

Integer

Status of a dataset version. Options:

  • 0: creating

  • 1: running

  • 2: deleting

  • 3: deleted

  • 4: error

tags

Array of strings

Key identifier list of the dataset. The labeling type is used as the default label when the labeling task releases a version. For example, ["Image","Object detection"].

task_type

Integer

Labeling task type of the released version, which is the same as the dataset type.

total_sample_count

Integer

Total number of version samples.

total_sub_sample_count

Integer

Total number of subsamples generated from the parent samples.

train_evaluate_sample_ratio

String

Ratio for splitting samples into training and validation sets during version release. The default value is 1.00, indicating that all samples in the released version are used for training.

update_time

Long

Time when a version is updated.

version_format

String

Format of a dataset version. Options:

  • Default: default format

  • CarbonData: CarbonData (supported only by table datasets)

  • CSV: CSV

version_id

String

Dataset version ID.

version_name

String

Dataset version name.

with_column_header

Boolean

Whether the first row in the released CSV file is a column name. This field is valid for the table dataset. Options:

  • true: The first row in the released CSV file is a column name.

  • false: The first row in the released CSV file is not a column name.
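The train_evaluate_sample_ratio field in Table 11 is returned as a string, so it has to be parsed before it can be used in arithmetic. The Python sketch below estimates how many samples of a version fall into the training and validation sets. It assumes that the ratio is the training fraction (consistent with 1.00 meaning that everything is used for training) and that rounding to the nearest integer is acceptable; the service may split samples differently, so the result should be read as an estimate. The function name split_counts is illustrative.

```python
def split_counts(version):
    """Estimate (training, validation) sample counts for a DatasetVersion dict."""
    # A missing or empty ratio falls back to the documented default of 1.00.
    ratio = float(version.get("train_evaluate_sample_ratio") or "1.00")
    if not 0.0 <= ratio <= 1.0:
        raise ValueError("split ratio must be between 0.0 and 1.0")
    total = version.get("total_sample_count", 0)
    # Assumption: the ratio is the share of samples used for training.
    train = round(total * ratio)
    return train, total - train
```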

Table 12 LabelStats

Parameter

Type

Description

attributes

Array of LabelAttribute objects

Multi-dimensional attribute of a label. For example, if the label is music, attributes such as style and artist may be included.

count

Integer

Number of labels.

name

String

Label name.

property

LabelProperty object

Basic attribute key-value pair of a label, such as color and shortcut keys.

sample_count

Integer

Number of samples containing the label.

type

Integer

Label type. Options:

  • 0: image classification

  • 1: object detection

  • 3: image segmentation

  • 100: text classification

  • 101: named entity recognition

  • 102: text triplet relationship

  • 103: text triplet entity

  • 200: sound classification

  • 201: speech content

  • 202: speech paragraph labeling

  • 600: video labeling
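The count field of each LabelStats entry can be turned into a label distribution, which is a quick way to spot class imbalance in a released version. This is a small Python sketch over the fields in Table 12; the function name label_distribution is illustrative, and the input is assumed to be the label_stats array of a DatasetVersion object.

```python
def label_distribution(label_stats):
    """Map each label name to its share of all label occurrences."""
    total = sum(entry.get("count", 0) for entry in label_stats)
    if total == 0:
        return {}  # no labels recorded, nothing to normalize
    return {entry["name"]: entry.get("count", 0) / total for entry in label_stats}
```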

Table 13 LabelAttribute

Parameter

Type

Description

default_value

String

Default value of a label attribute.

id

String

Label attribute ID. You can obtain it by querying the label list.

name

String

Label attribute name. The value contains a maximum of 64 characters and cannot contain the following characters: <>=&"'

type

String

Label attribute type. Options:

  • text: text

  • select: single-choice drop-down list

values

Array of LabelAttributeValue objects

List of label attribute values.

Table 14 LabelAttributeValue

Parameter

Type

Description

id

String

Label attribute value ID.

value

String

Label attribute value.

Table 15 LabelProperty

Parameter

Type

Description

@modelarts:color

String

Default attribute: Label color, which is a hexadecimal code of the color. By default, this parameter is left blank. Example: #FFFFF0.

@modelarts:default_shape

String

Default attribute: Default shape of an object detection label (dedicated attribute). By default, this parameter is left blank. Options:

  • bndbox: rectangle

  • polygon: polygon

  • circle: circle

  • line: straight line

  • dashed: dashed line

  • point: point

  • polyline: polyline

@modelarts:from_type

String

Default attribute: Type of the head entity in the triplet relationship label. This attribute must be specified when a relationship label is created. This parameter is used only for the text triplet dataset.

@modelarts:rename_to

String

Default attribute: The new name of the label.

@modelarts:shortcut

String

Default attribute: Label shortcut key. By default, this parameter is left blank. For example: D.

@modelarts:to_type

String

Default attribute: Type of the tail entity in the triplet relationship label. This attribute must be specified when a relationship label is created. This parameter is used only for the text triplet dataset.
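The keys of a LabelProperty object start with @modelarts:, so they must be read as dictionary keys rather than attributes in most languages. The following Python sketch checks the two properties that have a documented value set. It assumes that the color is a six-digit hexadecimal code in the style of the #FFFFF0 example; whether shorter forms are accepted is not stated in this section. The names check_label_property and SHAPES are illustrative.

```python
import re

# Assumption: colors follow the six-digit form of the documented example.
_HEX_COLOR = re.compile(r"#[0-9A-Fa-f]{6}")
SHAPES = {"bndbox", "polygon", "circle", "line", "dashed", "point", "polyline"}


def check_label_property(prop):
    """Return a list of problems found in a LabelProperty dict (empty if none)."""
    problems = []
    color = prop.get("@modelarts:color")
    if color and not _HEX_COLOR.fullmatch(color):
        problems.append(f"unexpected color value: {color}")
    shape = prop.get("@modelarts:default_shape")
    if shape and shape not in SHAPES:
        problems.append(f"unknown default shape: {shape}")
    return problems
```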

Table 16 WorkforceDescriptor

Parameter

Type

Description

current_task_id

String

ID of a team labeling task.

current_task_name

String

Name of a team labeling task.

reject_num

Integer

Number of rejected samples.

repetition

Integer

Number of persons who label each sample. The minimum value is 1.

is_synchronize_auto_labeling_data

Boolean

Whether to synchronously update auto labeling data. Options:

  • true: Update auto labeling data synchronously.

  • false: Do not update auto labeling data synchronously.

is_synchronize_data

Boolean

Whether to synchronize updated data, such as uploading files, synchronizing data sources, and assigning imported unlabeled files to team members. Options:

  • true: Synchronize updated data to team members.

  • false: Do not synchronize updated data to team members.

workers

Array of Worker objects

List of labeling team members.

workforce_id

String

ID of a labeling team.

workforce_name

String

Name of a labeling team.

Table 17 Worker

Parameter

Type

Description

create_time

Long

Creation time.

description

String

Labeling team member description. The value contains 0 to 256 characters and does not support the following special characters: ^!<>=&"'

email

String

Email address of a labeling team member.

role

Integer

Role. Options:

  • 0: labeling personnel

  • 1: reviewer

  • 2: team administrator

  • 3: dataset owner

status

Integer

Current login status of a labeling team member. Options:

  • 0: The invitation email has not been sent.

  • 1: The invitation email has been sent but the user has not logged in.

  • 2: The user has logged in.

  • 3: The labeling team member has been deleted.

update_time

Long

Update time.

worker_id

String

ID of a labeling team member.

workforce_id

String

ID of a labeling team.
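The description constraint in Table 17 (0 to 256 characters, none of the listed special characters) can be checked on the client side before values are compared or displayed. This is a minimal Python sketch; the function name is illustrative, and the length is counted in Unicode code points, which is an assumption because the section does not say how characters are counted.

```python
# Special characters that the description field does not support.
FORBIDDEN_CHARS = set("^!<>=&\"'")


def is_valid_worker_description(text):
    """Return True if text satisfies the documented description constraint."""
    return len(text) <= 256 and not (FORBIDDEN_CHARS & set(text))
```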

Example Requests

Querying the Dataset List

GET https://{endpoint}/v2/{project_id}/datasets?offset=0&limit=10&sort_by=create_time&order=desc&dataset_type=0&file_preview=true
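The example request can be issued from Python with the standard library. This section does not describe authentication, so the sketch below assumes token-based authentication with an X-Auth-Token request header; that header and the function name make_list_request are assumptions, and the authentication guide for the service remains the authority. The request object is only constructed here, and sending it is shown in the usage note.

```python
import urllib.request


def make_list_request(url, token):
    """Build a GET request for the dataset list URL (hypothetical helper)."""
    # Assumption: the service accepts a user token in X-Auth-Token.
    headers = {"X-Auth-Token": token}
    return urllib.request.Request(url, headers=headers, method="GET")
```

To send the request, pass the result to urllib.request.urlopen inside a with statement and decode the body with json.load.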

Example Responses

Status code: 200

OK

{
  "total_number" : 1,
  "datasets" : [ {
    "dataset_id" : "gfghHSokody6AJigS5A",
    "dataset_name" : "dataset-f9e8",
    "dataset_type" : 0,
    "data_format" : "Default",
    "next_version_num" : 4,
    "status" : 1,
    "data_sources" : [ {
      "data_type" : 0,
      "data_path" : "/test-obs/classify/input/animals/"
    } ],
    "create_time" : 1605690595404,
    "update_time" : 1605690595404,
    "description" : "",
    "current_version_id" : "54IXbeJhfttGpL46lbv",
    "current_version_name" : "V003",
    "total_sample_count" : 10,
    "annotated_sample_count" : 10,
    "work_path" : "/test-obs/classify/output/",
    "inner_work_path" : "/test-obs/classify/output/dataset-f9e8-gfghHSokody6AJigS5A/",
    "inner_annotation_path" : "/test-obs/classify/output/dataset-f9e8-gfghHSokody6AJigS5A/annotation/",
    "inner_data_path" : "/test-obs/classify/output/dataset-f9e8-gfghHSokody6AJigS5A/data/",
    "inner_log_path" : "/test-obs/classify/output/dataset-f9e8-gfghHSokody6AJigS5A/logs/",
    "inner_temp_path" : "/test-obs/classify/output/dataset-f9e8-gfghHSokody6AJigS5A/temp/",
    "inner_task_path" : "/test-obs/classify/output/dataset-f9e8-gfghHSokody6AJigS5A/task/",
    "work_path_type" : 0,
    "workspace_id" : "0",
    "enterprise_project_id" : "0",
    "exist_running_task" : false,
    "exist_workforce_task" : false,
    "running_tasks_id" : [ ],
    "workforce_task_count" : 0,
    "feature_supports" : [ "0" ],
    "managed" : false,
    "import_data" : false,
    "label_task_count" : 1,
    "dataset_format" : 0,
    "content_labeling" : true,
    "samples" : [ {
      "url" : "https://test-obs.obs.xxx.com:443/classify/input/animals/15.jpg?AccessKeyId=vprCCTY1NmHudlvC0bXr&Expires=1606100112&Signature=tuUo9jl6lqoMKAwNBz5g8dxO%2FdE%3D",
      "create_time" : 1605690596035
    }, {
      "url" : "https://test-obs.obs.xxx.com:443/classify/input/animals/8.jpg?AccessKeyId=vprCCTY1NmHudlvC0bXr&Expires=1606100112&Signature=NITOdBnkUXtdnKuEgDzZpkQzNfM%3D",
      "create_time" : 1605690596046
    }, {
      "url" : "https://test-obs.obs.xxx.com:443/classify/input/animals/9.jpg?AccessKeyId=vprCCTY1NmHudlvC0bXr&Expires=1606100112&Signature=%2BwUo1BL38%2F2d7p7anPi4fNzm1VU%3D",
      "create_time" : 1605690596050
    }, {
      "url" : "https://test-obs.obs.xxx.com:443/classify/input/animals/7.jpg?AccessKeyId=vprCCTY1NmHudlvC0bXr&Expires=1606100112&Signature=tOrHfcWo%2FEJ0wRzfi1M5Wk2MrXg%3D",
      "create_time" : 1605690596043
    } ]
  } ]
}
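The timestamps and preview URLs in the example response are easier to work with after conversion. The Python sketch below assumes that create_time and update_time are milliseconds since the Unix epoch (the 13-digit values suggest this, but the section does not state the unit) and that the Expires query parameter of a preview URL is an expiry time in seconds since the epoch. The function names are illustrative.

```python
from datetime import datetime, timezone
from urllib.parse import parse_qs, urlparse


def ms_to_utc(milliseconds):
    """Convert a Long timestamp (assumed milliseconds) to an aware datetime."""
    return datetime.fromtimestamp(milliseconds / 1000, tz=timezone.utc)


def preview_url_expiry(url):
    """Return the expiry time of a signed preview URL, or None if absent."""
    values = parse_qs(urlparse(url).query).get("Expires")
    if not values:
        return None
    # Assumption: Expires holds seconds since the Unix epoch.
    return datetime.fromtimestamp(int(values[0]), tz=timezone.utc)
```

Under these assumptions, the create_time value 1605690595404 from the example corresponds to 2020-11-18 09:09:55 UTC, and the sample URLs expire on 2020-11-23.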

Status Codes

Status Code

Description

200

OK

401

Unauthorized

403

Forbidden

404

Not Found
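A client can translate the status codes above into exceptions so that callers do not have to inspect numeric values. This is a minimal Python sketch; DatasetApiError and check_status are hypothetical names, and the handling of codes outside the table is an assumption.

```python
class DatasetApiError(Exception):
    """Raised when the dataset list call returns a non-200 status."""


# Meanings copied from the status code table.
STATUS_MEANINGS = {200: "OK", 401: "Unauthorized", 403: "Forbidden", 404: "Not Found"}


def check_status(status_code):
    """Return None for 200 and raise DatasetApiError for anything else."""
    if status_code == 200:
        return
    meaning = STATUS_MEANINGS.get(status_code, "unexpected status")
    raise DatasetApiError(f"{status_code} {meaning}")
```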

Error Codes

See Error Codes.