Querying the Dataset List
Function
This API is used to query, page by page, the created datasets that meet the search criteria.
Debugging
You can debug this API through automatic authentication in API Explorer or use the SDK sample code generated by API Explorer.
URI
GET /v2/{project_id}/datasets
Path Parameters
Parameter | Mandatory | Type | Description |
---|---|---|---|
project_id | Yes | String | Project ID. For details about how to obtain a project ID, see Obtaining a Project ID and Name. |
Query Parameters
Parameter | Mandatory | Type | Description |
---|---|---|---|
check_running_task | No | Boolean | Whether to check for tasks (including initialization tasks) running in the dataset. Options: |
contain_versions | No | Boolean | Whether the dataset contains a version. |
dataset_type | No | Integer | Dataset type. Options: |
file_preview | No | Boolean | Whether a dataset supports preview when it is queried. Options: |
limit | No | Integer | Maximum number of records returned on each page. The value ranges from 1 to 100. The default value is 10. |
offset | No | Integer | Start page of the paging list. The default value is 0. |
order | No | String | Sort order of the query results. Options: |
running_task_type | No | Integer | Type of the running tasks (including initialization tasks) to be detected. Options: |
search_content | No | String | Fuzzy search keyword. By default, this parameter is left blank. |
sort_by | No | String | Sort field of the query. Options: |
support_export | No | Boolean | Whether to return only datasets that can be exported (image classification, object detection, and custom format datasets). If this parameter is left blank or set to false, datasets are not filtered. Options: |
train_evaluate_ratio | No | String | Version split ratio used for dataset filtering, for example, 0.0,1.0. The numbers before and after the comma are the minimum and maximum split ratios, and versions whose split ratios fall within this range are matched. If this parameter is left blank or not specified, datasets are not filtered by version split ratio. |
version_format | No | Integer | Dataset version format used to filter datasets. Only datasets that meet the filter criteria are returned. Options: |
with_labels | No | Boolean | Whether to return dataset labels. Options: |
workspace_id | No | String | Workspace ID. If no workspace has been created, the default value is 0. If a workspace has been created and is in use, set this parameter to its actual ID. |
dataset_version | No | String | Dataset version, which is used to distinguish a dataset before and after it is decoupled from labeling tasks. Options: |
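The query parameters above are passed in the URL query string. The following Python sketch uses only the standard library to assemble that URL and enforces the documented range of 1 to 100 for limit. The helper names build_list_datasets_url and ratio_range are hypothetical. Sending Booleans as lowercase true/false is an assumption based on the example request on this page (file_preview=true), not a documented rule.

```python
from urllib.parse import urlencode


def ratio_range(low: float, high: float) -> str:
    """Format a train_evaluate_ratio filter as "min,max", e.g. "0.0,1.0"."""
    if not 0.0 <= low <= high <= 1.0:
        raise ValueError("expected 0.0 <= low <= high <= 1.0")
    return f"{low},{high}"


def build_list_datasets_url(endpoint: str, project_id: str, **query) -> str:
    """Build the URL for GET /v2/{project_id}/datasets (hypothetical helper).

    Parameters whose value is None are omitted so that server defaults apply
    (for example, limit=10 and offset=0).
    """
    limit = query.get("limit")
    if limit is not None and not 1 <= limit <= 100:
        raise ValueError("limit must be between 1 and 100")
    params = {}
    for key, value in query.items():
        if value is None:
            continue  # leave unset optional parameters out of the query string
        if isinstance(value, bool):
            # Assumption: Booleans are sent as lowercase strings.
            value = "true" if value else "false"
        params[key] = value
    url = f"https://{endpoint}/v2/{project_id}/datasets"
    return f"{url}?{urlencode(params)}" if params else url
```

If you call it with offset=0, limit=10, sort_by="create_time", order="desc", dataset_type=0, and file_preview=True, it produces the same query string as the example request. urlencode percent-encodes the comma in a train_evaluate_ratio value as %2C, which is the standard URL encoding of a comma.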
Request Parameters
None
Response Parameters
Status code: 200
Response Body Parameters
Parameter | Type | Description |
---|---|---|
datasets | Array of DatasetAndFilePreview objects | Datasets on the requested page. |
total_number | Integer | Total number of datasets. The value cannot exceed 100. |
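Results are returned one page at a time. A client that needs every dataset therefore has to request successive pages until total_number is covered. The sketch below relies on the offset description ("Start page of the paging list") and assumes that offset is a zero-based page index. If offset turns out to be a record offset in your environment, advance it by limit instead. fetch_page is a hypothetical caller-supplied function that sends the request and returns the decoded JSON body. Passing it in keeps the paging logic testable without network access.

```python
import math
from typing import Callable, Iterator


def iter_all_datasets(fetch_page: Callable[[int, int], dict], limit: int = 10) -> Iterator[dict]:
    """Yield every DatasetAndFilePreview object across all result pages.

    fetch_page(offset, limit) is a hypothetical caller-supplied function that
    sends GET /v2/{project_id}/datasets and returns the decoded JSON body.
    Assumption: offset is a zero-based page index ("start page").
    """
    offset = 0
    while True:
        body = fetch_page(offset, limit)
        datasets = body.get("datasets", [])
        yield from datasets
        page_count = math.ceil(body.get("total_number", 0) / limit)
        offset += 1
        # Also stop on an empty page, as a safeguard against an endless loop.
        if not datasets or offset >= page_count:
            break
```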
DatasetAndFilePreview
Parameter | Type | Description |
---|---|---|
annotated_sample_count | Integer | Number of labeled samples in a dataset. |
annotated_sub_sample_count | Integer | Number of labeled subsamples. |
content_labeling | Boolean | Whether to enable content labeling for the speech paragraph labeling dataset. This function is enabled by default. |
create_time | Long | Time when a dataset is created. |
current_version_id | String | Current version ID of a dataset. |
current_version_name | String | Current version name of a dataset. The value is a string of 1 to 32 characters consisting of letters, digits, underscores (_), and hyphens (-). |
data_format | String | Data format. |
data_sources | Array of DataSource objects | Data source list. |
data_statistics | Map<String,Object> | Sample statistics on a dataset, including the statistics on sample metadata in JSON format. |
data_update_time | Long | Time when samples and labels were last updated. |
data_url | String | Data path for training. |
dataset_format | Integer | Dataset format. Options: |
dataset_id | String | Dataset ID. |
dataset_name | String | Dataset name. |
dataset_tags | Array of strings | Key identifier list of a dataset, for example, ["Image","Object detection"]. |
dataset_type | Integer | Dataset type. Options: |
dataset_version_count | Integer | Number of versions of a dataset. |
deleted_sample_count | Integer | Number of deleted samples. |
deletion_stats | Map<String,Integer> | Deletion reason statistics. |
description | String | Dataset description. |
enterprise_project_id | String | Enterprise project ID. |
exist_running_task | Boolean | Whether the dataset contains running (including initialization) tasks. Options: |
exist_workforce_task | Boolean | Whether the dataset contains team labeling tasks. Options: |
feature_supports | Array of strings | List of features supported by the dataset. Currently, only the value 0 is supported, indicating that the OBS file size is limited. |
import_data | Boolean | Whether to import data. Options: |
import_task_id | String | ID of an import task. |
inner_annotation_path | String | Path for storing the labeling result of a dataset. |
inner_data_path | String | Path for storing the internal data of a dataset. |
inner_log_path | String | Path for storing internal logs of a dataset. |
inner_task_path | String | Path for storing internal task files of a dataset. |
inner_temp_path | String | Path for storing internal temporary files of a dataset. |
inner_work_path | String | Output directory of a dataset. |
label_task_count | Integer | Number of labeling tasks. |
labels | Array of Label objects | Dataset label list. |
loading_sample_count | Integer | Number of samples being loaded. |
managed | Boolean | Whether a dataset is hosted. Options: |
next_version_num | Integer | Number to be assigned to the next version of a dataset. |
running_tasks_id | Array of strings | ID list of running (including initialization) tasks. |
samples | Array of AnnotationFile objects | Sample list. |
schema | Array of Field objects | Schema list. |
status | Integer | Dataset status. Options: |
third_path | String | Third-party path. |
total_sample_count | Integer | Total number of dataset samples. |
total_sub_sample_count | Integer | Total number of subsamples generated from the parent samples. For example, in a video labeling dataset, the key frame images extracted from the videos are its subsamples. |
unconfirmed_sample_count | Integer | Number of auto labeling samples to be confirmed. |
update_time | Long | Time when a dataset is updated. |
versions | Array of DatasetVersion objects | Dataset version information. Currently, only the current version information of a dataset is recorded. |
work_path | String | Output dataset path, which is used to store output files such as label files. The path is an OBS path in the format of /Bucket name/File path, for example, /obs-bucket. |
work_path_type | Integer | Type of the dataset output path. The default value is 0, indicating an OBS bucket. |
workforce_descriptor | WorkforceDescriptor object | Team labeling information. |
workforce_task_count | Integer | Number of team labeling tasks of a dataset. |
workspace_id | String | Workspace ID. If no workspace has been created, the default value is 0. If a workspace has been created and is in use, the value is the actual workspace ID. |
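The table gives no unit for the Long time fields in this object (create_time, update_time, data_update_time). The 13-digit values in the example response, such as 1605690595404, are consistent with milliseconds since the Unix epoch. The Python sketch below makes that assumption explicit. It also condenses a dataset entry into a few display fields, including the labeling progress derived from annotated_sample_count and total_sample_count. from_epoch_millis and summarize_dataset are hypothetical helper names.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def from_epoch_millis(value: int) -> datetime:
    """Convert a Long time field to an aware UTC datetime.

    Assumption: the value is milliseconds since the Unix epoch. Using
    timedelta keeps the arithmetic exact (no float rounding of the millis).
    """
    return EPOCH + timedelta(milliseconds=value)


def summarize_dataset(dataset: dict) -> dict:
    """Condense a DatasetAndFilePreview object into a few display fields."""
    total = dataset.get("total_sample_count", 0)
    labeled = dataset.get("annotated_sample_count", 0)
    return {
        "name": dataset.get("dataset_name"),
        "created": from_epoch_millis(dataset["create_time"]) if "create_time" in dataset else None,
        "labeled_percent": 100.0 * labeled / total if total else 0.0,  # avoid division by zero
        "version": dataset.get("current_version_name"),
    }
```

Under the millisecond assumption, from_epoch_millis(1605690595404) corresponds to 2020-11-18 09:09:55.404 UTC. The dataset in the example response is 100% labeled (10 of 10 samples).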
DataSource
Parameter | Type | Description |
---|---|---|
data_path | String | Data source path. |
data_type | Integer | Data type. Options: |
schema_maps | Array of SchemaMap objects | Schema mapping information corresponding to the table data. |
source_info | SourceInfo object | Information required for importing a table data source. |
with_column_header | Boolean | Whether the first row in the file is a column name. This field is valid for the table dataset. Options: |
SchemaMap
Parameter | Type | Description |
---|---|---|
dest_name | String | Name of the destination column. |
src_name | String | Name of the source column. |
SourceInfo
Parameter | Type | Description |
---|---|---|
cluster_id | String | MRS cluster ID. You can log in to the MRS console to view the information. |
cluster_mode | String | Running mode of an MRS cluster. Options: |
cluster_name | String | MRS cluster name. You can log in to the MRS console to view the information. |
database_name | String | Name of the database to which the table dataset is imported. |
input | String | HDFS path of the table dataset, for example, /datasets/demo. |
ip | String | IP address of your GaussDB(DWS) cluster. |
port | String | Port number of your GaussDB(DWS) cluster. |
queue_name | String | DLI queue name of a table dataset. |
subnet_id | String | Subnet ID of an MRS cluster. |
table_name | String | Name of the table to which a table dataset is imported. |
user_name | String | Username, which is mandatory for GaussDB(DWS) data. |
user_password | String | User password, which is mandatory for GaussDB(DWS) data. |
vpc_id | String | ID of the VPC where an MRS cluster resides. |
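SourceInfo can carry user_password, which is mandatory for GaussDB(DWS) data. If list results are logged or cached, it is prudent to mask that field first. The sketch below masks the password on a copy of the object. It also reports which of the two credentials marked as mandatory for GaussDB(DWS) are missing. Both function names are hypothetical, and this page does not say whether the service returns the password in clear text.

```python
import copy
from typing import List

MASK = "******"


def redact_source_info(source_info: dict) -> dict:
    """Return a deep copy of a SourceInfo dict with user_password masked."""
    redacted = copy.deepcopy(source_info)
    if redacted.get("user_password"):
        redacted["user_password"] = MASK
    return redacted


def missing_dws_credentials(source_info: dict) -> List[str]:
    """List the GaussDB(DWS) credential fields that are absent or empty."""
    return [key for key in ("user_name", "user_password") if not source_info.get(key)]
```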
Label
Parameter | Type | Description |
---|---|---|
attributes | Array of LabelAttribute objects | Multi-dimensional attributes of a label. For example, if the label is music, attributes such as style and artist may be included. |
name | String | Label name. |
property | LabelProperty object | Basic attribute key-value pairs of a label, such as color and shortcut keys. |
type | Integer | Label type. Options: |
AnnotationFile
Parameter | Type | Description |
---|---|---|
create_time | Long | Time when a sample is created. |
dataset_id | String | Dataset ID. |
depth | Integer | Number of image sample channels. |
file_Name | String | Sample name. |
file_id | String | Sample ID. |
file_type | String | File type. |
height | Integer | Image sample height. |
size | Long | Image sample size. |
tags | Map<String,String> | Label information of a sample. |
url | String | OBS address of the preview sample. |
width | Integer | Image sample width. |
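When file_preview=true, each AnnotationFile url points to a preview object. In the example response, these URLs are signed and carry an Expires query parameter (for example, Expires=1606100112). The sketch below treats Expires as a Unix time in seconds, an assumption inferred from the example. A client can use it to decide whether to re-query the list before loading a preview. preview_url_expired is a hypothetical name.

```python
import time
from typing import Optional
from urllib.parse import parse_qs, urlparse


def preview_url_expired(url: str, now: Optional[float] = None) -> bool:
    """Return True if a sample preview URL's Expires parameter has passed.

    Assumption: Expires is a Unix timestamp in seconds. URLs without an
    Expires parameter are treated as not expired.
    """
    values = parse_qs(urlparse(url).query).get("Expires")
    if not values:
        return False
    current = time.time() if now is None else now
    return int(values[0]) <= current
```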
Field
Parameter | Type | Description |
---|---|---|
description | String | Schema description. |
name | String | Schema name. |
schema_id | Integer | Schema ID. |
type | String | Schema value type. |
DatasetVersion
Parameter | Type | Description |
---|---|---|
add_sample_count | Integer | Number of added samples. |
analysis_cache_path | String | Cache path for feature analysis. |
analysis_status | Integer | Status of a feature analysis task. Options: |
analysis_task_id | String | ID of a feature analysis task. |
annotated_sample_count | Integer | Number of labeled samples in the version. |
annotated_sub_sample_count | Integer | Number of labeled subsamples. |
clear_hard_property | Boolean | Whether to clear hard example properties during release. Options: |
code | String | Status code of a preprocessing task such as rotation and cropping. |
create_time | Long | Time when a version is created. |
crop | Boolean | Whether to crop the image. This field is valid only for object detection datasets whose labeling boxes are rectangles. Options: |
crop_path | String | Path for storing cropped files. |
crop_rotate_cache_path | String | Temporary directory for executing the rotation and cropping task. |
data_analysis | Map<String,Object> | Feature analysis result in JSON format. |
data_path | String | Path for storing data. |
data_statistics | Map<String,Object> | Sample statistics on a dataset, including the statistics on sample metadata in JSON format. |
data_validate | Boolean | Whether data is validated by the validation algorithm before release. Options: |
deleted_sample_count | Integer | Number of deleted samples. |
deletion_stats | Map<String,Integer> | Deletion reason statistics. |
description | String | Description of a version. |
export_images | Boolean | Whether to export images to the version output directory during release. Options: |
extract_serial_number | Boolean | Whether to parse the subsample number during release. This field is valid for healthcare datasets. Options: |
include_dataset_data | Boolean | Whether to include the source data of a dataset during release. Options: |
is_current | Boolean | Whether this version is the current version of the dataset. Options: |
label_stats | Array of LabelStats objects | Label statistics list of a released version. |
label_type | String | Label type of a released version. Options: |
manifest_cache_input_path | String | Input path for the manifest file cache during version release. |
manifest_path | String | Path for storing the manifest file of the released version. |
message | String | Task information recorded during release (for example, error information). |
modified_sample_count | Integer | Number of modified samples. |
previous_annotated_sample_count | Integer | Number of labeled samples in the parent version. |
previous_total_sample_count | Integer | Total number of samples in the parent version. |
previous_version_id | String | Parent version ID. |
processor_task_id | String | ID of a preprocessing task such as rotation and cropping. |
processor_task_status | Integer | Status of a preprocessing task such as rotation and cropping. Options: |
remove_sample_usage | Boolean | Whether to clear the existing usage information of a dataset during release. Options: |
rotate | Boolean | Whether to rotate the image. Options: |
rotate_path | String | Path for storing the rotated file. |
sample_state | String | Sample status. Options: |
start_processor_task | Boolean | Whether to start a data analysis task during release. Options: |
status | Integer | Status of a dataset version. Options: |
tags | Array of strings | Key identifier list of the dataset. The labeling type is used as the default tag when a labeling task releases a version, for example, ["Image","Object detection"]. |
task_type | Integer | Labeling task type of the released version, which is the same as the dataset type. |
total_sample_count | Integer | Total number of version samples. |
total_sub_sample_count | Integer | Total number of subsamples generated from the parent samples. |
train_evaluate_sample_ratio | String | Training/validation split ratio used during version release. The default value is 1.00, indicating that all samples in the released version are used for training. |
update_time | Long | Time when a version is updated. |
version_format | String | Format of a dataset version. Options: |
version_id | String | Dataset version ID. |
version_name | String | Dataset version name. |
with_column_header | Boolean | Whether the first row in the released CSV file is a column name. This field is valid for the table dataset. Options: |
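train_evaluate_sample_ratio is a string such as "1.00". Read it as the fraction of samples assigned to training: 1.00 means every sample is a training sample, as the table states. On that reading, the expected training and validation counts for a version can be estimated from total_sample_count. This page does not document the rounding rule the service applies. The sketch below rounds the training count down, which is an assumption. It uses Decimal so that a value such as "0.80" is not subject to binary floating-point error. split_counts is a hypothetical name.

```python
from decimal import ROUND_FLOOR, Decimal
from typing import Tuple


def split_counts(total_sample_count: int, train_evaluate_sample_ratio: str) -> Tuple[int, int]:
    """Estimate (training, validation) sample counts for a released version.

    Assumption: the ratio is the training fraction and the training count is
    rounded down; the remaining samples are counted as validation samples.
    """
    ratio = Decimal(train_evaluate_sample_ratio)
    if not Decimal(0) <= ratio <= Decimal(1):
        raise ValueError("ratio must be between 0 and 1")
    train = int((ratio * total_sample_count).to_integral_value(rounding=ROUND_FLOOR))
    return train, total_sample_count - train
```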
LabelStats
Parameter | Type | Description |
---|---|---|
attributes | Array of LabelAttribute objects | Multi-dimensional attributes of a label. For example, if the label is music, attributes such as style and artist may be included. |
count | Integer | Number of labels. |
name | String | Label name. |
property | LabelProperty object | Basic attribute key-value pairs of a label, such as color and shortcut keys. |
sample_count | Integer | Number of samples containing the label. |
type | Integer | Label type. Options: |
LabelAttribute
Parameter | Type | Description |
---|---|---|
default_value | String | Default value of a label attribute. |
id | String | Label attribute ID. You can obtain it by querying the label list. |
name | String | Label attribute name. The value contains a maximum of 64 characters and cannot contain any of the following characters: <>=&"' |
type | String | Label attribute type. Options: |
values | Array of LabelAttributeValue objects | List of label attribute values. |
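Two string fields on this page have explicit constraints. A label attribute name has at most 64 characters and none of <>=&"'. In the Worker object, a member description has 0 to 256 characters and none of ^!<>=&"'. A client that sends these values back in later requests could pre-check them with a sketch like the one below. The function and constant names are hypothetical. Length is counted in Python characters, which may differ from how the service counts it.

```python
from typing import List, Set, Tuple

# Constraints as stated in the parameter tables on this page.
ATTRIBUTE_NAME_RULE: Tuple[int, Set[str]] = (64, set('<>=&"\''))
WORKER_DESCRIPTION_RULE: Tuple[int, Set[str]] = (256, set('^!<>=&"\''))


def check_text(value: str, rule: Tuple[int, Set[str]]) -> List[str]:
    """Return the constraint violations for value (an empty list if valid)."""
    max_length, forbidden = rule
    problems = []
    if len(value) > max_length:
        problems.append(f"longer than {max_length} characters")
    bad = sorted(set(value) & forbidden)
    if bad:
        problems.append("contains forbidden characters: " + "".join(bad))
    return problems
```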
LabelAttributeValue
Parameter | Type | Description |
---|---|---|
id | String | Label attribute value ID. |
value | String | Label attribute value. |
LabelProperty
Parameter | Type | Description |
---|---|---|
@modelarts:color | String | Default attribute: label color, which is a hexadecimal color code. By default, this parameter is left blank. Example: #FFFFF0. |
@modelarts:default_shape | String | Default attribute: default shape of an object detection label (dedicated attribute). By default, this parameter is left blank. Options: |
@modelarts:from_type | String | Default attribute: type of the head entity in a triplet relationship label. This attribute must be specified when a relationship label is created. It is used only for text triplet datasets. |
@modelarts:rename_to | String | Default attribute: new name of the label. |
@modelarts:shortcut | String | Default attribute: label shortcut key. By default, this parameter is left blank. Example: D. |
@modelarts:to_type | String | Default attribute: type of the tail entity in a triplet relationship label. This attribute must be specified when a relationship label is created. It is used only for text triplet datasets. |
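The LabelProperty keys are literal strings that include the @modelarts: prefix. The sketch below extracts the display color and shortcut from a Label object and treats empty strings as unset, because both attributes are blank by default. It assumes that the color is a six-digit #RRGGBB code like the #FFFFF0 example and discards any other value. label_display_info is a hypothetical helper.

```python
import re
from typing import Optional

HEX_COLOR = re.compile(r"#[0-9A-Fa-f]{6}")


def label_display_info(label: dict) -> dict:
    """Return the name, color, and shortcut of a Label object."""
    prop = label.get("property") or {}
    color: Optional[str] = prop.get("@modelarts:color") or None
    if color is not None and not HEX_COLOR.fullmatch(color):
        color = None  # Assumption: only #RRGGBB codes are meaningful here.
    shortcut: Optional[str] = prop.get("@modelarts:shortcut") or None
    return {"name": label.get("name"), "color": color, "shortcut": shortcut}
```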
WorkforceDescriptor
Parameter | Type | Description |
---|---|---|
current_task_id | String | ID of a team labeling task. |
current_task_name | String | Name of a team labeling task. |
reject_num | Integer | Number of rejected samples. |
repetition | Integer | Number of persons who label each sample. The minimum value is 1. |
is_synchronize_auto_labeling_data | Boolean | Whether to synchronously update auto labeling data. Options: |
is_synchronize_data | Boolean | Whether to synchronize updated data, such as uploading files, synchronizing data sources, and assigning imported unlabeled files to team members. Options: |
workers | Array of Worker objects | List of labeling team members. |
workforce_id | String | ID of a labeling team. |
workforce_name | String | Name of a labeling team. |
Worker
Parameter | Type | Description |
---|---|---|
create_time | Long | Creation time. |
description | String | Labeling team member description. The value contains 0 to 256 characters and cannot contain any of the following special characters: ^!<>=&"' |
| | String | Email address of a labeling team member. |
role | Integer | Role. Options: |
status | Integer | Current login status of a labeling team member. Options: |
update_time | Long | Update time. |
worker_id | String | ID of a labeling team member. |
workforce_id | String | ID of a labeling team. |
Example Requests
Querying the Dataset List
GET https://{endpoint}/v2/{project_id}/datasets?offset=0&limit=10&sort_by=create_time&order=desc&dataset_type=0&file_preview=true
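The example request does not show authentication headers. The sketch below builds and sends the request with Python's urllib. It assumes token-based authentication, in which an IAM token is passed in the X-Auth-Token header. AK/SK request signing, as used by the SDKs that API Explorer generates, is not shown. make_list_request and fetch_json are hypothetical names, and the endpoint and project ID remain placeholders.

```python
import json
import urllib.request


def make_list_request(url: str, token: str) -> urllib.request.Request:
    """Prepare GET /v2/{project_id}/datasets with token authentication."""
    # Assumption: the IAM token is sent in the X-Auth-Token header.
    return urllib.request.Request(url, method="GET", headers={"X-Auth-Token": token})


def fetch_json(request: urllib.request.Request, timeout: float = 30.0) -> dict:
    """Send the request and decode the JSON response body.

    urlopen raises urllib.error.HTTPError for non-2xx responses such as
    401, 403, and 404.
    """
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```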
Example Responses
Status code: 200
OK
{
  "total_number" : 1,
  "datasets" : [ {
    "dataset_id" : "gfghHSokody6AJigS5A",
    "dataset_name" : "dataset-f9e8",
    "dataset_type" : 0,
    "data_format" : "Default",
    "next_version_num" : 4,
    "status" : 1,
    "data_sources" : [ {
      "data_type" : 0,
      "data_path" : "/test-obs/classify/input/animals/"
    } ],
    "create_time" : 1605690595404,
    "update_time" : 1605690595404,
    "description" : "",
    "current_version_id" : "54IXbeJhfttGpL46lbv",
    "current_version_name" : "V003",
    "total_sample_count" : 10,
    "annotated_sample_count" : 10,
    "work_path" : "/test-obs/classify/output/",
    "inner_work_path" : "/test-obs/classify/output/dataset-f9e8-gfghHSokody6AJigS5A/",
    "inner_annotation_path" : "/test-obs/classify/output/dataset-f9e8-gfghHSokody6AJigS5A/annotation/",
    "inner_data_path" : "/test-obs/classify/output/dataset-f9e8-gfghHSokody6AJigS5A/data/",
    "inner_log_path" : "/test-obs/classify/output/dataset-f9e8-gfghHSokody6AJigS5A/logs/",
    "inner_temp_path" : "/test-obs/classify/output/dataset-f9e8-gfghHSokody6AJigS5A/temp/",
    "inner_task_path" : "/test-obs/classify/output/dataset-f9e8-gfghHSokody6AJigS5A/task/",
    "work_path_type" : 0,
    "workspace_id" : "0",
    "enterprise_project_id" : "0",
    "exist_running_task" : false,
    "exist_workforce_task" : false,
    "running_tasks_id" : [ ],
    "workforce_task_count" : 0,
    "feature_supports" : [ "0" ],
    "managed" : false,
    "import_data" : false,
    "label_task_count" : 1,
    "dataset_format" : 0,
    "content_labeling" : true,
    "samples" : [ {
      "url" : "https://test-obs.obs.xxx.com:443/classify/input/animals/15.jpg?AccessKeyId=vprCCTY1NmHudlvC0bXr&Expires=1606100112&Signature=tuUo9jl6lqoMKAwNBz5g8dxO%2FdE%3D",
      "create_time" : 1605690596035
    }, {
      "url" : "https://test-obs.obs.xxx.com:443/classify/input/animals/8.jpg?AccessKeyId=vprCCTY1NmHudlvC0bXr&Expires=1606100112&Signature=NITOdBnkUXtdnKuEgDzZpkQzNfM%3D",
      "create_time" : 1605690596046
    }, {
      "url" : "https://test-obs.obs.xxx.com:443/classify/input/animals/9.jpg?AccessKeyId=vprCCTY1NmHudlvC0bXr&Expires=1606100112&Signature=%2BwUo1BL38%2F2d7p7anPi4fNzm1VU%3D",
      "create_time" : 1605690596050
    }, {
      "url" : "https://test-obs.obs.xxx.com:443/classify/input/animals/7.jpg?AccessKeyId=vprCCTY1NmHudlvC0bXr&Expires=1606100112&Signature=tOrHfcWo%2FEJ0wRzfi1M5Wk2MrXg%3D",
      "create_time" : 1605690596043
    } ]
  } ]
}
Status Codes
Status Code | Description |
---|---|
200 | OK |
401 | Unauthorized |
403 | Forbidden |
404 | Not Found |
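Only status code 200 indicates success. A client can convert the other status codes into an exception that carries the status text from the table above, as in the sketch below. The exception class and function names are hypothetical. The response body is passed through unparsed, because its error-code format is described separately under Error Codes.

```python
from typing import Dict

STATUS_TEXT: Dict[int, str] = {401: "Unauthorized", 403: "Forbidden", 404: "Not Found"}


class DatasetListError(Exception):
    """Raised when GET /v2/{project_id}/datasets does not return 200."""

    def __init__(self, status: int, body: str = "") -> None:
        self.status = status
        self.body = body
        text = STATUS_TEXT.get(status, "Unexpected status")
        super().__init__(f"{status} {text}" + (f": {body}" if body else ""))


def check_status(status: int, body: str = "") -> None:
    """Raise DatasetListError unless the status code is 200."""
    if status != 200:
        raise DatasetListError(status, body)
```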
Error Codes
See Error Codes.