Querying Details About a Dataset
Function
This API is used to query details about a dataset.
Debugging
You can debug this API through automatic authentication in API Explorer, or use the SDK sample code generated by API Explorer.
URI
GET /v2/{project_id}/datasets/{dataset_id}

Path parameters

| Parameter | Mandatory | Type | Description |
|---|---|---|---|
| dataset_id | Yes | String | Dataset ID. |
| project_id | Yes | String | Project ID. For details about how to obtain a project ID, see Obtaining a Project ID and Name. |


Query parameters

| Parameter | Mandatory | Type | Description |
|---|---|---|---|
| check_running_task | No | Boolean | Whether to detect tasks (including initialization tasks) that are running in a dataset. Options: |
| running_task_type | No | Integer | Type of the running tasks (including initialization tasks) to be detected. The options are as follows: |

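The path and query parameters above combine into a single request URL. The following is a minimal sketch of that assembly, not an official client. The helper name `build_dataset_url`, the host `modelarts.example.com`, and the project ID are hypothetical, and sending Boolean values as lowercase `true`/`false` is an assumption.

```python
from urllib.parse import quote, urlencode


def build_dataset_url(endpoint, project_id, dataset_id,
                      check_running_task=None, running_task_type=None):
    """Return the URL for GET /v2/{project_id}/datasets/{dataset_id}."""
    # Percent-encode path parameters so unusual characters cannot break the path.
    path = "/v2/{}/datasets/{}".format(quote(project_id, safe=""),
                                       quote(dataset_id, safe=""))
    query = {}
    if check_running_task is not None:
        # Assumption: Boolean query values are sent as lowercase "true"/"false".
        query["check_running_task"] = "true" if check_running_task else "false"
    if running_task_type is not None:
        query["running_task_type"] = int(running_task_type)
    url = "https://{}{}".format(endpoint, path)
    if query:
        # Optional parameters are appended only when the caller supplies them.
        url += "?" + urlencode(query)
    return url


# Hypothetical endpoint and project ID, for illustration only.
print(build_dataset_url("modelarts.example.com", "my-project-id",
                        "gfghHSokody6AJigS5A", check_running_task=True))
```

Both query parameters are optional, so the sketch omits the query string entirely when neither is given.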
Request Parameters
None
Response Parameters
Status code: 200

Response body parameters

| Parameter | Type | Description |
|---|---|---|
| annotated_sample_count | Integer | Number of labeled samples in a dataset. |
| annotated_sub_sample_count | Integer | Number of labeled subsamples. |
| content_labeling | Boolean | Whether to enable content labeling for the speech paragraph labeling dataset. This function is enabled by default. |
| create_time | Long | Time when a dataset is created. |
| current_version_id | String | Current version ID of a dataset. |
| current_version_name | String | Current version name of a dataset. The value is a string of 1 to 32 characters consisting of letters, digits, underscores (_), and hyphens (-). |
| data_format | String | Data format. |
| data_sources | Array of DataSource objects | Data source list. |
| data_statistics | Map<String,Object> | Sample statistics on a dataset, including the statistics on sample metadata. |
| data_update_time | Long | Time when a sample and a label are updated. |
| dataset_format | Integer | Dataset format. Options: |
| dataset_id | String | Dataset ID. |
| dataset_name | String | Dataset name. |
| dataset_tags | Array of strings | Key identifier list of a dataset, for example, ["Image","Object detection"]. |
| dataset_type | Integer | Dataset type. Options: |
| dataset_version_count | Integer | Number of dataset versions. |
| deleted_sample_count | Integer | Number of deleted samples. |
| deletion_stats | Map<String,Integer> | Deletion reason statistics. |
| description | String | Dataset description. |
| enterprise_project_id | String | Enterprise project ID. |
| exist_running_task | Boolean | Whether the dataset contains running (including initialization) tasks. Options: |
| exist_workforce_task | Boolean | Whether the dataset contains team labeling tasks. Options: |
| feature_supports | Array of strings | List of features supported by the dataset. Currently, only the value 0 is supported, indicating that the OBS file size is limited. |
| import_data | Boolean | Whether to import data. Options: |
| import_task_id | String | ID of an import task. |
| inner_annotation_path | String | Path for storing the labeling result of a dataset. |
| inner_data_path | String | Path for storing the internal data of a dataset. |
| inner_log_path | String | Path for storing internal logs of a dataset. |
| inner_task_path | String | Path for storing internal tasks of a dataset. |
| inner_temp_path | String | Path for storing internal temporary files of a dataset. |
| inner_work_path | String | Output directory of a dataset. |
| label_task_count | Integer | Number of labeling tasks. |
| labels | Array of Label objects | Dataset label list. |
| loading_sample_count | Integer | Number of loading samples. |
| managed | Boolean | Whether a dataset is hosted. Options: |
| next_version_num | Integer | Sequence number of the next version of a dataset. |
| running_tasks_id | Array of strings | ID list of running (including initialization) tasks. |
| schema | Array of Field objects | Schema list. |
| status | Integer | Dataset status. Options: |
| third_path | String | Third-party path. |
| total_sample_count | Integer | Total number of dataset samples. |
| total_sub_sample_count | Integer | Total number of subsamples generated from the parent samples. For example, the total number of key frame images extracted from the video labeling dataset is that of subsamples. |
| unconfirmed_sample_count | Integer | Number of auto labeling samples to be confirmed. |
| update_time | Long | Time when a dataset is updated. |
| versions | Array of DatasetVersion objects | Dataset version information. Currently, only the current version information of a dataset is recorded. |
| work_path | String | Output dataset path, which is used to store output files such as label files. The path is an OBS path in the format of /Bucket name/File path. For example: /obs-bucket. |
| work_path_type | Integer | Type of the dataset output path. The default value is 0, indicating an OBS bucket. |
| workforce_descriptor | WorkforceDescriptor object | Team labeling information. |
| workforce_task_count | Integer | Number of team labeling tasks of a dataset. |
| workspace_id | String | Workspace ID. If no workspace is created, the default value is 0. If a workspace is created and used, use the actual value. |

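The work_path field and the inner_* path fields are OBS paths in the format /Bucket name/File path. Client code often needs the bucket and the object path separately, so the sketch below shows one way to split them. The helper name `split_obs_path` is hypothetical and is not part of the API.

```python
def split_obs_path(obs_path):
    """Split '/bucket/dir/file' into ('bucket', 'dir/file')."""
    if not obs_path.startswith("/"):
        raise ValueError("expected an OBS path starting with '/': %r" % obs_path)
    # Everything up to the first '/' after the leading one is the bucket name.
    bucket, _, key = obs_path[1:].partition("/")
    if not bucket:
        raise ValueError("missing bucket name in %r" % obs_path)
    return bucket, key


print(split_obs_path("/test-obs/classify/output/"))  # ('test-obs', 'classify/output/')
print(split_obs_path("/obs-bucket"))                 # ('obs-bucket', '')
```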

DataSource

| Parameter | Type | Description |
|---|---|---|
| data_path | String | Data source path. |
| data_type | Integer | Data type. Options: |
| schema_maps | Array of SchemaMap objects | Schema mapping information corresponding to the table data. |
| source_info | SourceInfo object | Information required for importing a table data source. |
| with_column_header | Boolean | Whether the first row in the file is a column name. This field is valid for the table dataset. Options: |


SchemaMap

| Parameter | Type | Description |
|---|---|---|
| dest_name | String | Name of the destination column. |
| src_name | String | Name of the source column. |


SourceInfo

| Parameter | Type | Description |
|---|---|---|
| cluster_id | String | MRS cluster ID. You can log in to the MRS console to view the information. |
| cluster_mode | String | Running mode of an MRS cluster. Options: |
| cluster_name | String | MRS cluster name. You can log in to the MRS console to view the information. |
| database_name | String | Name of the database to which the table dataset is imported. |
| input | String | HDFS path of the table dataset. For example, /datasets/demo. |
| ip | String | IP address of your GaussDB(DWS) cluster. |
| port | String | Port number of your GaussDB(DWS) cluster. |
| queue_name | String | DLI queue name of a table dataset. |
| subnet_id | String | Subnet ID of an MRS cluster. |
| table_name | String | Name of the table to which a table dataset is imported. |
| user_name | String | Username, which is mandatory for GaussDB(DWS) data. |
| user_password | String | User password, which is mandatory for GaussDB(DWS) data. |
| vpc_id | String | ID of the VPC where an MRS cluster resides. |


Label

| Parameter | Type | Description |
|---|---|---|
| attributes | Array of LabelAttribute objects | Multi-dimensional attribute of a label. For example, if the label is music, attributes such as style and artist may be included. |
| name | String | Label name. |
| property | LabelProperty object | Basic attribute key-value pair of a label, such as color and shortcut keys. |
| type | Integer | Label type. Options: |


Field

| Parameter | Type | Description |
|---|---|---|
| description | String | Schema description. |
| name | String | Schema name. |
| schema_id | Integer | Schema ID. |
| type | String | Schema value type. |


DatasetVersion

| Parameter | Type | Description |
|---|---|---|
| add_sample_count | Integer | Number of added samples. |
| analysis_cache_path | String | Cache path for feature analysis. |
| analysis_status | Integer | Status of a feature analysis task. Options: |
| analysis_task_id | String | ID of a feature analysis task. |
| annotated_sample_count | Integer | Number of labeled samples in a version. |
| annotated_sub_sample_count | Integer | Number of labeled subsamples. |
| clear_hard_property | Boolean | Whether to clear hard example properties during release. Options: |
| code | String | Status code of a preprocessing task such as rotation and cropping. |
| create_time | Long | Time when a version is created. |
| crop | Boolean | Whether to crop the image. This field is valid only for the object detection dataset whose labeling box is in the rectangle shape. Options: |
| crop_path | String | Path for storing cropped files. |
| crop_rotate_cache_path | String | Temporary directory for executing the rotation and cropping task. |
| data_analysis | Map<String,Object> | Feature analysis result in JSON format. |
| data_path | String | Path for storing data. |
| data_statistics | Map<String,Object> | Sample statistics on a dataset, including the statistics on sample metadata in JSON format. |
| data_validate | Boolean | Whether data is validated by the validation algorithm before release. Options: |
| deleted_sample_count | Integer | Number of deleted samples. |
| deletion_stats | Map<String,Integer> | Deletion reason statistics. |
| description | String | Description of a version. |
| export_images | Boolean | Whether to export images to the version output directory during release. Options: |
| extract_serial_number | Boolean | Whether to parse the subsample number during release. The field is valid for the healthcare dataset. Options: |
| include_dataset_data | Boolean | Whether to include the source data of a dataset during release. Options: |
| is_current | Boolean | Whether this version is the current version of the dataset. Options: |
| label_stats | Array of LabelStats objects | Label statistics list of a released version. |
| label_type | String | Label type of a released version. Options: |
| manifest_cache_input_path | String | Input path for the manifest file cache during version release. |
| manifest_path | String | Path for storing the manifest file with the released version. |
| message | String | Task information recorded during release (for example, error information). |
| modified_sample_count | Integer | Number of modified samples. |
| previous_annotated_sample_count | Integer | Number of labeled samples of parent versions. |
| previous_total_sample_count | Integer | Total number of samples of parent versions. |
| previous_version_id | String | Parent version ID. |
| processor_task_id | String | ID of a preprocessing task such as rotation and cropping. |
| processor_task_status | Integer | Status of a preprocessing task such as rotation and cropping. The options are as follows: |
| remove_sample_usage | Boolean | Whether to clear the existing usage information of a dataset during release. Options: |
| rotate | Boolean | Whether to rotate the image. Options: |
| rotate_path | String | Path for storing the rotated file. |
| sample_state | String | Sample status. The options are as follows: |
| start_processor_task | Boolean | Whether to start a data analysis task during release. Options: |
| status | Integer | Status of a dataset version. Options: |
| tags | Array of strings | Key identifier list of the dataset. The labeling type is used as the default label when the labeling task releases a version. For example, ["Image","Object detection"]. |
| task_type | Integer | Labeling task type of the released version, which is the same as the dataset type. |
| total_sample_count | Integer | Total number of version samples. |
| total_sub_sample_count | Integer | Total number of subsamples generated from the parent samples. |
| train_evaluate_sample_ratio | String | Ratio for splitting samples into training and validation sets during version release. The default value is 1.00, indicating that all samples in the released version are used for training. |
| update_time | Long | Time when a version is updated. |
| version_format | String | Format of a dataset version. Options: |
| version_id | String | Dataset version ID. |
| version_name | String | Dataset version name. |
| with_column_header | Boolean | Whether the first row in the released CSV file is a column name. This field is valid for the table dataset. Options: |


LabelStats

| Parameter | Type | Description |
|---|---|---|
| attributes | Array of LabelAttribute objects | Multi-dimensional attribute of a label. For example, if the label is music, attributes such as style and artist may be included. |
| count | Integer | Number of labels. |
| name | String | Label name. |
| property | LabelProperty object | Basic attribute key-value pair of a label, such as color and shortcut keys. |
| sample_count | Integer | Number of samples containing the label. |
| type | Integer | Label type. Options: |


LabelAttribute

| Parameter | Type | Description |
|---|---|---|
| default_value | String | Default value of a label attribute. |
| id | String | Label attribute ID. You can obtain it by querying the label list. |
| name | String | Label attribute name. The value contains a maximum of 64 characters and cannot contain the following characters: <>=&"' |
| type | String | Label attribute type. Options: |
| values | Array of LabelAttributeValue objects | List of label attribute values. |


LabelAttributeValue

| Parameter | Type | Description |
|---|---|---|
| id | String | Label attribute value ID. |
| value | String | Label attribute value. |


LabelProperty

| Parameter | Type | Description |
|---|---|---|
| @modelarts:color | String | Default attribute: Label color, which is a hexadecimal code of the color. By default, this parameter is left blank. Example: #FFFFF0. |
| @modelarts:default_shape | String | Default attribute: Default shape of an object detection label (dedicated attribute). By default, this parameter is left blank. Options: |
| @modelarts:from_type | String | Default attribute: Type of the head entity in the triplet relationship label. This attribute must be specified when a relationship label is created. This parameter is used only for the text triplet dataset. |
| @modelarts:rename_to | String | Default attribute: The new name of the label. |
| @modelarts:shortcut | String | Default attribute: Label shortcut key. By default, this parameter is left blank. For example: D. |
| @modelarts:to_type | String | Default attribute: Type of the tail entity in the triplet relationship label. This attribute must be specified when a relationship label is created. This parameter is used only for the text triplet dataset. |


WorkforceDescriptor

| Parameter | Type | Description |
|---|---|---|
| current_task_id | String | ID of a team labeling task. |
| current_task_name | String | Name of a team labeling task. |
| reject_num | Integer | Number of rejected samples. |
| repetition | Integer | Number of persons who label each sample. The minimum value is 1. |
| is_synchronize_auto_labeling_data | Boolean | Whether to synchronously update auto labeling data. Options: |
| is_synchronize_data | Boolean | Whether to synchronize updated data, such as uploading files, synchronizing data sources, and assigning imported unlabeled files to team members. Options: |
| workers | Array of Worker objects | List of labeling team members. |
| workforce_id | String | ID of a labeling team. |
| workforce_name | String | Name of a labeling team. |


Worker

| Parameter | Type | Description |
|---|---|---|
| create_time | Long | Creation time. |
| description | String | Labeling team member description. The value contains 0 to 256 characters and does not support the following special characters: ^!<>=&"' |
| email | String | Email address of a labeling team member. |
| role | Integer | Role. Options: |
| status | Integer | Current login status of a labeling team member. Options: |
| update_time | Long | Update time. |
| worker_id | String | ID of a labeling team member. |
| workforce_id | String | ID of a labeling team. |

Example Requests
Querying Details About a Dataset
GET https://{endpoint}/v2/{project_id}/datasets/{dataset_id}
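As a rough illustration, the example request can be prepared with Python's standard library as shown below. The sketch assumes token-based authentication through an `X-Auth-Token` header, which this page does not specify. The function name `make_dataset_request`, the host `modelarts.example.com`, the project ID, and the token value are placeholders. The request is only constructed here and is not sent.

```python
import urllib.request


def make_dataset_request(endpoint, project_id, dataset_id, token):
    """Prepare (but do not send) the GET request for a dataset's details."""
    url = "https://{}/v2/{}/datasets/{}".format(endpoint, project_id, dataset_id)
    # Assumption: the service accepts an IAM token in the X-Auth-Token header.
    headers = {"X-Auth-Token": token}
    return urllib.request.Request(url, headers=headers, method="GET")


# Hypothetical endpoint, project ID, and token, for illustration only.
req = make_dataset_request("modelarts.example.com", "my-project-id",
                           "gfghHSokody6AJigS5A", "<token>")
print(req.get_method(), req.full_url)

# Sending is left to the caller, for example:
# import json
# with urllib.request.urlopen(req, timeout=30) as resp:
#     dataset = json.load(resp)
```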
Example Responses
Status code: 200
OK
{
  "dataset_id" : "gfghHSokody6AJigS5A",
  "dataset_name" : "dataset-f9e8",
  "dataset_type" : 0,
  "data_format" : "Default",
  "next_version_num" : 4,
  "status" : 1,
  "data_sources" : [ {
    "data_type" : 0,
    "data_path" : "/test-obs/classify/input/animals/"
  } ],
  "create_time" : 1605690595404,
  "update_time" : 1605690595404,
  "description" : "",
  "current_version_id" : "54IXbeJhfttGpL46lbv",
  "current_version_name" : "V003",
  "total_sample_count" : 10,
  "annotated_sample_count" : 10,
  "unconfirmed_sample_count" : 0,
  "work_path" : "/test-obs/classify/output/",
  "inner_work_path" : "/test-obs/classify/output/dataset-f9e8-gfghHSokody6AJigS5A/",
  "inner_annotation_path" : "/test-obs/classify/output/dataset-f9e8-gfghHSokody6AJigS5A/annotation/",
  "inner_data_path" : "/test-obs/classify/output/dataset-f9e8-gfghHSokody6AJigS5A/data/",
  "inner_log_path" : "/test-obs/classify/output/dataset-f9e8-gfghHSokody6AJigS5A/logs/",
  "inner_temp_path" : "/test-obs/classify/output/dataset-f9e8-gfghHSokody6AJigS5A/temp/",
  "inner_task_path" : "/test-obs/classify/output/dataset-f9e8-gfghHSokody6AJigS5A/task/",
  "work_path_type" : 0,
  "workspace_id" : "0",
  "enterprise_project_id" : "0",
  "workforce_task_count" : 0,
  "feature_supports" : [ "0" ],
  "managed" : false,
  "import_data" : false,
  "label_task_count" : 1,
  "dataset_format" : 0,
  "dataset_version_count" : 3,
  "content_labeling" : true,
  "labels" : [ {
    "name" : "Rabbits",
    "type" : 0,
    "property" : {
      "@modelarts:color" : "#3399ff"
    }
  }, {
    "name" : "Bees",
    "type" : 0,
    "property" : {
      "@modelarts:color" : "#3399ff"
    }
  } ]
}
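To show how a client might consume this response, the sketch below parses a trimmed copy of the example body and derives a few summary values. It assumes that create_time is a Unix timestamp in milliseconds, which the example value suggests but this page does not state. The helper name `summarize_dataset` is hypothetical.

```python
import json
from datetime import datetime, timezone

# Trimmed copy of the example response, keeping only the fields used here.
body = """
{
  "dataset_name": "dataset-f9e8",
  "create_time": 1605690595404,
  "total_sample_count": 10,
  "annotated_sample_count": 10,
  "labels": [
    {"name": "Rabbits", "type": 0, "property": {"@modelarts:color": "#3399ff"}},
    {"name": "Bees", "type": 0, "property": {"@modelarts:color": "#3399ff"}}
  ]
}
"""


def summarize_dataset(dataset):
    """Return the name, creation time, labeling progress, and label names."""
    total = dataset.get("total_sample_count", 0)
    annotated = dataset.get("annotated_sample_count", 0)
    # Assumption: create_time is a Unix timestamp in milliseconds.
    created = datetime.fromtimestamp(dataset["create_time"] / 1000, tz=timezone.utc)
    return {
        "name": dataset["dataset_name"],
        "created": created.isoformat(timespec="seconds"),
        # Guard against division by zero for an empty dataset.
        "labeled_fraction": annotated / total if total else 0.0,
        "labels": [label["name"] for label in dataset.get("labels", [])],
    }


summary = summarize_dataset(json.loads(body))
print(summary)
```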
Status Codes

| Status Code | Description |
|---|---|
| 200 | OK |
| 401 | Unauthorized |
| 403 | Forbidden |
| 404 | Not Found |

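A caller will usually want to turn the non-200 codes above into errors. One possible shape is sketched below. The exception class `DatasetQueryError` and the function `check_status` are hypothetical names, and the handling of codes not listed in the table is an assumption.

```python
# Status codes and descriptions taken from the table above.
STATUS_DESCRIPTIONS = {
    200: "OK",
    401: "Unauthorized",
    403: "Forbidden",
    404: "Not Found",
}


class DatasetQueryError(Exception):
    """Raised when the dataset query does not return status code 200."""


def check_status(status_code):
    """Return None for 200 and raise DatasetQueryError for anything else."""
    if status_code == 200:
        return None
    # Assumption: codes missing from the table are reported generically.
    description = STATUS_DESCRIPTIONS.get(status_code, "Unexpected status")
    raise DatasetQueryError("{} {}".format(status_code, description))


check_status(200)  # passes silently
try:
    check_status(404)
except DatasetQueryError as exc:
    print("query failed:", exc)
```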
Error Codes
See Error Codes.