Creating a Dataset
Function
This API is used to create a dataset.
Debugging
You can debug this API in API Explorer, which supports automatic authentication, or use the SDK sample code that API Explorer generates.
URI
POST /v2/{project_id}/datasets
Parameter | Mandatory | Type | Description |
---|---|---|---|
project_id | Yes | String | Project ID. For details about how to obtain a project ID, see Obtaining a Project ID and Name. |
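As a minimal sketch of how the URI above is assembled, the following Python helper substitutes the project ID into the path. The host `modelarts.example.com` and the helper name `dataset_create_url` are assumptions made for illustration and are not part of this API; use the endpoint of your own region.

```python
from urllib.parse import quote


def dataset_create_url(endpoint: str, project_id: str) -> str:
    """Return the full URL for POST /v2/{project_id}/datasets."""
    if not project_id:
        raise ValueError("project_id is mandatory")
    # Percent-encode the project ID so it is safe as a single path segment.
    return f"https://{endpoint}/v2/{quote(project_id, safe='')}/datasets"


# Hypothetical endpoint and project ID, shown only to illustrate the URL shape.
print(dataset_create_url("modelarts.example.com", "my-project-id"))
```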
Request Parameters
Parameter | Mandatory | Type | Description |
---|---|---|---|
data_format | No | String | Data format. Options: |
data_sources | Yes | Array of DataSource objects | Input dataset path, which is used to synchronize source data (such as images, text files, and audio files) in the directory and its subdirectories to the dataset. For a table dataset, this parameter indicates the import directory. The work directory of a table dataset cannot be an OBS path in a KMS-encrypted bucket. Only one data source can be imported at a time. |
dataset_name | Yes | String | Dataset name. The value contains 1 to 100 characters. Only letters, digits, underscores (_), and hyphens (-) are allowed, for example, dataset-9f3b. |
dataset_type | No | Integer | Dataset type. Options: |
description | No | String | Dataset description. The value is empty by default. The description contains 0 to 256 characters and does not support the following special characters: ^!<>=&"' |
import_annotations | No | Boolean | Indicates whether to automatically import the labeling information in the input directory. Object detection, image classification, and text classification are supported. The options are as follows: |
import_data | No | Boolean | Whether to import data. This parameter is used only for table datasets. Options: |
label_format | No | LabelFormat object | Label format information. This parameter is used only for text datasets. |
labels | No | Array of Label objects | Dataset label list. |
managed | No | Boolean | Whether to host a dataset. Options: |
schema | No | Array of Field objects | Schema list. |
work_path | Yes | String | Output dataset path, which is used to store output files such as label files. |
work_path_type | Yes | Integer | Type of the dataset output path. The default value is 0, indicating an OBS bucket. |
workforce_information | No | WorkforceInformation object | Team labeling information. |
workspace_id | No | String | Workspace ID. If no workspace is created, the default value is 0. If a workspace is created and used, use the actual value. |
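The length and character rules stated for dataset_name and description can be checked on the client before the request is sent. The sketch below is one way to do that; the function names are hypothetical, and reading "letters" as ASCII letters only is an assumption, since the table does not say whether other alphabets are accepted.

```python
import re

# dataset_name: 1 to 100 characters; letters, digits, underscores, hyphens.
# Assumption: "letters" is interpreted here as ASCII letters only.
_NAME_RE = re.compile(r"[A-Za-z0-9_-]{1,100}")

# Special characters that the description must not contain.
_FORBIDDEN_IN_DESCRIPTION = set("^!<>=&\"'")


def check_dataset_name(name: str) -> bool:
    """Return True if name satisfies the documented dataset_name rule."""
    return _NAME_RE.fullmatch(name) is not None


def check_description(text: str) -> bool:
    """Return True if text has 0 to 256 characters and no forbidden ones."""
    return len(text) <= 256 and not (_FORBIDDEN_IN_DESCRIPTION & set(text))


print(check_dataset_name("dataset-9f3b"))
print(check_description("cats & dogs"))
```

Validating locally only saves a round trip; the service remains the authority on what it accepts.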
Parameter | Mandatory | Type | Description |
---|---|---|---|
data_path | No | String | Data source path. |
data_type | No | Integer | Data type. Options: |
schema_maps | No | Array of SchemaMap objects | Schema mapping information corresponding to the table data. |
source_info | No | SourceInfo object | Information required for importing a table data source. |
with_column_header | No | Boolean | Whether the first row in the file is a column name. This field is valid for the table dataset. Options: |
Parameter | Mandatory | Type | Description |
---|---|---|---|
dest_name | No | String | Name of the destination column. |
src_name | No | String | Name of the source column. |
Parameter | Mandatory | Type | Description |
---|---|---|---|
cluster_id | No | String | MRS cluster ID. You can log in to the MRS console to view the information. |
cluster_mode | No | String | Running mode of an MRS cluster. Options: |
cluster_name | No | String | MRS cluster name. You can log in to the MRS console to view the information. |
database_name | No | String | Name of the database to which the table dataset is imported. |
input | No | String | HDFS path of the table dataset, for example, /datasets/demo. |
ip | No | String | IP address of your GaussDB(DWS) cluster. |
port | No | String | Port number of your GaussDB(DWS) cluster. |
queue_name | No | String | DLI queue name of a table dataset. |
subnet_id | No | String | Subnet ID of an MRS cluster. |
table_name | No | String | Name of the table to which a table dataset is imported. |
user_name | No | String | Username, which is mandatory for GaussDB(DWS) data. |
user_password | No | String | User password, which is mandatory for GaussDB(DWS) data. |
vpc_id | No | String | ID of the VPC where an MRS cluster resides. |
Parameter | Mandatory | Type | Description |
---|---|---|---|
attributes | No | Array of LabelAttribute objects | Multi-dimensional attribute of a label. For example, if the label is music, attributes such as style and artist may be included. |
name | No | String | Label name. |
property | No | LabelProperty object | Basic attribute key-value pair of a label, such as color and shortcut keys. |
type | No | Integer | Label type. Options: |
Parameter | Mandatory | Type | Description |
---|---|---|---|
default_value | No | String | Default value of a label attribute. |
id | No | String | Label attribute ID. You can obtain it by querying the label list. |
name | No | String | Label attribute name. The value contains a maximum of 64 characters and cannot contain the following special characters: <>=&"' |
type | No | String | Label attribute type. Options: |
values | No | Array of LabelAttributeValue objects | List of label attribute values. |
Parameter | Mandatory | Type | Description |
---|---|---|---|
id | No | String | Label attribute value ID. |
value | No | String | Label attribute value. |
Parameter | Mandatory | Type | Description |
---|---|---|---|
@modelarts:color | No | String | Default attribute: Label color, which is a hexadecimal code of the color. By default, this parameter is left blank. Example: #FFFFF0. |
@modelarts:default_shape | No | String | Default attribute: Default shape of an object detection label (dedicated attribute). By default, this parameter is left blank. Options: |
@modelarts:from_type | No | String | Default attribute: Type of the head entity in the triplet relationship label. This attribute must be specified when a relationship label is created. This parameter is used only for the text triplet dataset. |
@modelarts:rename_to | No | String | Default attribute: The new name of the label. |
@modelarts:shortcut | No | String | Default attribute: Label shortcut key. By default, this parameter is left blank. For example: D. |
@modelarts:to_type | No | String | Default attribute: Type of the tail entity in the triplet relationship label. This attribute must be specified when a relationship label is created. This parameter is used only for the text triplet dataset. |
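To illustrate how a label and its property key-value pairs fit together, the sketch below builds one entry for the labels array. The helper name `make_label` is hypothetical, and requiring the six-digit `#RRGGBB` form is an assumption drawn from the `#FFFFF0` example; the table only says the color is a hexadecimal code.

```python
import re
from typing import Optional

# Assumption: colors use the six-digit #RRGGBB form, as in the #FFFFF0 example.
_HEX_COLOR_RE = re.compile(r"#[0-9A-Fa-f]{6}")


def make_label(name: str, label_type: int,
               color: Optional[str] = None,
               shortcut: Optional[str] = None) -> dict:
    """Build one entry of the labels array, adding only the properties given."""
    prop = {}
    if color is not None:
        if not _HEX_COLOR_RE.fullmatch(color):
            raise ValueError(f"not a #RRGGBB color: {color!r}")
        prop["@modelarts:color"] = color
    if shortcut is not None:
        prop["@modelarts:shortcut"] = shortcut
    label = {"name": name, "type": label_type}
    if prop:
        # Omit "property" entirely when nothing was set, since it is optional.
        label["property"] = prop
    return label


print(make_label("Rabbits", 0, color="#3399ff", shortcut="D"))
```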
Parameter | Mandatory | Type | Description |
---|---|---|---|
description | No | String | Schema description. |
name | No | String | Schema name. |
schema_id | No | Integer | Schema ID. |
type | No | String | Schema value type. |
Parameter | Mandatory | Type | Description |
---|---|---|---|
data_sync_type | No | Integer | Synchronization type. Options: |
repetition | No | Integer | Number of persons who label each sample. The minimum value is 1. |
synchronize_auto_labeling_data | No | Boolean | Whether to synchronously update auto labeling data. Options: |
synchronize_data | No | Boolean | Whether to synchronize updated data, such as uploading files, synchronizing data sources, and assigning imported unlabeled files to team members. Options: |
task_id | No | String | ID of a team labeling task. |
task_name | Yes | String | Name of a team labeling task. The name contains 1 to 64 characters, including only letters, digits, underscores (_), and hyphens (-). |
workforces_config | No | WorkforcesConfig object | Manpower assignment of a team labeling task. You can delegate the assignment to the administrator or assign the manpower yourself. |
Parameter | Mandatory | Type | Description |
---|---|---|---|
agency | No | String | Administrator |
workforces | No | Array of WorkforceConfig objects | List of teams that execute labeling tasks. |
Parameter | Mandatory | Type | Description |
---|---|---|---|
workers | No | Array of Worker objects | List of labeling team members. |
workforce_id | No | String | ID of a labeling team. |
workforce_name | No | String | Name of a labeling team. The value contains 0 to 1024 characters and does not support the following special characters: !<>=&"' |
Parameter | Mandatory | Type | Description |
---|---|---|---|
create_time | No | Long | Creation time. |
description | No | String | Labeling team member description. The value contains 0 to 256 characters and does not support the following special characters: ^!<>=&"' |
email | No | String | Email address of a labeling team member. |
role | No | Integer | Role. Options: |
status | No | Integer | Current login status of a labeling team member. Options: |
update_time | No | Long | Update time. |
worker_id | No | String | ID of a labeling team member. |
workforce_id | No | String | ID of a labeling team. |
Response Parameters
Status code: 201
Parameter | Type | Description |
---|---|---|
dataset_id | String | Dataset ID. |
error_code | String | Error code. |
error_msg | String | Error message. |
import_task_id | String | ID of an import task. |
Example Requests
- Creating an Image Classification Dataset
{ "workspace_id" : "0", "dataset_name" : "dataset-457f", "dataset_type" : 0, "data_sources" : [ { "data_type" : 0, "data_path" : "/test-obs/classify/input/animals/" } ], "description" : "", "work_path" : "/test-obs/classify/output/", "work_path_type" : 0, "labels" : [ { "name" : "Rabbits", "type" : 0, "property" : { "@modelarts:color" : "#3399ff" } }, { "name" : "Bees", "type" : 0, "property" : { "@modelarts:color" : "#3399ff" } } ] }
- Creating an Object Detection Dataset
{ "workspace_id" : "0", "dataset_name" : "dataset-95a6", "dataset_type" : 1, "data_sources" : [ { "data_type" : 0, "data_path" : "/test-obs/detect/input/animals/" } ], "description" : "", "work_path" : "/test-obs/detect/output/", "work_path_type" : 0, "labels" : [ { "name" : "Rabbits", "type" : 1, "property" : { "@modelarts:color" : "#3399ff" } }, { "name" : "Bees", "type" : 1, "property" : { "@modelarts:color" : "#3399ff" } } ] }
- Creating a Table Dataset
{ "workspace_id" : "0", "dataset_name" : "dataset-de83", "dataset_type" : 400, "data_sources" : [ { "data_type" : 0, "data_path" : "/test-obs/table/input/", "with_column_header" : true } ], "description" : "", "work_path" : "/test-obs/table/output/", "work_path_type" : 0, "schema" : [ { "schema_id" : 1, "name" : "150", "type" : "STRING" }, { "schema_id" : 2, "name" : "4", "type" : "STRING" }, { "schema_id" : 3, "name" : "setosa", "type" : "STRING" }, { "schema_id" : 4, "name" : "versicolor", "type" : "STRING" }, { "schema_id" : 5, "name" : "virginica", "type" : "STRING" } ], "import_data" : true }
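The request bodies above are plain JSON, so they can be sent with any HTTP client. The following sketch builds the image classification request with Python's standard library and stops short of sending it. The endpoint host, project ID, and token are placeholders, and passing the token in an `X-Auth-Token` header is an assumption about how your account authenticates; this page does not specify the authentication header.

```python
import json
import urllib.request


def build_create_dataset_request(endpoint: str, project_id: str,
                                 token: str, body: dict) -> urllib.request.Request:
    """Prepare (but do not send) POST /v2/{project_id}/datasets."""
    url = f"https://{endpoint}/v2/{project_id}/datasets"
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        method="POST",
        headers={
            "Content-Type": "application/json",
            "X-Auth-Token": token,  # assumed authentication header
        },
    )


# Same fields as the image classification example request.
body = {
    "workspace_id": "0",
    "dataset_name": "dataset-457f",
    "dataset_type": 0,
    "data_sources": [{"data_type": 0, "data_path": "/test-obs/classify/input/animals/"}],
    "description": "",
    "work_path": "/test-obs/classify/output/",
    "work_path_type": 0,
    "labels": [
        {"name": "Rabbits", "type": 0, "property": {"@modelarts:color": "#3399ff"}},
        {"name": "Bees", "type": 0, "property": {"@modelarts:color": "#3399ff"}},
    ],
}

# Placeholder endpoint, project ID, and token; replace them with real values.
req = build_create_dataset_request("modelarts.example.com", "my-project-id", "my-token", body)
print(req.get_method(), req.full_url)
```

Passing `req` to `urllib.request.urlopen` would send it; that call is left out here so the sketch runs without network access or credentials.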
Example Responses
Status code: 201
Created
{ "dataset_id" : "WxCREuCkBSAlQr9xrde" }
Status Codes
Status Code | Description |
---|---|
201 | Created |
401 | Unauthorized |
403 | Forbidden |
404 | Not Found |
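A caller typically branches on these status codes. The sketch below returns the dataset ID on 201 and raises otherwise. The function name is hypothetical, and it assumes that an error response carries a JSON body with `error_code` and `error_msg`, which this page lists as response fields but does not tie to specific status codes.

```python
import json

# Status codes listed in the table above.
STATUS_MEANINGS = {
    201: "Created",
    401: "Unauthorized",
    403: "Forbidden",
    404: "Not Found",
}


def parse_create_dataset_response(status_code: int, body_text: str) -> str:
    """Return dataset_id on 201; raise RuntimeError for any other status."""
    # Assumption: the body, when present, is a JSON object.
    payload = json.loads(body_text) if body_text else {}
    if status_code == 201:
        return payload["dataset_id"]
    meaning = STATUS_MEANINGS.get(status_code, "Unexpected status")
    raise RuntimeError(
        f"{status_code} {meaning}: "
        f"{payload.get('error_code')} {payload.get('error_msg')}"
    )


print(parse_create_dataset_response(201, '{ "dataset_id" : "WxCREuCkBSAlQr9xrde" }'))
```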
Error Codes
See Error Codes.