Creating a Dataset
Function
This API is used to create a dataset.
Debugging
You can debug this API in API Explorer, which supports automatic authentication, or use the SDK sample code that API Explorer generates.
URI
POST /v2/{project_id}/datasets
Parameter | Mandatory | Type | Description |
|---|---|---|---|
project_id | Yes | String | Project ID. For details about how to obtain a project ID, see Obtaining a Project ID and Name. |
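As a rough sketch, the request line above could be assembled in Python as follows. The endpoint host, the project ID, the token value, and the `X-Auth-Token` header name are assumptions for illustration and are not specified in this section; substitute the values that apply to your region and authentication method.

```python
import json
import urllib.request


def build_create_dataset_request(endpoint, project_id, body, token):
    """Build (but do not send) a POST request for /v2/{project_id}/datasets.

    `endpoint` and `token` are placeholders, and the X-Auth-Token header
    name is an assumption, not something this section confirms.
    """
    url = f"{endpoint.rstrip('/')}/v2/{project_id}/datasets"
    data = json.dumps(body).encode("utf-8")
    headers = {
        "Content-Type": "application/json",
        "X-Auth-Token": token,  # assumed authentication header
    }
    return urllib.request.Request(url, data=data, headers=headers, method="POST")


req = build_create_dataset_request(
    "https://modelarts.example.com",  # hypothetical endpoint
    "my-project-id",                  # hypothetical project ID
    {"dataset_name": "dataset-9f3b"},
    "my-token",                       # hypothetical token
)
print(req.get_method(), req.full_url)
```

Passing the returned object to `urllib.request.urlopen` would send the request; the sketch stops short of that so it can be run without network access.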
Request Parameters
Parameter | Mandatory | Type | Description |
|---|---|---|---|
data_format | No | String | Data format. Options: |
data_sources | Yes | Array of DataSource objects | Input dataset path, which is used to synchronize source data (such as images, text files, and audio files) in the directory and its subdirectories to the dataset. For a table dataset, this parameter indicates the import directory. The work directory of a table dataset cannot be an OBS path in a KMS-encrypted bucket. Only one data source can be imported at a time. |
dataset_name | Yes | String | Dataset name. The value contains 1 to 100 characters. Only letters, digits, underscores (_), and hyphens (-) are allowed, for example, dataset-9f3b. |
dataset_type | No | Integer | Dataset type. Options: |
description | No | String | Dataset description. The value is empty by default. The description contains 0 to 256 characters and does not support the following special characters: ^!<>=&"' |
import_annotations | No | Boolean | Indicates whether to automatically import the labeling information in the input directory. Object detection, image classification, and text classification are supported. The options are as follows: |
import_data | No | Boolean | Whether to import data. This parameter is used only for table datasets. Options: |
label_format | No | LabelFormat object | Label format information. This parameter is used only for text datasets. |
labels | No | Array of Label objects | Dataset label list. |
managed | No | Boolean | Whether to host a dataset. Options: |
schema | No | Array of Field objects | Schema list. |
work_path | Yes | String | Output dataset path, which is used to store output files such as label files. |
work_path_type | Yes | Integer | Type of the dataset output path. The default value is 0, indicating an OBS bucket. |
workforce_information | No | WorkforceInformation object | Team labeling information. |
workspace_id | No | String | Workspace ID. If no workspace is created, the default value is 0. If a workspace is created and used, use the actual value. |
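The constraints in the table above can be checked on the client side before a request is sent. The following is a minimal sketch, assuming that "letters" means ASCII letters and that the listed special characters are the only ones rejected in a description; the helper names are hypothetical and the service may apply further rules.

```python
import re

# dataset_name: 1 to 100 characters; letters, digits, underscores, hyphens.
# Treating "letters" as ASCII letters is an assumption.
_NAME_RE = re.compile(r"[A-Za-z0-9_-]{1,100}")

# description: special characters listed as unsupported in the table.
_DESCRIPTION_FORBIDDEN = set("^!<>=&\"'")


def validate_dataset_name(name):
    """Return True if the name satisfies the documented constraints."""
    return _NAME_RE.fullmatch(name) is not None


def validate_description(text):
    """Return True if the description has at most 256 allowed characters."""
    return len(text) <= 256 and not (_DESCRIPTION_FORBIDDEN & set(text))


def minimal_body(dataset_name, data_path, work_path, description=""):
    """Build a request body that contains only the mandatory parameters."""
    if not validate_dataset_name(dataset_name):
        raise ValueError(f"invalid dataset_name: {dataset_name!r}")
    if not validate_description(description):
        raise ValueError("invalid description")
    return {
        "dataset_name": dataset_name,
        # Only one data source can be imported at a time.
        "data_sources": [{"data_path": data_path}],
        "work_path": work_path,
        "work_path_type": 0,  # 0 indicates an OBS bucket
        "description": description,
    }


body = minimal_body("dataset-9f3b", "/test-obs/input/", "/test-obs/output/")
print(body["dataset_name"])
```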
DataSource
Parameter | Mandatory | Type | Description |
|---|---|---|---|
data_path | No | String | Data source path. |
data_type | No | Integer | Data type. Options: |
schema_maps | No | Array of SchemaMap objects | Schema mapping information corresponding to the table data. |
source_info | No | SourceInfo object | Information required for importing a table data source. |
with_column_header | No | Boolean | Whether the first row in the file is a column name. This field is valid for the table dataset. Options: |
SchemaMap
Parameter | Mandatory | Type | Description |
|---|---|---|---|
dest_name | No | String | Name of the destination column. |
src_name | No | String | Name of the source column. |
SourceInfo
Parameter | Mandatory | Type | Description |
|---|---|---|---|
cluster_id | No | String | MRS cluster ID. You can log in to the MRS console to view the information. |
cluster_mode | No | String | Running mode of an MRS cluster. Options: |
cluster_name | No | String | MRS cluster name. You can log in to the MRS console to view the information. |
database_name | No | String | Name of the database to which the table dataset is imported. |
input | No | String | HDFS path of the table dataset, for example, /datasets/demo. |
ip | No | String | IP address of your GaussDB(DWS) cluster. |
port | No | String | Port number of your GaussDB(DWS) cluster. |
queue_name | No | String | DLI queue name of a table dataset. |
subnet_id | No | String | Subnet ID of an MRS cluster. |
table_name | No | String | Name of the table to which a table dataset is imported. |
user_name | No | String | Username, which is mandatory for GaussDB(DWS) data. |
user_password | No | String | User password, which is mandatory for GaussDB(DWS) data. |
vpc_id | No | String | ID of the VPC where an MRS cluster resides. |
Label
Parameter | Mandatory | Type | Description |
|---|---|---|---|
attributes | No | Array of LabelAttribute objects | Multi-dimensional attribute of a label. For example, if the label is music, attributes such as style and artist may be included. |
name | No | String | Label name. |
property | No | LabelProperty object | Basic attribute key-value pair of a label, such as color and shortcut keys. |
type | No | Integer | Label type. Options: |
LabelAttribute
Parameter | Mandatory | Type | Description |
|---|---|---|---|
default_value | No | String | Default value of a label attribute. |
id | No | String | Label attribute ID. You can obtain it by querying the label list. |
name | No | String | Label attribute name. The value contains a maximum of 64 characters and does not support the following special characters: <>=&"' |
type | No | String | Label attribute type. Options: |
values | No | Array of LabelAttributeValue objects | List of label attribute values. |
LabelAttributeValue
Parameter | Mandatory | Type | Description |
|---|---|---|---|
id | No | String | Label attribute value ID. |
value | No | String | Label attribute value. |
LabelProperty
Parameter | Mandatory | Type | Description |
|---|---|---|---|
@modelarts:color | No | String | Default attribute: Label color, which is a hexadecimal code of the color. By default, this parameter is left blank. Example: #FFFFF0. |
@modelarts:default_shape | No | String | Default attribute: Default shape of an object detection label (dedicated attribute). By default, this parameter is left blank. Options: |
@modelarts:from_type | No | String | Default attribute: Type of the head entity in the triplet relationship label. This attribute must be specified when a relationship label is created. This parameter is used only for the text triplet dataset. |
@modelarts:rename_to | No | String | Default attribute: The new name of the label. |
@modelarts:shortcut | No | String | Default attribute: Label shortcut key. By default, this parameter is left blank. For example: D. |
@modelarts:to_type | No | String | Default attribute: Type of the tail entity in the triplet relationship label. This attribute must be specified when a relationship label is created. This parameter is used only for the text triplet dataset. |
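A label entry that combines a name, a type, and the property keys above could be built as in the sketch below. The helper name `make_label` is hypothetical, and the six-digit hexadecimal check is an assumption based on the example values in this document rather than a documented rule.

```python
import re

# Six-digit hexadecimal color such as #FFFFF0 (assumed format).
_HEX_COLOR = re.compile(r"#[0-9A-Fa-f]{6}")


def make_label(name, label_type, color=None, shortcut=None):
    """Build one entry for the `labels` array of the request body."""
    prop = {}
    if color is not None:
        if not _HEX_COLOR.fullmatch(color):
            raise ValueError(f"not a hexadecimal color code: {color!r}")
        prop["@modelarts:color"] = color
    if shortcut is not None:
        prop["@modelarts:shortcut"] = shortcut
    label = {"name": name, "type": label_type}
    if prop:
        # Include `property` only when at least one attribute is set.
        label["property"] = prop
    return label


rabbits = make_label("Rabbits", 0, color="#3399ff")
print(rabbits)
```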
Field
Parameter | Mandatory | Type | Description |
|---|---|---|---|
description | No | String | Schema description. |
name | No | String | Schema name. |
schema_id | No | Integer | Schema ID. |
type | No | String | Schema value type. |
WorkforceInformation
Parameter | Mandatory | Type | Description |
|---|---|---|---|
data_sync_type | No | Integer | Synchronization type. Options: |
repetition | No | Integer | Number of persons who label each sample. The minimum value is 1. |
synchronize_auto_labeling_data | No | Boolean | Whether to synchronously update auto labeling data. Options: |
synchronize_data | No | Boolean | Whether to synchronize updated data, such as uploading files, synchronizing data sources, and assigning imported unlabeled files to team members. Options: |
task_id | No | String | ID of a team labeling task. |
task_name | Yes | String | Name of a team labeling task. The name contains 1 to 64 characters, including only letters, digits, underscores (_), and hyphens (-). |
workforces_config | No | WorkforcesConfig object | Manpower assignment of a team labeling task. You can delegate the assignment to an administrator or assign the manpower yourself. |
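The two documented constraints in this table (the `task_name` pattern and the minimum `repetition`) lend themselves to a small builder. This is a sketch only: the function name is hypothetical, "letters" is assumed to mean ASCII letters, and the defaults chosen here are not stated by the API.

```python
import re

# task_name: 1 to 64 characters; letters, digits, underscores, hyphens.
_TASK_NAME_RE = re.compile(r"[A-Za-z0-9_-]{1,64}")


def make_workforce_information(task_name, repetition=1, workforces_config=None):
    """Build a `workforce_information` object for the request body."""
    if not _TASK_NAME_RE.fullmatch(task_name):
        raise ValueError(f"invalid task_name: {task_name!r}")
    if repetition < 1:
        raise ValueError("repetition must be at least 1")
    info = {"task_name": task_name, "repetition": repetition}
    if workforces_config is not None:
        info["workforces_config"] = workforces_config
    return info


info = make_workforce_information("task-01", repetition=2)
print(info)
```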
WorkforcesConfig
Parameter | Mandatory | Type | Description |
|---|---|---|---|
agency | No | String | Administrator |
workforces | No | Array of WorkforceConfig objects | List of teams that execute labeling tasks. |
WorkforceConfig
Parameter | Mandatory | Type | Description |
|---|---|---|---|
workers | No | Array of Worker objects | List of labeling team members. |
workforce_id | No | String | ID of a labeling team. |
workforce_name | No | String | Name of a labeling team. The value contains 0 to 1024 characters and does not support the following special characters: !<>=&"' |
Worker
Parameter | Mandatory | Type | Description |
|---|---|---|---|
create_time | No | Long | Creation time. |
description | No | String | Labeling team member description. The value contains 0 to 256 characters and does not support the following special characters: ^!<>=&"' |
No | String | Email address of a labeling team member. | |
role | No | Integer | Role. Options: |
status | No | Integer | Current login status of a labeling team member. Options: |
update_time | No | Long | Update time. |
worker_id | No | String | ID of a labeling team member. |
workforce_id | No | String | ID of a labeling team. |
Response Parameters
Status code: 201
Parameter | Type | Description |
|---|---|---|
dataset_id | String | Dataset ID. |
error_code | String | Error code. |
error_msg | String | Error message. |
import_task_id | String | ID of an import task. |
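A caller might read these fields from the response body as sketched below. Treating a non-empty `error_code` as a failure is an assumption; the table only lists the field, and the function name is hypothetical.

```python
import json


def parse_create_dataset_response(raw):
    """Return (dataset_id, import_task_id) from a JSON response body.

    Raises RuntimeError if the body carries a non-empty error_code
    (an assumed convention, not confirmed by this section).
    """
    payload = json.loads(raw)
    if payload.get("error_code"):
        raise RuntimeError(f"{payload['error_code']}: {payload.get('error_msg', '')}")
    return payload.get("dataset_id"), payload.get("import_task_id")


dataset_id, import_task_id = parse_create_dataset_response(
    '{"dataset_id": "WxCREuCkBSAlQr9xrde"}'
)
print(dataset_id, import_task_id)
```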
Example Requests
Creating an Image Classification Dataset
{ "workspace_id" : "0", "dataset_name" : "dataset-457f", "dataset_type" : 0, "data_sources" : [ { "data_type" : 0, "data_path" : "/test-obs/classify/input/animals/" } ], "description" : "", "work_path" : "/test-obs/classify/output/", "work_path_type" : 0, "labels" : [ { "name" : "Rabbits", "type" : 0, "property" : { "@modelarts:color" : "#3399ff" } }, { "name" : "Bees", "type" : 0, "property" : { "@modelarts:color" : "#3399ff" } } ] }
Creating an Object Detection Dataset
{ "workspace_id" : "0", "dataset_name" : "dataset-95a6", "dataset_type" : 1, "data_sources" : [ { "data_type" : 0, "data_path" : "/test-obs/detect/input/animals/" } ], "description" : "", "work_path" : "/test-obs/detect/output/", "work_path_type" : 0, "labels" : [ { "name" : "Rabbits", "type" : 1, "property" : { "@modelarts:color" : "#3399ff" } }, { "name" : "Bees", "type" : 1, "property" : { "@modelarts:color" : "#3399ff" } } ] }
Creating a Table Dataset
{ "workspace_id" : "0", "dataset_name" : "dataset-de83", "dataset_type" : 400, "data_sources" : [ { "data_type" : 0, "data_path" : "/test-obs/table/input/", "with_column_header" : true } ], "description" : "", "work_path" : "/test-obs/table/output/", "work_path_type" : 0, "schema" : [ { "schema_id" : 1, "name" : "150", "type" : "STRING" }, { "schema_id" : 2, "name" : "4", "type" : "STRING" }, { "schema_id" : 3, "name" : "setosa", "type" : "STRING" }, { "schema_id" : 4, "name" : "versicolor", "type" : "STRING" }, { "schema_id" : 5, "name" : "virginica", "type" : "STRING" } ], "import_data" : true }
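The `schema` array in the last request above follows a regular pattern and could be generated from a list of column names. In this sketch, numbering `schema_id` from 1 and defaulting every column to `STRING` simply mirror that example; neither is stated as a requirement, and `build_schema` is a hypothetical helper.

```python
def build_schema(column_names, value_type="STRING"):
    """Build a `schema` array with one Field object per column name."""
    return [
        {"schema_id": index, "name": name, "type": value_type}
        for index, name in enumerate(column_names, start=1)
    ]


schema = build_schema(["150", "4", "setosa", "versicolor", "virginica"])
print(len(schema), schema[0])
```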
Example Responses
Status code: 201
Created
{
"dataset_id" : "WxCREuCkBSAlQr9xrde"
}
Status Codes
Status Code | Description |
|---|---|
201 | Created |
401 | Unauthorized |
403 | Forbidden |
404 | Not Found |
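Client code can branch on these status codes before it reads the response body. The sketch below only encodes the table above; the fallback text for unlisted codes and the function names are assumptions.

```python
# Status codes documented for this API.
STATUS_DESCRIPTIONS = {
    201: "Created",
    401: "Unauthorized",
    403: "Forbidden",
    404: "Not Found",
}


def is_success(status_code):
    """Return True only for the documented success code, 201."""
    return status_code == 201


def describe_status(status_code):
    """Return the documented description, or a generic fallback."""
    return STATUS_DESCRIPTIONS.get(status_code, "Unexpected status")


for code in (201, 403, 500):
    print(code, is_success(code), describe_status(code))
```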
Error Codes
See Error Codes.