Creating a Dataset
Function
This API is used to create a dataset.
Debugging
You can debug this API in API Explorer, which supports automatic authentication, or use the SDK sample code that API Explorer generates.
URI
POST /v2/{project_id}/datasets
| Parameter | Mandatory | Type | Description |
|---|---|---|---|
| project_id | Yes | String | Project ID. For details about how to obtain a project ID, see Obtaining a Project ID and Name. |
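As a minimal sketch of how the URI above is assembled, the following Python helper substitutes `project_id` into the path. The endpoint host and the project ID shown are hypothetical placeholders, not values from this reference; use the endpoint of your own region.

```python
from urllib.parse import quote


def build_create_dataset_url(endpoint: str, project_id: str) -> str:
    """Return the full URL for POST /v2/{project_id}/datasets."""
    if not project_id:
        raise ValueError("project_id is mandatory")
    # Percent-encode the path segment so unexpected characters cannot alter the path.
    return f"{endpoint.rstrip('/')}/v2/{quote(project_id, safe='')}/datasets"


if __name__ == "__main__":
    # Hypothetical endpoint and project ID, for illustration only.
    print(build_create_dataset_url("https://modelarts.example.com", "0123456789abcdef"))
```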
Request Parameters
| Parameter | Mandatory | Type | Description |
|---|---|---|---|
| data_format | No | String | Data format. Options: |
| data_sources | Yes | Array of DataSource objects | Input dataset path, which is used to synchronize source data (such as images, text files, and audio files) in the directory and its subdirectories to the dataset. For a table dataset, this parameter indicates the import directory. The work directory of a table dataset cannot be an OBS path in a KMS-encrypted bucket. Only one data source can be imported at a time. |
| dataset_name | Yes | String | Dataset name. The value contains 1 to 100 characters. Only letters, digits, underscores (_), and hyphens (-) are allowed, for example, dataset-9f3b. |
| dataset_type | No | Integer | Dataset type. Options: |
| description | No | String | Dataset description. The value is empty by default. The description contains 0 to 256 characters and does not support the following special characters: ^!<>=&"' |
| import_annotations | No | Boolean | Indicates whether to automatically import the labeling information in the input directory. Object detection, image classification, and text classification are supported. The options are as follows: |
| import_data | No | Boolean | Whether to import data. This parameter is used only for table datasets. Options: |
| label_format | No | LabelFormat object | Label format information. This parameter is used only for text datasets. |
| labels | No | Array of Label objects | Dataset label list. |
| managed | No | Boolean | Whether to host a dataset. Options: |
| schema | No | Array of Field objects | Schema list. |
| work_path | Yes | String | Output dataset path, which is used to store output files such as label files. |
| work_path_type | Yes | Integer | Type of the dataset output path. The default value is 0, indicating an OBS bucket. |
| workforce_information | No | WorkforceInformation object | Team labeling information. |
| workspace_id | No | String | Workspace ID. If no workspace is created, the default value is 0. If a workspace is created and used, use the actual value. |
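The constraints in the table above can be checked on the client before the request is sent. The sketch below validates `dataset_name` and `description` and builds a body containing only the mandatory parameters. The helper names are hypothetical, and treating "letters" as ASCII letters is an assumption, since the table does not say which alphabets are accepted.

```python
import re

# Assumption: "letters" means ASCII letters; the table does not specify.
_DATASET_NAME_RE = re.compile(r"[A-Za-z0-9_-]{1,100}")
# Special characters that the description must not contain.
_DESCRIPTION_FORBIDDEN = set("^!<>=&\"'")


def validate_dataset_name(name: str) -> None:
    """Raise ValueError if the name breaks the documented rules."""
    if not _DATASET_NAME_RE.fullmatch(name):
        raise ValueError(
            "dataset_name must be 1 to 100 letters, digits, underscores, or hyphens"
        )


def validate_description(description: str) -> None:
    """Raise ValueError if the description is too long or has forbidden characters."""
    if len(description) > 256:
        raise ValueError("description must contain 0 to 256 characters")
    bad = _DESCRIPTION_FORBIDDEN.intersection(description)
    if bad:
        raise ValueError(f"description contains unsupported characters: {sorted(bad)}")


def build_request_body(dataset_name, data_path, work_path,
                       description="", work_path_type=0):
    """Build a request body with the mandatory parameters only."""
    validate_dataset_name(dataset_name)
    validate_description(description)
    return {
        "dataset_name": dataset_name,
        # Only one data source can be imported at a time.
        "data_sources": [{"data_path": data_path}],
        "work_path": work_path,
        "work_path_type": work_path_type,
        "description": description,
    }
```

Validating locally only saves a round trip; the service remains the authority on what it accepts.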
DataSource parameters
| Parameter | Mandatory | Type | Description |
|---|---|---|---|
| data_path | No | String | Data source path. |
| data_type | No | Integer | Data type. Options: |
| schema_maps | No | Array of SchemaMap objects | Schema mapping information corresponding to the table data. |
| source_info | No | SourceInfo object | Information required for importing a table data source. |
| with_column_header | No | Boolean | Whether the first row of the file contains column names. This field applies to table datasets. Options: |
SchemaMap parameters
| Parameter | Mandatory | Type | Description |
|---|---|---|---|
| dest_name | No | String | Name of the destination column. |
| src_name | No | String | Name of the source column. |
SourceInfo parameters
| Parameter | Mandatory | Type | Description |
|---|---|---|---|
| cluster_id | No | String | MRS cluster ID. You can log in to the MRS console to view the information. |
| cluster_mode | No | String | Running mode of an MRS cluster. Options: |
| cluster_name | No | String | MRS cluster name. You can log in to the MRS console to view the information. |
| database_name | No | String | Name of the database to which the table dataset is imported. |
| input | No | String | HDFS path of the table dataset, for example, /datasets/demo. |
| ip | No | String | IP address of your GaussDB(DWS) cluster. |
| port | No | String | Port number of your GaussDB(DWS) cluster. |
| queue_name | No | String | DLI queue name of a table dataset. |
| subnet_id | No | String | Subnet ID of an MRS cluster. |
| table_name | No | String | Name of the table to which a table dataset is imported. |
| user_name | No | String | Username, which is mandatory for GaussDB(DWS) data. |
| user_password | No | String | User password, which is mandatory for GaussDB(DWS) data. |
| vpc_id | No | String | ID of the VPC where an MRS cluster resides. |
Label parameters
| Parameter | Mandatory | Type | Description |
|---|---|---|---|
| attributes | No | Array of LabelAttribute objects | Multi-dimensional attribute of a label. For example, if the label is music, attributes such as style and artist may be included. |
| name | No | String | Label name. |
| property | No | LabelProperty object | Basic attribute key-value pair of a label, such as color and shortcut keys. |
| type | No | Integer | Label type. Options: |
LabelAttribute parameters
| Parameter | Mandatory | Type | Description |
|---|---|---|---|
| default_value | No | String | Default value of a label attribute. |
| id | No | String | Label attribute ID. You can obtain it by querying the label list. |
| name | No | String | Label attribute name. The value contains a maximum of 64 characters and does not support the following special characters: <>=&"' |
| type | No | String | Label attribute type. Options: |
| values | No | Array of LabelAttributeValue objects | List of label attribute values. |
LabelAttributeValue parameters
| Parameter | Mandatory | Type | Description |
|---|---|---|---|
| id | No | String | Label attribute value ID. |
| value | No | String | Label attribute value. |
LabelProperty parameters
| Parameter | Mandatory | Type | Description |
|---|---|---|---|
| @modelarts:color | No | String | Default attribute: Label color, which is a hexadecimal code of the color. By default, this parameter is left blank. Example: #FFFFF0. |
| @modelarts:default_shape | No | String | Default attribute: Default shape of an object detection label (dedicated attribute). By default, this parameter is left blank. Options: |
| @modelarts:from_type | No | String | Default attribute: Type of the head entity in the triplet relationship label. This attribute must be specified when a relationship label is created. This parameter is used only for the text triplet dataset. |
| @modelarts:rename_to | No | String | Default attribute: The new name of the label. |
| @modelarts:shortcut | No | String | Default attribute: Label shortcut key. By default, this parameter is left blank. For example: D. |
| @modelarts:to_type | No | String | Default attribute: Type of the tail entity in the triplet relationship label. This attribute must be specified when a relationship label is created. This parameter is used only for the text triplet dataset. |
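To illustrate how a label and its default attributes fit together, here is a small sketch that builds one entry of the `labels` array. The helper name `build_label` is hypothetical, and the check for a six-digit `#RRGGBB` color is an assumption based on the `#FFFFF0` example; the table only says the value is a hexadecimal color code.

```python
import re

# Assumption: colors use the six-digit #RRGGBB form shown in the example.
_HEX_COLOR_RE = re.compile(r"#[0-9A-Fa-f]{6}")


def build_label(name, label_type, color=None, shortcut=None):
    """Build one Label object with optional default attributes."""
    prop = {}
    if color is not None:
        if not _HEX_COLOR_RE.fullmatch(color):
            raise ValueError("color must be a hexadecimal code such as #FFFFF0")
        prop["@modelarts:color"] = color
    if shortcut is not None:
        prop["@modelarts:shortcut"] = shortcut
    label = {"name": name, "type": label_type}
    # Omit "property" entirely when no attribute is set.
    if prop:
        label["property"] = prop
    return label


if __name__ == "__main__":
    print(build_label("Rabbits", 0, color="#3399ff", shortcut="D"))
```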
Field parameters
| Parameter | Mandatory | Type | Description |
|---|---|---|---|
| description | No | String | Schema description. |
| name | No | String | Schema name. |
| schema_id | No | Integer | Schema ID. |
| type | No | String | Schema value type. |
WorkforceInformation parameters
| Parameter | Mandatory | Type | Description |
|---|---|---|---|
| data_sync_type | No | Integer | Synchronization type. Options: |
| repetition | No | Integer | Number of persons who label each sample. The minimum value is 1. |
| synchronize_auto_labeling_data | No | Boolean | Whether to synchronously update auto labeling data. Options: |
| synchronize_data | No | Boolean | Whether to synchronize updated data, such as uploading files, synchronizing data sources, and assigning imported unlabeled files to team members. Options: |
| task_id | No | String | ID of a team labeling task. |
| task_name | Yes | String | Name of a team labeling task. The name contains 1 to 64 characters, including only letters, digits, underscores (_), and hyphens (-). |
| workforces_config | No | WorkforcesConfig object | Manpower assignment of a team labeling task. You can delegate the assignment to the administrator or assign the manpower yourself. |
WorkforcesConfig parameters
| Parameter | Mandatory | Type | Description |
|---|---|---|---|
| agency | No | String | Administrator |
| workforces | No | Array of WorkforceConfig objects | List of teams that execute labeling tasks. |
WorkforceConfig parameters
| Parameter | Mandatory | Type | Description |
|---|---|---|---|
| workers | No | Array of Worker objects | List of labeling team members. |
| workforce_id | No | String | ID of a labeling team. |
| workforce_name | No | String | Name of a labeling team. The value contains 0 to 1024 characters and does not support the following special characters: !<>=&"' |
Worker parameters
| Parameter | Mandatory | Type | Description |
|---|---|---|---|
| create_time | No | Long | Creation time. |
| description | No | String | Labeling team member description. The value contains 0 to 256 characters and does not support the following special characters: ^!<>=&"' |
| | No | String | Email address of a labeling team member. |
| role | No | Integer | Role. Options: |
| status | No | Integer | Current login status of a labeling team member. Options: |
| update_time | No | Long | Update time. |
| worker_id | No | String | ID of a labeling team member. |
| workforce_id | No | String | ID of a labeling team. |
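The team labeling objects nest three levels deep: `workforce_information` holds `workforces_config`, which holds a list of `workforces`. The sketch below shows that nesting and checks the documented limits on `task_name` and `repetition`. The helper name and the sample team ID are hypothetical, and "letters" is again assumed to mean ASCII letters.

```python
import re

# Assumption: "letters" means ASCII letters.
_TASK_NAME_RE = re.compile(r"[A-Za-z0-9_-]{1,64}")


def build_workforce_information(task_name, workforce_ids, repetition=1):
    """Build a WorkforceInformation object for the given labeling teams."""
    if not _TASK_NAME_RE.fullmatch(task_name):
        raise ValueError(
            "task_name must be 1 to 64 letters, digits, underscores, or hyphens"
        )
    if repetition < 1:
        raise ValueError("repetition must be at least 1")
    return {
        "task_name": task_name,
        "repetition": repetition,
        "workforces_config": {
            # One WorkforceConfig object per labeling team.
            "workforces": [{"workforce_id": wid} for wid in workforce_ids],
        },
    }


if __name__ == "__main__":
    # "team-001" is a made-up team ID for illustration.
    print(build_workforce_information("task-demo", ["team-001"], repetition=2))
```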
Response Parameters
Status code: 201
| Parameter | Type | Description |
|---|---|---|
| dataset_id | String | Dataset ID. |
| error_code | String | Error code. |
| error_msg | String | Error message. |
| import_task_id | String | ID of an import task. |
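A caller typically needs `dataset_id` and, when data is imported, `import_task_id`. The following sketch parses a response body into those two fields. It assumes that a non-empty `error_code` signals a failed call, which this reference does not state explicitly; the class and function names are hypothetical.

```python
import json
from typing import NamedTuple, Optional


class CreateDatasetResult(NamedTuple):
    dataset_id: str
    import_task_id: Optional[str]


class CreateDatasetError(Exception):
    """Raised when the response body reports an error."""


def parse_create_dataset_response(body: str) -> CreateDatasetResult:
    """Extract dataset_id and import_task_id from a JSON response body."""
    data = json.loads(body)
    # Assumption: a non-empty error_code means the call failed.
    if data.get("error_code"):
        raise CreateDatasetError(f"{data['error_code']}: {data.get('error_msg', '')}")
    if "dataset_id" not in data:
        raise CreateDatasetError("response does not contain dataset_id")
    return CreateDatasetResult(data["dataset_id"], data.get("import_task_id"))
```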
Example Requests
- Creating an Image Classification Dataset
  {
    "workspace_id" : "0",
    "dataset_name" : "dataset-457f",
    "dataset_type" : 0,
    "data_sources" : [ {
      "data_type" : 0,
      "data_path" : "/test-obs/classify/input/animals/"
    } ],
    "description" : "",
    "work_path" : "/test-obs/classify/output/",
    "work_path_type" : 0,
    "labels" : [ {
      "name" : "Rabbits",
      "type" : 0,
      "property" : {
        "@modelarts:color" : "#3399ff"
      }
    }, {
      "name" : "Bees",
      "type" : 0,
      "property" : {
        "@modelarts:color" : "#3399ff"
      }
    } ]
  }
- Creating an Object Detection Dataset
  {
    "workspace_id" : "0",
    "dataset_name" : "dataset-95a6",
    "dataset_type" : 1,
    "data_sources" : [ {
      "data_type" : 0,
      "data_path" : "/test-obs/detect/input/animals/"
    } ],
    "description" : "",
    "work_path" : "/test-obs/detect/output/",
    "work_path_type" : 0,
    "labels" : [ {
      "name" : "Rabbits",
      "type" : 1,
      "property" : {
        "@modelarts:color" : "#3399ff"
      }
    }, {
      "name" : "Bees",
      "type" : 1,
      "property" : {
        "@modelarts:color" : "#3399ff"
      }
    } ]
  }
- Creating a Table Dataset
  {
    "workspace_id" : "0",
    "dataset_name" : "dataset-de83",
    "dataset_type" : 400,
    "data_sources" : [ {
      "data_type" : 0,
      "data_path" : "/test-obs/table/input/",
      "with_column_header" : true
    } ],
    "description" : "",
    "work_path" : "/test-obs/table/output/",
    "work_path_type" : 0,
    "schema" : [ {
      "schema_id" : 1,
      "name" : "150",
      "type" : "STRING"
    }, {
      "schema_id" : 2,
      "name" : "4",
      "type" : "STRING"
    }, {
      "schema_id" : 3,
      "name" : "setosa",
      "type" : "STRING"
    }, {
      "schema_id" : 4,
      "name" : "versicolor",
      "type" : "STRING"
    }, {
      "schema_id" : 5,
      "name" : "virginica",
      "type" : "STRING"
    } ],
    "import_data" : true
  }
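An example body like the ones above could be sent with Python's standard library roughly as follows. This is a sketch under stated assumptions: the endpoint and token are placeholders, and passing the token in an `X-Auth-Token` header is an assumption about the authentication scheme, which this page does not describe. The function only builds the request object; sending it is left to the caller.

```python
import json
import urllib.request


def make_create_dataset_request(endpoint, project_id, token, body):
    """Build (but do not send) the POST request for creating a dataset."""
    url = f"{endpoint.rstrip('/')}/v2/{project_id}/datasets"
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            # Assumption: token-based authentication via this header.
            "X-Auth-Token": token,
        },
        method="POST",
    )


if __name__ == "__main__":
    body = {
        "workspace_id": "0",
        "dataset_name": "dataset-457f",
        "dataset_type": 0,
        "data_sources": [
            {"data_type": 0, "data_path": "/test-obs/classify/input/animals/"}
        ],
        "work_path": "/test-obs/classify/output/",
        "work_path_type": 0,
    }
    # Hypothetical endpoint, project ID, and token.
    req = make_create_dataset_request(
        "https://modelarts.example.com", "my-project-id", "my-token", body
    )
    print(req.get_method(), req.full_url)
    # To send: urllib.request.urlopen(req), then read the JSON response body.
```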
Example Responses
Status code: 201
Created
{
  "dataset_id" : "WxCREuCkBSAlQr9xrde"
}
Status Codes
| Status Code | Description |
|---|---|
| 201 | Created |
| 401 | Unauthorized |
| 403 | Forbidden |
| 404 | Not Found |
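Client code can branch on the status codes listed above. The sketch below maps each documented code to its description and treats only 201 as success; the function names are hypothetical, and codes outside the table are reported as undocumented rather than guessed at.

```python
# Status codes documented for this API.
STATUS_DESCRIPTIONS = {
    201: "Created",
    401: "Unauthorized",
    403: "Forbidden",
    404: "Not Found",
}


def describe_status(status_code: int) -> str:
    """Return the documented description, or a fallback for other codes."""
    return STATUS_DESCRIPTIONS.get(status_code, "Undocumented status code")


def is_created(status_code: int) -> bool:
    """Only 201 indicates that the dataset was created."""
    return status_code == 201
```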
Error Codes
See Error Codes.