Updated on 2025-08-20 GMT+08:00

Creating a Dataset

Function

This API is used to create a dataset.

Debugging

You can debug this API through automatic authentication in API Explorer or use the SDK sample code generated by API Explorer.

URI

POST /v2/{project_id}/datasets

**Table 1** Path Parameters
Parameter	Mandatory	Type	Description
project_id	Yes	String	Project ID. For details about how to obtain a project ID, see Obtaining a Project ID and Name.
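
The path parameter is substituted into the URI before the request is sent. The following is a minimal sketch, not an official client: the host `https://modelarts.example.com` is a placeholder (use the endpoint of your own region) and the helper name `build_create_dataset_url` is hypothetical.

```python
from urllib.parse import quote


def build_create_dataset_url(endpoint: str, project_id: str) -> str:
    """Return the full URL for POST /v2/{project_id}/datasets."""
    if not project_id:
        raise ValueError("project_id is mandatory")
    # Percent-encode the path parameter so stray characters cannot alter the path.
    return f"{endpoint.rstrip('/')}/v2/{quote(project_id, safe='')}/datasets"


# The endpoint and project ID below are placeholders, not real values.
url = build_create_dataset_url("https://modelarts.example.com", "example-project-id")
print(url)
```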

Request Parameters

**Table 2** Request body parameters
Parameter	Mandatory	Type	Description
data_format	No	String	Data format. Options: Default (default format); CarbonData (CarbonData format, supported only by table datasets).
data_sources	Yes	Array of DataSource objects	Input dataset path, which is used to synchronize source data (such as images, text files, and audio files) in the directory and its subdirectories to the dataset. For a table dataset, this parameter indicates the import directory. The work directory of a table dataset cannot be an OBS path in a KMS-encrypted bucket. Only one data source can be imported at a time.
dataset_name	Yes	String	Dataset name. The value contains 1 to 100 characters. Only letters, digits, underscores (_), and hyphens (-) are allowed, for example, dataset-9f3b.
dataset_type	No	Integer	Dataset type. Options: 0: image classification; 1: object detection; 3: image segmentation; 100: text classification; 101: named entity recognition; 102: text triplet; 200: sound classification; 201: speech content; 202: speech paragraph labeling; 400: table dataset; 600: video labeling; 900: custom format.
description	No	String	Dataset description. The value is empty by default. The description contains 0 to 256 characters and does not support the following special characters: ^!<>=&"'
import_annotations	No	Boolean	Whether to automatically import the annotation information in the input directory. This is supported for object detection, image classification, and text classification. Options: true: Import the annotation information in the input directory (default value). false: Do not import the annotation information in the input directory.
import_data	No	Boolean	Whether to import data. This parameter is used only for table datasets. Options: true: Import data when creating the dataset. false: Do not import data when creating the dataset (default value).
label_format	No	LabelFormat object	Label format information. This parameter is used only for text datasets.
labels	No	Array of Label objects	Dataset label list.
managed	No	Boolean	Whether to host a dataset. Options: true: Host a dataset. false: Do not host a dataset. (Default value)
schema	No	Array of Field objects	Schema list.
work_path	Yes	String	Output dataset path, which is used to store output files such as label files. The format is /Bucket name/File path, for example, /obs-bucket/flower/rose/. (The directory is used as the path.) A bucket cannot be directly used as a path. The output dataset path must differ from the input dataset path and must not be one of its subdirectories. The value contains 3 to 700 characters.
work_path_type	Yes	Integer	Type of the dataset output path. The default value is 0, indicating an OBS bucket.
workforce_information	No	WorkforceInformation object	Team labeling information.
workspace_id	No	String	Workspace ID. If no workspace is created, the default value is 0. If a workspace is created and used, use the actual value.
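
Several constraints in Table 2 can be checked on the client before the request is sent. The following is a sketch under stated assumptions: the function name is hypothetical, "only one data source can be imported at a time" is interpreted as exactly one entry in data_sources, and the service may apply further checks that are not reproduced here.

```python
import re

# Constraints taken from Table 2; all names in this block are illustrative.
_NAME_RE = re.compile(r"[A-Za-z0-9_-]{1,100}")
_FORBIDDEN_IN_DESCRIPTION = set("^!<>=&\"'")
_MANDATORY = ("data_sources", "dataset_name", "work_path", "work_path_type")


def validate_create_dataset_body(body: dict) -> list:
    """Return a list of problems; an empty list means none were found."""
    errors = [f"missing mandatory parameter: {key}" for key in _MANDATORY if key not in body]

    name = body.get("dataset_name")
    if name is not None and not (isinstance(name, str) and _NAME_RE.fullmatch(name)):
        errors.append("dataset_name must be 1 to 100 letters, digits, underscores, or hyphens")

    description = body.get("description", "")
    if len(description) > 256 or _FORBIDDEN_IN_DESCRIPTION & set(description):
        errors.append("description must be at most 256 characters without ^!<>=&\"'")

    work_path = body.get("work_path")
    if work_path is not None:
        segments = [s for s in work_path.split("/") if s]
        if not 3 <= len(work_path) <= 700:
            errors.append("work_path must contain 3 to 700 characters")
        elif not work_path.startswith("/") or len(segments) < 2:
            errors.append("work_path must look like /bucket-name/file-path/, not a bare bucket")

    sources = body.get("data_sources")
    if sources is not None and len(sources) != 1:
        # Interpretation of "only one data source can be imported at a time".
        errors.append("data_sources must contain exactly one data source")

    # The output path must not equal the input path or lie beneath it.
    for source in sources or []:
        data_path = source.get("data_path")
        if work_path and data_path:
            if (work_path.rstrip("/") + "/").startswith(data_path.rstrip("/") + "/"):
                errors.append("work_path must not be the input path or one of its subdirectories")
    return errors
```

An empty result only means that the documented constraints covered by this sketch are met; the service remains the authority on whether a request is valid.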

**Table 3** DataSource
Parameter	Mandatory	Type	Description
data_path	No	String	Data source path.
data_type	No	Integer	Data type. Options: 0: OBS bucket (default value); 1: GaussDB(DWS); 2: DLI; 3: RDS; 4: MRS; 5: AI Gallery; 6: Inference service.
schema_maps	No	Array of SchemaMap objects	Schema mapping information corresponding to the table data.
source_info	No	SourceInfo object	Information required for importing a table data source.
with_column_header	No	Boolean	Whether the first row in the file is a column name. This field is valid for the table dataset. Options: true: The first row in the file is the column name. false: The first row in the file is not the column name.

**Table 4** SchemaMap
Parameter	Mandatory	Type	Description
dest_name	No	String	Name of the destination column.
src_name	No	String	Name of the source column.

**Table 5** SourceInfo
Parameter	Mandatory	Type	Description
cluster_id	No	String	MRS cluster ID. You can log in to the MRS console to view the information.
cluster_mode	No	String	Running mode of an MRS cluster. Options: 0: normal cluster; 1: security cluster.
cluster_name	No	String	MRS cluster name. You can log in to the MRS console to view the information.
database_name	No	String	Name of the database to which the table dataset is imported.
input	No	String	HDFS path of the table dataset, for example, /datasets/demo.
ip	No	String	IP address of your GaussDB(DWS) cluster.
port	No	String	Port number of your GaussDB(DWS) cluster.
queue_name	No	String	DLI queue name of a table dataset.
subnet_id	No	String	Subnet ID of an MRS cluster.
table_name	No	String	Name of the table to which a table dataset is imported.
user_name	No	String	Username, which is mandatory for GaussDB(DWS) data.
user_password	No	String	User password, which is mandatory for GaussDB(DWS) data.
vpc_id	No	String	ID of the VPC where an MRS cluster resides.
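
Table 5 marks every field as optional, yet the descriptions state that user_name and user_password are mandatory for GaussDB(DWS) data. The sketch below checks that rule and the documented cluster_mode values; the function name is hypothetical, and which other fields each source type needs is not stated in this section, so nothing else is checked.

```python
def check_source_info(data_source: dict) -> list:
    """Return a list of problems found in one DataSource object (Tables 3 and 5)."""
    problems = []
    info = data_source.get("source_info") or {}

    # data_type 1 is GaussDB(DWS); 0 (OBS bucket) is the documented default.
    if data_source.get("data_type", 0) == 1:
        for key in ("user_name", "user_password"):
            if not info.get(key):
                problems.append(f"source_info.{key} is mandatory for GaussDB(DWS) data")

    # cluster_mode is a String with the options "0" and "1".
    mode = info.get("cluster_mode")
    if mode is not None and mode not in ("0", "1"):
        problems.append("source_info.cluster_mode must be '0' (normal) or '1' (security)")
    return problems
```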

**Table 6** LabelFormat
Parameter	Mandatory	Type	Description
label_type	No	String	Label type of text classification. Options: 0: The label is separated from the text, and they are distinguished by the fixed suffix _result. For example, the text file is abc.txt, and the label file is abc_result.txt. 1: Default value. Labels and texts are stored in the same file and separated by separators. You can use text_sample_separator to specify the separator between the text and label and text_label_separator to specify the separator between labels.
text_label_separator	No	String	Separator between labels. By default, a comma (,) is used as the separator. The separator needs to be escaped. The separator can contain only one character, such as a letter, a digit, or any of the following special characters: !@#$%^&*_=\|?/':.;,
text_sample_separator	No	String	Separator between the text and label. By default, the Tab key is used as the separator. The separator needs to be escaped. The separator can contain only one character, such as a letter, a digit, or any of the following special characters: !@#$%^&*_=\|?/':.;,
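
The two label_type layouts in Table 6 can be illustrated as follows. This is a sketch only: both function names are hypothetical, and for label_type 1 it assumes that the text comes first on each line and that the first occurrence of text_sample_separator splits text from labels, which this section does not state explicitly.

```python
from pathlib import PurePosixPath


def label_file_for(text_file: str) -> str:
    """label_type 0: the label file carries the fixed suffix _result."""
    path = PurePosixPath(text_file)
    return str(path.with_name(f"{path.stem}_result{path.suffix}"))


def split_sample(line: str, text_sample_separator: str = "\t", text_label_separator: str = ","):
    """label_type 1: text and labels share one line (assumed order: text, then labels)."""
    text, found, label_part = line.rstrip("\n").partition(text_sample_separator)
    labels = [label for label in label_part.split(text_label_separator) if label] if found else []
    return text, labels


print(label_file_for("abc.txt"))
print(split_sample("A quiet film about bees\tnature,documentary"))
```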

**Table 7** Label
Parameter	Mandatory	Type	Description
attributes	No	Array of LabelAttribute objects	Multi-dimensional attribute of a label. For example, if the label is music, attributes such as style and artist may be included.
name	No	String	Label name.
property	No	LabelProperty object	Basic attribute key-value pair of a label, such as color and shortcut keys.
type	No	Integer	Label type. Options: 0: image classification; 1: object detection; 3: image segmentation; 100: text classification; 101: named entity recognition; 102: text triplet relationship; 103: text triplet entity; 200: sound classification; 201: speech content; 202: speech paragraph labeling; 600: video labeling.

**Table 8** LabelAttribute
Parameter	Mandatory	Type	Description
default_value	No	String	Default value of a label attribute.
id	No	String	Label attribute ID. You can obtain it by querying the label list.
name	No	String	Label attribute name. The value contains a maximum of 64 characters and cannot contain the following special characters: <>=&"'
type	No	String	Label attribute type. Options: text: text; select: single-choice drop-down list.
values	No	Array of LabelAttributeValue objects	List of label attribute values.

**Table 9** LabelAttributeValue
Parameter	Mandatory	Type	Description
id	No	String	Label attribute value ID.
value	No	String	Label attribute value.

**Table 10** LabelProperty
Parameter	Mandatory	Type	Description
@modelarts:color	No	String	Default attribute: Label color, which is a hexadecimal code of the color. By default, this parameter is left blank. Example: #FFFFF0.
@modelarts:default_shape	No	String	Default attribute: Default shape of an object detection label (dedicated attribute). By default, this parameter is left blank. Options: bndbox: rectangle; polygon: polygon; circle: circle; line: straight line; dashed: dotted line; point: point; polyline: polyline.
@modelarts:from_type	No	String	Default attribute: Type of the head entity in the triplet relationship label. This attribute must be specified when a relationship label is created. This parameter is used only for the text triplet dataset.
@modelarts:rename_to	No	String	Default attribute: The new name of the label.
@modelarts:shortcut	No	String	Default attribute: Label shortcut key. By default, this parameter is left blank. For example: D.
@modelarts:to_type	No	String	Default attribute: Type of the tail entity in the triplet relationship label. This attribute must be specified when a relationship label is created. This parameter is used only for the text triplet dataset.
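
A Label object combines the fields of Table 7 with the property keys of Table 10. The sketch below builds one such object; the helper name `make_label` is hypothetical, and the six-digit form of the color check is an assumption based on the example #FFFFF0. It also enforces the documented rule that a triplet relationship label (type 102) must specify both entity types.

```python
import re
from typing import Optional

_HEX_COLOR_RE = re.compile(r"#[0-9A-Fa-f]{6}")  # assumed form, based on the example #FFFFF0


def make_label(name: str, label_type: int, color: Optional[str] = None,
               shortcut: Optional[str] = None, from_type: Optional[str] = None,
               to_type: Optional[str] = None) -> dict:
    """Build one entry for the labels array of the request body."""
    prop = {}
    if color is not None:
        if not _HEX_COLOR_RE.fullmatch(color):
            raise ValueError("color must be a hexadecimal code such as #FFFFF0")
        prop["@modelarts:color"] = color
    if shortcut is not None:
        prop["@modelarts:shortcut"] = shortcut
    if label_type == 102:  # text triplet relationship
        if not (from_type and to_type):
            raise ValueError("relationship labels must specify from_type and to_type")
        prop["@modelarts:from_type"] = from_type
        prop["@modelarts:to_type"] = to_type

    label = {"name": name, "type": label_type}
    if prop:
        label["property"] = prop
    return label


print(make_label("Rabbits", 0, color="#3399ff"))
```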

**Table 11** Field
Parameter	Mandatory	Type	Description
description	No	String	Schema description.
name	No	String	Schema name.
schema_id	No	Integer	Schema ID.
type	No	String	Schema value type.

**Table 12** WorkforceInformation
Parameter	Mandatory	Type	Description
data_sync_type	No	Integer	Synchronization type. Options: 0: no synchronization; 1: synchronize data; 2: synchronize labels; 3: synchronize data and labels.
repetition	No	Integer	Number of persons who label each sample. The minimum value is 1.
synchronize_auto_labeling_data	No	Boolean	Whether to synchronously update auto labeling data. Options: true: Update auto labeling data synchronously. false: Do not update auto labeling data synchronously.
synchronize_data	No	Boolean	Whether to synchronize updated data, such as uploading files, synchronizing data sources, and assigning imported unlabeled files to team members. Options: true: Synchronize updated data to team members. false: Do not synchronize updated data to team members.
task_id	No	String	ID of a team labeling task.
task_name	Yes	String	Name of a team labeling task. The name contains 1 to 64 characters, including only letters, digits, underscores (_), and hyphens (-).
workforces_config	No	WorkforcesConfig object	Staffing of a team labeling task. You can delegate the assignment to the administrator or assign team members yourself.

**Table 13** WorkforcesConfig
Parameter	Mandatory	Type	Description
agency	No	String	Administrator
workforces	No	Array of WorkforceConfig objects	List of teams that execute labeling tasks.

**Table 14** WorkforceConfig
Parameter	Mandatory	Type	Description
workers	No	Array of Worker objects	List of labeling team members.
workforce_id	No	String	ID of a labeling team.
workforce_name	No	String	Name of a labeling team. The value contains 0 to 1024 characters and does not support the following special characters: !<>=&"'

**Table 15** Worker
Parameter	Mandatory	Type	Description
create_time	No	Long	Creation time.
description	No	String	Labeling team member description. The value contains 0 to 256 characters and does not support the following special characters: ^!<>=&"'
email	No	String	Email address of a labeling team member.
role	No	Integer	Role. Options: 0: labeling personnel; 1: reviewer; 2: team administrator; 3: dataset owner.
status	No	Integer	Current login status of a labeling team member. Options: 0: The invitation email has not been sent. 1: The invitation email has been sent but the user has not logged in. 2: The user has logged in. 3: The labeling team member has been deleted.
update_time	No	Long	Update time.
worker_id	No	String	ID of a labeling team member.
workforce_id	No	String	ID of a labeling team.
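
Tables 12 to 15 nest into a single workforce_information object. The sketch below assembles one with a single team, checking the documented limits on task_name and repetition. The builder name is hypothetical, and the choice of Worker fields to send (email and role only) is an assumption, because this section does not say which of them the service expects at creation time.

```python
import re

_TASK_NAME_RE = re.compile(r"[A-Za-z0-9_-]{1,64}")
ROLE_LABELER = 0  # "labeling personnel" in Table 15


def build_workforce_information(task_name: str, workforce_id: str,
                                worker_emails: list, repetition: int = 1) -> dict:
    """Assemble workforce_information for one labeling team."""
    if not _TASK_NAME_RE.fullmatch(task_name):
        raise ValueError("task_name must be 1 to 64 letters, digits, underscores, or hyphens")
    if repetition < 1:
        raise ValueError("repetition must be at least 1")

    workers = [{"email": email, "role": ROLE_LABELER} for email in worker_emails]
    return {
        "task_name": task_name,
        "repetition": repetition,
        "workforces_config": {
            "workforces": [{"workforce_id": workforce_id, "workers": workers}],
        },
    }
```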

Response Parameters

Status code: 201

**Table 16** Response body parameters
Parameter	Type	Description
dataset_id	String	Dataset ID.
error_code	String	Error code.
error_msg	String	Error message.
import_task_id	String	ID of an import task.

Example Requests

Creating an Image Classification Dataset

{
  "workspace_id" : "0",
  "dataset_name" : "dataset-457f",
  "dataset_type" : 0,
  "data_sources" : [ {
    "data_type" : 0,
    "data_path" : "/test-obs/classify/input/animals/"
  } ],
  "description" : "",
  "work_path" : "/test-obs/classify/output/",
  "work_path_type" : 0,
  "labels" : [ {
    "name" : "Rabbits",
    "type" : 0,
    "property" : {
      "@modelarts:color" : "#3399ff"
    }
  }, {
    "name" : "Bees",
    "type" : 0,
    "property" : {
      "@modelarts:color" : "#3399ff"
    }
  } ]
}
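
A body such as the one above can be wrapped in an HTTP request with the Python standard library. This is a sketch under assumptions that this section does not confirm: the URL host is a placeholder, the token value is a placeholder, and the `X-Auth-Token` header name reflects token-based authentication as commonly used with this platform. The request is only constructed here, not sent.

```python
import json
import urllib.request


def build_create_dataset_request(url: str, token: str, body: dict) -> urllib.request.Request:
    """Wrap a request body in a POST request without sending it."""
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        method="POST",
        headers={
            "Content-Type": "application/json",
            "X-Auth-Token": token,  # assumed authentication header
        },
    )


body = {
    "workspace_id": "0",
    "dataset_name": "dataset-457f",
    "dataset_type": 0,
    "data_sources": [{"data_type": 0, "data_path": "/test-obs/classify/input/animals/"}],
    "work_path": "/test-obs/classify/output/",
    "work_path_type": 0,
}
request = build_create_dataset_request(
    "https://modelarts.example.com/v2/example-project-id/datasets",  # placeholder URL
    "example-token",  # placeholder token
    body,
)
print(request.get_method(), request.full_url)
```

Passing the resulting object to `urllib.request.urlopen` would send it; that step is left out so that the sketch runs without network access or credentials.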

Creating an Object Detection Dataset

{
  "workspace_id" : "0",
  "dataset_name" : "dataset-95a6",
  "dataset_type" : 1,
  "data_sources" : [ {
    "data_type" : 0,
    "data_path" : "/test-obs/detect/input/animals/"
  } ],
  "description" : "",
  "work_path" : "/test-obs/detect/output/",
  "work_path_type" : 0,
  "labels" : [ {
    "name" : "Rabbits",
    "type" : 1,
    "property" : {
      "@modelarts:color" : "#3399ff"
    }
  }, {
    "name" : "Bees",
    "type" : 1,
    "property" : {
      "@modelarts:color" : "#3399ff"
    }
  } ]
}

Creating a Table Dataset

{
  "workspace_id" : "0",
  "dataset_name" : "dataset-de83",
  "dataset_type" : 400,
  "data_sources" : [ {
    "data_type" : 0,
    "data_path" : "/test-obs/table/input/",
    "with_column_header" : true
  } ],
  "description" : "",
  "work_path" : "/test-obs/table/output/",
  "work_path_type" : 0,
  "schema" : [ {
    "schema_id" : 1,
    "name" : "150",
    "type" : "STRING"
  }, {
    "schema_id" : 2,
    "name" : "4",
    "type" : "STRING"
  }, {
    "schema_id" : 3,
    "name" : "setosa",
    "type" : "STRING"
  }, {
    "schema_id" : 4,
    "name" : "versicolor",
    "type" : "STRING"
  }, {
    "schema_id" : 5,
    "name" : "virginica",
    "type" : "STRING"
  } ],
  "import_data" : true
}
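
The schema array in the table example can be derived from the first row of the source file when with_column_header is true. The sketch below does this for a comma-separated header; the function name is hypothetical, and the 1-based schema_id values and the STRING type for every column simply mirror the example above rather than a documented rule.

```python
import csv
import io


def schema_from_header(header_line: str, value_type: str = "STRING") -> list:
    """Turn one CSV header row into a list of Field objects (Table 11)."""
    names = next(csv.reader(io.StringIO(header_line)))
    return [
        {"schema_id": index, "name": name.strip(), "type": value_type}
        for index, name in enumerate(names, start=1)
    ]


for field in schema_from_header("150,4,setosa,versicolor,virginica"):
    print(field)
```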

Example Responses

Status code: 201

Created

{
  "dataset_id" : "WxCREuCkBSAlQr9xrde"
}

Status Codes

Status Code	Description
201	Created
401	Unauthorized
403	Forbidden
404	Not Found
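
A caller can use the status code together with the fields of Table 16 to decide how to proceed. The following sketch assumes that error responses carry error_code and error_msg in a JSON body; the exception class, the function name, and the error code used in the usage lines are hypothetical.

```python
import json

STATUS_DESCRIPTIONS = {201: "Created", 401: "Unauthorized", 403: "Forbidden", 404: "Not Found"}


class CreateDatasetError(Exception):
    """Raised for any status code other than 201."""

    def __init__(self, status_code, error_code, error_msg):
        self.status_code = status_code
        self.error_code = error_code
        self.error_msg = error_msg
        description = STATUS_DESCRIPTIONS.get(status_code, "Unexpected status")
        super().__init__(f"{status_code} {description}: {error_code} {error_msg}")


def handle_create_dataset_response(status_code: int, body_text: str) -> dict:
    """Return the IDs from a 201 response or raise CreateDatasetError."""
    try:
        data = json.loads(body_text) if body_text else {}
    except json.JSONDecodeError:
        data = {}
    if not isinstance(data, dict):
        data = {}
    if status_code == 201:
        return {"dataset_id": data.get("dataset_id"), "import_task_id": data.get("import_task_id")}
    raise CreateDatasetError(status_code, data.get("error_code"), data.get("error_msg"))


print(handle_create_dataset_response(201, '{"dataset_id": "WxCREuCkBSAlQr9xrde"}'))
```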

Error Codes

See Error Codes.
