Creating a Dataset
Function
This API is used to create a dataset.
Debugging
You can debug this API through automatic authentication in API Explorer or use the SDK sample code generated by API Explorer.
URI
POST /v2/{project_id}/datasets
| Parameter | Mandatory | Type | Description |
|---|---|---|---|
| project_id | Yes | String | Project ID. For details about how to obtain a project ID, see Obtaining a Project ID and Name. |
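
As a rough illustration of how the URI template is filled in, the sketch below substitutes a project ID into `/v2/{project_id}/datasets`. The endpoint host and the project ID value are placeholders chosen for this example, and the helper name `dataset_create_url` is hypothetical.

```python
from urllib.parse import quote


def dataset_create_url(endpoint: str, project_id: str) -> str:
    """Return the URL to POST to, following /v2/{project_id}/datasets."""
    if not project_id:
        raise ValueError("project_id is mandatory")
    # Percent-encode the ID so that it stays a single path segment.
    return f"{endpoint.rstrip('/')}/v2/{quote(project_id, safe='')}/datasets"


# Both arguments are placeholder values, not real identifiers.
print(dataset_create_url("https://modelarts.example.com", "my-project-id"))
```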
     
Request Parameters
| Parameter | Mandatory | Type | Description |
|---|---|---|---|
| data_format | No | String | Data format. Options: |
| data_sources | Yes | Array of DataSource objects | Input dataset path, which is used to synchronize source data (such as images, text files, and audio files) in the directory and its subdirectories to the dataset. For a table dataset, this parameter indicates the import directory. The work directory of a table dataset cannot be an OBS path in a KMS-encrypted bucket. Only one data source can be imported at a time. |
| dataset_name | Yes | String | Dataset name. The value contains 1 to 100 characters. Only letters, digits, underscores (_), and hyphens (-) are allowed, for example, dataset-9f3b. |
| dataset_type | No | Integer | Dataset type. Options: |
| description | No | String | Dataset description. The value is empty by default. The description contains 0 to 256 characters and does not support the following special characters: ^!<>=&"' |
| import_annotations | No | Boolean | Indicates whether to automatically import the labeling information in the input directory. Object detection, image classification, and text classification are supported. The options are as follows: |
| import_data | No | Boolean | Whether to import data. This parameter is used only for table datasets. Options: |
| label_format | No | LabelFormat object | Label format information. This parameter is used only for text datasets. |
| labels | No | Array of Label objects | Dataset label list. |
| managed | No | Boolean | Whether to host a dataset. Options: |
| schema | No | Array of Field objects | Schema list. |
| work_path | Yes | String | Output dataset path, which is used to store output files such as label files. |
| work_path_type | Yes | Integer | Type of the dataset output path. The default value is 0, indicating an OBS bucket. |
| workforce_information | No | WorkforceInformation object | Team labeling information. |
| workspace_id | No | String | Workspace ID. If no workspace is created, the default value is 0. If a workspace is created and used, use the actual value. |
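
The constraints in the table above can be checked on the client before a request is sent. The following is a minimal sketch, not an official client: the helper name `build_dataset_body` is hypothetical, and it assumes that "letters" means ASCII letters. It fills in the four mandatory parameters and enforces the documented limits on `dataset_name` and `description`.

```python
import re

# Assumption: "letters" in the dataset_name rule means ASCII letters.
NAME_RE = re.compile(r"[A-Za-z0-9_-]{1,100}")
FORBIDDEN_IN_DESCRIPTION = set("^!<>=&\"'")


def build_dataset_body(dataset_name, data_path, work_path,
                       description="", dataset_type=None):
    """Assemble a request body containing the mandatory parameters."""
    if not NAME_RE.fullmatch(dataset_name):
        raise ValueError("dataset_name must be 1 to 100 letters, digits, _ or -")
    if len(description) > 256 or FORBIDDEN_IN_DESCRIPTION & set(description):
        raise ValueError("description is too long or has a forbidden character")
    body = {
        "dataset_name": dataset_name,
        # Only one data source can be imported at a time.
        "data_sources": [{"data_path": data_path}],
        "work_path": work_path,
        "work_path_type": 0,  # 0 indicates an OBS bucket
        "description": description,
    }
    if dataset_type is not None:
        body["dataset_type"] = dataset_type
    return body


body = build_dataset_body("dataset-9f3b",
                          "/test-obs/classify/input/animals/",
                          "/test-obs/classify/output/")
print(sorted(body))
```

Optional parameters are left out of the body unless they are explicitly supplied, so any server-side defaults still apply.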
     
Fields of a DataSource object:

| Parameter | Mandatory | Type | Description |
|---|---|---|---|
| data_path | No | String | Data source path. |
| data_type | No | Integer | Data type. Options: |
| schema_maps | No | Array of SchemaMap objects | Schema mapping information corresponding to the table data. |
| source_info | No | SourceInfo object | Information required for importing a table data source. |
| with_column_header | No | Boolean | Whether the first row in the file is a column name. This field is valid only for table datasets. Options: |
     
Fields of a SchemaMap object:

| Parameter | Mandatory | Type | Description |
|---|---|---|---|
| dest_name | No | String | Name of the destination column. |
| src_name | No | String | Name of the source column. |
     
Fields of a SourceInfo object:

| Parameter | Mandatory | Type | Description |
|---|---|---|---|
| cluster_id | No | String | MRS cluster ID. You can log in to the MRS console to view the information. |
| cluster_mode | No | String | Running mode of an MRS cluster. Options: |
| cluster_name | No | String | MRS cluster name. You can log in to the MRS console to view the information. |
| database_name | No | String | Name of the database to which the table dataset is imported. |
| input | No | String | HDFS path of the table dataset, for example, /datasets/demo. |
| ip | No | String | IP address of your GaussDB(DWS) cluster. |
| port | No | String | Port number of your GaussDB(DWS) cluster. |
| queue_name | No | String | DLI queue name of a table dataset. |
| subnet_id | No | String | Subnet ID of an MRS cluster. |
| table_name | No | String | Name of the table to which a table dataset is imported. |
| user_name | No | String | Username, which is mandatory for GaussDB(DWS) data. |
| user_password | No | String | User password, which is mandatory for GaussDB(DWS) data. |
| vpc_id | No | String | ID of the VPC where an MRS cluster resides. |
     
Fields of a Label object:

| Parameter | Mandatory | Type | Description |
|---|---|---|---|
| attributes | No | Array of LabelAttribute objects | Multi-dimensional attribute of a label. For example, if the label is music, attributes such as style and artist may be included. |
| name | No | String | Label name. |
| property | No | LabelProperty object | Basic attribute key-value pair of a label, such as color and shortcut keys. |
| type | No | Integer | Label type. Options: |
     
Fields of a LabelAttribute object:

| Parameter | Mandatory | Type | Description |
|---|---|---|---|
| default_value | No | String | Default value of a label attribute. |
| id | No | String | Label attribute ID. You can obtain it by querying the label list. |
| name | No | String | Label attribute name. The value contains a maximum of 64 characters and cannot contain the following special characters: <>=&"' |
| type | No | String | Label attribute type. Options: |
| values | No | Array of LabelAttributeValue objects | List of label attribute values. |
     
Fields of a LabelAttributeValue object:

| Parameter | Mandatory | Type | Description |
|---|---|---|---|
| id | No | String | Label attribute value ID. |
| value | No | String | Label attribute value. |
     
Fields of a LabelProperty object:

| Parameter | Mandatory | Type | Description |
|---|---|---|---|
| @modelarts:color | No | String | Default attribute: Label color, which is a hexadecimal code of the color. By default, this parameter is left blank. Example: #FFFFF0. |
| @modelarts:default_shape | No | String | Default attribute: Default shape of an object detection label (dedicated attribute). By default, this parameter is left blank. Options: |
| @modelarts:from_type | No | String | Default attribute: Type of the head entity in the triplet relationship label. This attribute must be specified when a relationship label is created. This parameter is used only for the text triplet dataset. |
| @modelarts:rename_to | No | String | Default attribute: The new name of the label. |
| @modelarts:shortcut | No | String | Default attribute: Label shortcut key. By default, this parameter is left blank. Example: D. |
| @modelarts:to_type | No | String | Default attribute: Type of the tail entity in the triplet relationship label. This attribute must be specified when a relationship label is created. This parameter is used only for the text triplet dataset. |
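
To show how a Label object and its `property` map fit together, here is a small sketch that builds one entry for the `labels` array. The helper `make_label` is hypothetical, and the six-digit `#RRGGBB` pattern is an assumption drawn from the examples on this page (`#FFFFF0`, `#3399ff`); the service may accept other forms.

```python
import re

# Assumption: colors are six-digit hexadecimal codes such as #FFFFF0.
HEX_COLOR_RE = re.compile(r"#[0-9A-Fa-f]{6}")


def make_label(name, label_type, color=None, shortcut=None):
    """Build one entry of the labels array."""
    prop = {}
    if color is not None:
        if not HEX_COLOR_RE.fullmatch(color):
            raise ValueError("color must look like #RRGGBB")
        prop["@modelarts:color"] = color
    if shortcut is not None:
        prop["@modelarts:shortcut"] = shortcut
    label = {"name": name, "type": label_type}
    if prop:
        # Attributes that were not supplied are omitted, which leaves them blank.
        label["property"] = prop
    return label


print(make_label("Rabbits", 0, color="#3399ff"))
```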
     
Fields of a Field object:

| Parameter | Mandatory | Type | Description |
|---|---|---|---|
| description | No | String | Schema description. |
| name | No | String | Schema name. |
| schema_id | No | Integer | Schema ID. |
| type | No | String | Schema value type. |
     
Fields of a WorkforceInformation object:

| Parameter | Mandatory | Type | Description |
|---|---|---|---|
| data_sync_type | No | Integer | Synchronization type. Options: |
| repetition | No | Integer | Number of persons who label each sample. The minimum value is 1. |
| synchronize_auto_labeling_data | No | Boolean | Whether to synchronously update auto labeling data. Options: |
| synchronize_data | No | Boolean | Whether to synchronize updated data, such as uploading files, synchronizing data sources, and assigning imported unlabeled files to team members. Options: |
| task_id | No | String | ID of a team labeling task. |
| task_name | Yes | String | Name of a team labeling task. The name contains 1 to 64 characters, including only letters, digits, underscores (_), and hyphens (-). |
| workforces_config | No | WorkforcesConfig object | Manpower assignment of a team labeling task. You can delegate the administrator to assign the manpower or do it by yourself. |
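
The nesting of the team labeling objects (WorkforceInformation, then WorkforcesConfig, then WorkforceConfig) can be sketched as follows. The helper `make_workforce_information` is hypothetical, and the check assumes that "letters" means ASCII letters. It validates `task_name` and `repetition` against the limits stated above.

```python
import re

# Assumption: "letters" in the task_name rule means ASCII letters.
TASK_NAME_RE = re.compile(r"[A-Za-z0-9_-]{1,64}")


def make_workforce_information(task_name, repetition=1, workforce_ids=None):
    """Build the workforce_information object of the request body."""
    if not TASK_NAME_RE.fullmatch(task_name):
        raise ValueError("task_name must be 1 to 64 letters, digits, _ or -")
    if repetition < 1:
        raise ValueError("repetition must be at least 1")
    info = {"task_name": task_name, "repetition": repetition}
    if workforce_ids:
        # workforces_config.workforces is an array of WorkforceConfig objects.
        info["workforces_config"] = {
            "workforces": [{"workforce_id": wid} for wid in workforce_ids]
        }
    return info


print(make_workforce_information("task-01", workforce_ids=["team-a"]))
```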
     
Fields of a WorkforcesConfig object:

| Parameter | Mandatory | Type | Description |
|---|---|---|---|
| agency | No | String | Administrator. |
| workforces | No | Array of WorkforceConfig objects | List of teams that execute labeling tasks. |
     
Fields of a WorkforceConfig object:

| Parameter | Mandatory | Type | Description |
|---|---|---|---|
| workers | No | Array of Worker objects | List of labeling team members. |
| workforce_id | No | String | ID of a labeling team. |
| workforce_name | No | String | Name of a labeling team. The value contains 0 to 1024 characters and does not support the following special characters: !<>=&"' |
     
Fields of a Worker object:

| Parameter | Mandatory | Type | Description |
|---|---|---|---|
| create_time | No | Long | Creation time. |
| description | No | String | Labeling team member description. The value contains 0 to 256 characters and does not support the following special characters: ^!<>=&"' |
|  | No | String | Email address of a labeling team member. |
| role | No | Integer | Role. Options: |
| status | No | Integer | Current login status of a labeling team member. Options: |
| update_time | No | Long | Update time. |
| worker_id | No | String | ID of a labeling team member. |
| workforce_id | No | String | ID of a labeling team. |
     
Response Parameters
Status code: 201
| Parameter | Type | Description |
|---|---|---|
| dataset_id | String | Dataset ID. |
| error_code | String | Error code. |
| error_msg | String | Error message. |
| import_task_id | String | ID of an import task. |
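
One way a caller might consume these fields is sketched below. The function and exception names are hypothetical, and the sketch assumes that a non-201 response carries `error_code` and `error_msg` in a JSON body, which this page does not state explicitly.

```python
import json


class DatasetCreateError(Exception):
    """Raised when the service does not answer with status code 201."""


def parse_create_response(status_code: int, body_text: str) -> dict:
    """Return the IDs from a 201 response or raise with the error fields."""
    payload = json.loads(body_text) if body_text else {}
    if status_code != 201:
        # Assumption: failures report error_code and error_msg in the body.
        raise DatasetCreateError(
            f"{status_code}: {payload.get('error_code')} {payload.get('error_msg')}"
        )
    return {
        "dataset_id": payload.get("dataset_id"),
        "import_task_id": payload.get("import_task_id"),
    }


print(parse_create_response(201, '{"dataset_id": "WxCREuCkBSAlQr9xrde"}'))
```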
     
Example Requests
- Creating an Image Classification Dataset

    {
      "workspace_id" : "0",
      "dataset_name" : "dataset-457f",
      "dataset_type" : 0,
      "data_sources" : [ {
        "data_type" : 0,
        "data_path" : "/test-obs/classify/input/animals/"
      } ],
      "description" : "",
      "work_path" : "/test-obs/classify/output/",
      "work_path_type" : 0,
      "labels" : [ {
        "name" : "Rabbits",
        "type" : 0,
        "property" : {
          "@modelarts:color" : "#3399ff"
        }
      }, {
        "name" : "Bees",
        "type" : 0,
        "property" : {
          "@modelarts:color" : "#3399ff"
        }
      } ]
    }

- Creating an Object Detection Dataset

    {
      "workspace_id" : "0",
      "dataset_name" : "dataset-95a6",
      "dataset_type" : 1,
      "data_sources" : [ {
        "data_type" : 0,
        "data_path" : "/test-obs/detect/input/animals/"
      } ],
      "description" : "",
      "work_path" : "/test-obs/detect/output/",
      "work_path_type" : 0,
      "labels" : [ {
        "name" : "Rabbits",
        "type" : 1,
        "property" : {
          "@modelarts:color" : "#3399ff"
        }
      }, {
        "name" : "Bees",
        "type" : 1,
        "property" : {
          "@modelarts:color" : "#3399ff"
        }
      } ]
    }

- Creating a Table Dataset

    {
      "workspace_id" : "0",
      "dataset_name" : "dataset-de83",
      "dataset_type" : 400,
      "data_sources" : [ {
        "data_type" : 0,
        "data_path" : "/test-obs/table/input/",
        "with_column_header" : true
      } ],
      "description" : "",
      "work_path" : "/test-obs/table/output/",
      "work_path_type" : 0,
      "schema" : [ {
        "schema_id" : 1,
        "name" : "150",
        "type" : "STRING"
      }, {
        "schema_id" : 2,
        "name" : "4",
        "type" : "STRING"
      }, {
        "schema_id" : 3,
        "name" : "setosa",
        "type" : "STRING"
      }, {
        "schema_id" : 4,
        "name" : "versicolor",
        "type" : "STRING"
      }, {
        "schema_id" : 5,
        "name" : "virginica",
        "type" : "STRING"
      } ],
      "import_data" : true
    }
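
A request body like the ones above could be wrapped in an HTTP request roughly as follows. This is a sketch under stated assumptions rather than the SDK sample code: the endpoint host, the project ID, and the token value are placeholders, the helper name `build_create_request` is hypothetical, and passing the token in an `X-Auth-Token` header is an assumption that this page does not cover. The request is only constructed here, not sent.

```python
import json
import urllib.request


def build_create_request(endpoint, project_id, token, body):
    """Prepare (but do not send) the POST request that creates a dataset."""
    url = f"{endpoint.rstrip('/')}/v2/{project_id}/datasets"
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            # Assumption: token-based authentication through this header.
            "X-Auth-Token": token,
        },
        method="POST",
    )


body = {
    "workspace_id": "0",
    "dataset_name": "dataset-457f",
    "dataset_type": 0,
    "data_sources": [{"data_type": 0,
                      "data_path": "/test-obs/classify/input/animals/"}],
    "work_path": "/test-obs/classify/output/",
    "work_path_type": 0,
}
req = build_create_request("https://modelarts.example.com",
                           "my-project-id", "my-token", body)
print(req.get_method(), req.full_url)
```

Passing `req` to `urllib.request.urlopen` would send it. Note that `urlopen` raises `urllib.error.HTTPError` for 4xx responses such as 401, 403, and 404.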
Example Responses
Status code: 201
Created
{
  "dataset_id" : "WxCREuCkBSAlQr9xrde"
}
Status Codes
| Status Code | Description |
|---|---|
| 201 | Created |
| 401 | Unauthorized |
| 403 | Forbidden |
| 404 | Not Found |
     
Error Codes
See Error Codes.