Updated on 2024-08-14 GMT+08:00

Parameter Overview

Use CreateDatasetStep to define a dataset creation phase in a workflow. The following tables describe the parameters of CreateDatasetStep and of the objects it accepts.

Table 1 CreateDatasetStep

| Parameter | Description | Mandatory | Data Type |
|---|---|---|---|
| name | Name of a dataset creation phase. The name contains a maximum of 64 characters, including only letters, digits, underscores (_), and hyphens (-). It must start with a letter and must be unique in a workflow. | Yes | str |
| inputs | Inputs of the dataset creation phase. | Yes | CreateDatasetInput or a list of CreateDatasetInput |
| outputs | Outputs of the dataset creation phase. | Yes | CreateDatasetOutput or a list of CreateDatasetOutput |
| properties | Configurations for dataset creation. | Yes | DatasetProperties |
| title | Title for frontend display. | No | str |
| description | Description of the dataset creation phase. | No | str |
| policy | Phase execution policy. | No | StepPolicy |
| depend_steps | Dependency phases. | No | Step or step list |
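
The naming rule for name in Table 1 can be checked before a workflow is submitted. The following is a minimal sketch, not part of the SDK: the helper names is_valid_step_name and duplicate_names are hypothetical, and "letters" is assumed to mean ASCII letters.

```python
import re
from collections import Counter
from typing import List

# One leading letter plus up to 63 letters, digits, "_" or "-" (64 characters at most).
# Assumption: "letters" in Table 1 means ASCII letters.
_STEP_NAME_RE = re.compile(r"[A-Za-z][A-Za-z0-9_-]{0,63}")


def is_valid_step_name(name: str) -> bool:
    """Return True if name satisfies the character and length rule in Table 1."""
    return _STEP_NAME_RE.fullmatch(name) is not None


def duplicate_names(names: List[str]) -> List[str]:
    """Return the names that occur more than once, in first-seen order."""
    counts = Counter(names)
    return [name for name in counts if counts[name] > 1]


if __name__ == "__main__":
    step_names = ["create_dataset", "labeling", "create_dataset"]
    print([n for n in step_names if not is_valid_step_name(n)])
    print(duplicate_names(step_names))
```

Because a step name must also be unique in a workflow, the duplicate check is run over all step names together rather than one name at a time.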

Table 2 CreateDatasetInput

| Parameter | Description | Mandatory | Data Type |
|---|---|---|---|
| name | Input name of the dataset creation phase. The name can contain a maximum of 64 characters, including only letters, digits, underscores (_), and hyphens (-), and must start with a letter. The input name of a step must be unique. | Yes | str |
| data | Input data object of the dataset creation phase. | Yes | OBS object. Currently, only OBSPath, OBSConsumption, OBSPlaceholder, and DataConsumptionSelector are supported. |

Table 3 CreateDatasetOutput

| Parameter | Description | Mandatory | Data Type |
|---|---|---|---|
| name | Output name of the dataset creation phase. The name can contain a maximum of 64 characters, including only letters, digits, underscores (_), and hyphens (-), and must start with a letter. The output name of a step must be unique. | Yes | str |
| config | Output configurations of the dataset creation phase. | Yes | Currently, only OBSOutputConfig is supported. |
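
Table 1 allows inputs and outputs to be either a single object or a list, and Tables 2 and 3 require names to be unique within a step. The sketch below illustrates both points with plain dictionaries standing in for CreateDatasetInput and CreateDatasetOutput. The helpers as_list and check_unique_names and the sample values are hypothetical, and how the SDK handles this internally is not stated in this section.

```python
from typing import Any, Dict, List


def as_list(value: Any) -> List[Any]:
    """Wrap a single object in a list; return a copy if it is already a list."""
    return list(value) if isinstance(value, list) else [value]


def check_unique_names(items: List[Dict[str, Any]], kind: str) -> None:
    """Raise ValueError if two items of the same step share a name."""
    seen = set()
    for item in items:
        name = item["name"]
        if name in seen:
            raise ValueError(f"duplicate {kind} name: {name!r}")
        seen.add(name)


if __name__ == "__main__":
    # Plain dictionaries stand in for the SDK objects; the values are placeholders.
    single_input = {"name": "raw_data", "data": "<OBS object>"}
    inputs = as_list(single_input)
    check_unique_names(inputs, "input")
    print(len(inputs))
```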

Table 4 DatasetProperties

| Parameter | Description | Mandatory | Data Type |
|---|---|---|---|
| dataset_name | Dataset name. The value contains 1 to 100 characters. Only letters, digits, underscores (_), and hyphens (-) are allowed. | Yes | str, Placeholder |
| dataset_format | Dataset format. The default value is 0, indicating the file type. | No | 0: file; 1: table |
| data_type | Data type. The default value is FREE_FORMAT. | No | DataTypeEnum |
| description | Description. | No | str |
| import_data | Whether to import data. The default value is False. Currently, only table data is supported. | No | bool |
| work_path_type | Type of the dataset output path. Currently, only OBS is supported. The default value is 0. | No | int |
| import_config | Configurations for label import. The default value is None. When creating a dataset based on labeled data, you can specify this parameter to import labeling information. | No | ImportConfig |
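
The defaults and constraints in Table 4 can be summarized in code. The following is a minimal sketch using a stand-in dataclass, not the SDK's DatasetProperties class. The class name DatasetPropertiesSketch is hypothetical, "letters" is assumed to mean ASCII letters, data_type is held as a member name string, the None default for description is an assumption, and Placeholder values for dataset_name are not handled.

```python
import re
from dataclasses import dataclass
from typing import Optional

# 1 to 100 letters, digits, "_" or "-" (assumption: ASCII letters only).
_DATASET_NAME_RE = re.compile(r"[A-Za-z0-9_-]{1,100}")


@dataclass
class DatasetPropertiesSketch:
    """Stand-in mirroring Table 4; this is not the SDK class."""

    dataset_name: str                   # mandatory
    dataset_format: int = 0             # 0: file, 1: table
    data_type: str = "FREE_FORMAT"      # name of a DataTypeEnum member
    description: Optional[str] = None   # assumed default; Table 4 does not state one
    import_data: bool = False
    work_path_type: int = 0             # only OBS is currently supported
    import_config: Optional[dict] = None

    def __post_init__(self) -> None:
        if not _DATASET_NAME_RE.fullmatch(self.dataset_name):
            raise ValueError(f"invalid dataset_name: {self.dataset_name!r}")
        if self.dataset_format not in (0, 1):
            raise ValueError("dataset_format must be 0 (file) or 1 (table)")


if __name__ == "__main__":
    props = DatasetPropertiesSketch(dataset_name="flowers-v1", data_type="IMAGE")
    print(props)
```

Only dataset_name is mandatory, so a caller that accepts every default needs to supply a single argument.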

Table 5 ImportConfig

| Parameter | Description | Mandatory | Data Type |
|---|---|---|---|
| import_annotations | Whether to automatically import the labeling information in the input directory. This applies to detection, image classification, and text classification. Options: true (default): the labeling information in the input directory is imported; false: the labeling information in the input directory is not imported. | No | str, Placeholder |
| import_type | Import mode. Options: dir (imported from an OBS path); manifest (imported from a manifest file). | No | ImportTypeEnum |
| annotation_format_config | Configurations of the imported labeling format. | No | AnnotationFormatConfig list |

Table 6 AnnotationFormatConfig

| Parameter | Description | Mandatory | Data Type |
|---|---|---|---|
| format_name | Name of a labeling format | No | AnnotationFormatEnum |
| scene | Labeling scenario, which is optional | No | LabelTaskTypeEnum |

| Enumerated Type | Enumerated Value |
|---|---|
| ImportTypeEnum | DIR, MANIFEST |
| DataTypeEnum | IMAGE, TEXT, AUDIO, TABULAR, VIDEO, FREE_FORMAT |
| AnnotationFormatEnum | MA_IMAGE_CLASSIFICATION_V1, MA_IMAGENET_V1, MA_PASCAL_VOC_V1, YOLO, MA_IMAGE_SEGMENTATION_V1, MA_TEXT_CLASSIFICATION_COMBINE_V1, MA_TEXT_CLASSIFICATION_V1, MA_AUDIO_CLASSIFICATION_DIR_V1 |
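
When enumerated values come from user input or a configuration file, it can help to resolve them by member name and fail early on a typo. The sketch below uses stand-in enumerations that copy only the member names listed above. They are not the SDK's classes: the numeric values assigned here are arbitrary, and the helper member_by_name is hypothetical.

```python
from enum import Enum
from typing import Type, TypeVar

E = TypeVar("E", bound=Enum)

# Stand-ins built from the member names listed above; the values (1, 2, ...)
# are assigned automatically and carry no meaning.
ImportTypeEnum = Enum("ImportTypeEnum", ["DIR", "MANIFEST"])
DataTypeEnum = Enum(
    "DataTypeEnum",
    ["IMAGE", "TEXT", "AUDIO", "TABULAR", "VIDEO", "FREE_FORMAT"],
)
AnnotationFormatEnum = Enum(
    "AnnotationFormatEnum",
    [
        "MA_IMAGE_CLASSIFICATION_V1",
        "MA_IMAGENET_V1",
        "MA_PASCAL_VOC_V1",
        "YOLO",
        "MA_IMAGE_SEGMENTATION_V1",
        "MA_TEXT_CLASSIFICATION_COMBINE_V1",
        "MA_TEXT_CLASSIFICATION_V1",
        "MA_AUDIO_CLASSIFICATION_DIR_V1",
    ],
)


def member_by_name(enum_cls: Type[E], text: str) -> E:
    """Look up an enum member by name, ignoring case and surrounding spaces."""
    key = text.strip().upper()
    try:
        return enum_cls[key]
    except KeyError:
        allowed = ", ".join(member.name for member in enum_cls)
        raise ValueError(
            f"{text!r} is not a {enum_cls.__name__} member; expected one of: {allowed}"
        ) from None


if __name__ == "__main__":
    print(member_by_name(DataTypeEnum, "image"))
    print(member_by_name(ImportTypeEnum, "manifest"))
```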