Updated on 2024-08-14 GMT+08:00

Parameter Overview

You can use ReleaseDatasetStep to create a dataset release phase. The following tables describe the parameters of ReleaseDatasetStep and its related objects.

Table 1 ReleaseDatasetStep

| Parameter | Description | Mandatory | Data Type |
|---|---|---|---|
| name | Name of a dataset release phase. The name contains a maximum of 64 characters, including only letters, digits, underscores (_), and hyphens (-). It must start with a letter and must be unique in a workflow. | Yes | str |
| inputs | Inputs of the dataset release phase | Yes | ReleaseDatasetInput or ReleaseDatasetInput list |
| outputs | Outputs of the dataset release phase | Yes | ReleaseDatasetOutput or ReleaseDatasetOutput list |
| title | Title for frontend display | No | str |
| description | Description of the dataset release phase | No | str |
| policy | Phase execution policy | No | StepPolicy |
| depend_steps | Dependency phases | No | Step or step list |

Table 2 ReleaseDatasetInput

| Parameter | Description | Mandatory | Data Type |
|---|---|---|---|
| name | Input name of the dataset release phase. The name can contain a maximum of 64 characters, including only letters, digits, underscores (_), and hyphens (-), and must start with a letter. The input name of a step must be unique. | Yes | str |
| data | Input data object of the dataset release phase | Yes | Dataset or labeling job object. Currently, only Dataset, DatasetConsumption, DatasetPlaceholder, LabelTask, LabelTaskPlaceholder, LabelTaskConsumption, and DataConsumptionSelector are supported. |

Table 3 ReleaseDatasetOutput

| Parameter | Description | Mandatory | Data Type |
|---|---|---|---|
| name | Output name of the dataset release phase. The name can contain a maximum of 64 characters, including only letters, digits, underscores (_), and hyphens (-), and must start with a letter. The output name of a step must be unique. | Yes | str |
| dataset_version_config | Configurations for dataset version release | Yes | DatasetVersionConfig |
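The naming rule stated for phase, input, and output names can be checked locally before a workflow is submitted. The following is a minimal sketch in plain Python, not part of the SDK: the helper names is_valid_name and find_duplicates are hypothetical, and treating "letters" as ASCII letters is an assumption.

```python
import re

# Documented rule: at most 64 characters, only letters, digits,
# underscores (_) and hyphens (-), starting with a letter.
# Restricting "letters" to ASCII is an assumption.
_NAME_PATTERN = re.compile(r"[A-Za-z][A-Za-z0-9_-]{0,63}")


def is_valid_name(name: str) -> bool:
    """Return True if name satisfies the documented naming rule."""
    return _NAME_PATTERN.fullmatch(name) is not None


def find_duplicates(names: list[str]) -> set[str]:
    """Return the names that occur more than once in names."""
    seen: set[str] = set()
    duplicates: set[str] = set()
    for name in names:
        if name in seen:
            duplicates.add(name)
        seen.add(name)
    return duplicates
```

For example, is_valid_name("release_dataset") returns True, while is_valid_name("1step") returns False because the name does not start with a letter.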

Table 4 DatasetVersionConfig

| Parameter | Description | Mandatory | Data Type |
|---|---|---|---|
| version_name | Dataset version name. By default, versions are named in ascending order, for example, V001 and V002. | No | str or Placeholder |
| version_format | Version format, which defaults to Default. You can also set it to CarbonData. | No | str |
| train_evaluate_sample_ratio | Ratio between the training set and validation set, which defaults to 1.00. The value ranges from 0 to 1.00. For example, 0.8 indicates that 80% of the samples are used for training and 20% for validation. | No | str or Placeholder |
| clear_hard_property | Whether to clear hard examples. The default value is True. | No | bool or Placeholder |
| remove_sample_usage | Whether to clear existing usage information of a dataset. The default value is True. | No | bool or Placeholder |
| label_task_type | Type of a labeling job. If the input is a dataset, this field is mandatory and is used to specify the labeling scenario of the dataset version. If the input is a labeling job, this field does not need to be configured. | No | LabelTaskTypeEnum. The following types are supported: IMAGE_CLASSIFICATION, OBJECT_DETECTION, IMAGE_SEGMENTATION, TEXT_CLASSIFICATION, NAMED_ENTITY_RECOGNITION, TEXT_TRIPLE, AUDIO_CLASSIFICATION, SPEECH_CONTENT, SPEECH_SEGMENTATION, TABLE, VIDEO_ANNOTATION |
| description | Description of a version | No | str |
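To make the meaning of train_evaluate_sample_ratio concrete, the sketch below computes how many samples would land in each set for a given ratio string. It is an illustration only: split_counts is a hypothetical helper, and rounding the training count down is an assumption, since the table does not say how fractional samples are assigned.

```python
from decimal import Decimal


def split_counts(total_samples: int, ratio: str = "1.00") -> tuple[int, int]:
    """Return (training, validation) sample counts for a ratio string."""
    # The parameter is documented as a str, so parse it exactly.
    value = Decimal(ratio)
    if not Decimal("0") <= value <= Decimal("1"):
        raise ValueError(f"ratio must be between 0 and 1.00, got {ratio!r}")
    # Assumption: fractional samples are rounded down for the training set.
    training = int(total_samples * value)
    return training, total_samples - training
```

With 1000 samples and a ratio of "0.8", split_counts(1000, "0.8") returns (800, 200). With the default ratio of "1.00", all samples go to the training set.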

Use the default values unless you have special requirements.
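The parameters in the tables above might be combined as in the following sketch. Only the class and parameter names listed in the tables come from this section. The import path modelarts.workflow, the wf.steps and wf.data namespaces, and the wf.Placeholder and wf.PlaceholderType helpers are assumptions that should be checked against the installed SDK version, and all name values are placeholders chosen for illustration.

```python
def build_release_step():
    """Build a dataset release phase with one input and one output."""
    # Imported inside the function so this sketch loads without the SDK.
    # Assumption: the SDK is importable as modelarts.workflow.
    from modelarts import workflow as wf

    # Dataset selected when the workflow runs (assumed to live in wf.data).
    dataset = wf.data.DatasetPlaceholder(name="input_dataset")

    # Runtime-configurable split ratio; the table lists str or Placeholder.
    # wf.Placeholder and wf.PlaceholderType are assumed helper names.
    train_ratio = wf.Placeholder(
        name="train_ratio",
        placeholder_type=wf.PlaceholderType.STR,
        default="0.8",
    )

    return wf.steps.ReleaseDatasetStep(
        name="release_dataset",
        title="Dataset version release",
        inputs=wf.steps.ReleaseDatasetInput(name="input_name", data=dataset),
        outputs=wf.steps.ReleaseDatasetOutput(
            name="output_name",
            dataset_version_config=wf.data.DatasetVersionConfig(
                # Mandatory here because the input is a dataset.
                label_task_type=wf.data.LabelTaskTypeEnum.IMAGE_CLASSIFICATION,
                train_evaluate_sample_ratio=train_ratio,
            ),
        ),
    )
```

All other DatasetVersionConfig fields are left unset here, so they keep their defaults.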