Custom Data Processing Operators
In addition to the preset processing operators, ModelArts Studio allows you to create custom processing operators for specific data processing requirements and service scenarios. You can define the processing logic to suit your needs, further improving model training results and adaptability.
Constraints
This function is available only to yearly/monthly subscribers.


Creating a Custom Processing Operator
To create a custom processing operator, perform the following steps:
- Log in to ModelArts Studio Large Model Development Platform. In the My Spaces area, click the required workspace.
Figure 3 My Spaces
- In the navigation pane, choose Data Engineering > Data Processing > Processing Tasks. Click Manage Processing Operator in the upper right corner.
- On the Manage Processing Operators page, click the Custom tab, and click Create Custom Operator in the upper right corner.
- On the Create Custom Operator page, click Download samples to view the specifications of the operator configuration file and operator package. Use OBS to upload the operator configuration file and operator package, set the workspace visibility, and click OK in the lower right corner.
Figure 4 Creating a custom operator
- Custom operators that are created can be used in Processing Text Datasets, Processing Image Datasets, Processing Video Datasets, Processing Weather Datasets, and Processing Other Datasets.
Viewing operator details:
Click the operator name to view its details.
Operator Configuration File Specifications
Table 1 Top-level parameters
Parameter Name | Type | Mandatory | Description | Constraints |
---|---|---|---|---|
id | string | Yes | Operator ID, in English. | Must start with a letter and can contain only letters, digits, and underscores (_). A maximum of 128 characters; excess characters are truncated. The ID cannot be changed when the operator is updated. The ID of an operator visible to all workspaces must be unique across all workspaces; the ID of an operator visible only to the current workspace must be unique within that workspace. |
name | string | Yes | Operator display name. | A maximum of 128 characters; excess characters are truncated. |
description | string | No | Operator description. | A maximum of 2,000 characters; excess characters are truncated. |
author | string | No | Developer name. | A maximum of 128 characters; excess characters are truncated. |
tags | tags object | Yes | Operator tags, used for classification and filtering. | For details, see Table 2. |
labels | Array of label objects | No | Labels output by a labeling operator. | For details, see Table 7. |
runtime | runtime object | Yes | Operator runtime configuration. | For details, see Table 3. |
arguments | Array of argument objects | No | List of operator input parameters. | For details, see Table 5. |
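The `id` rules are easy to break when the file is written by hand. The following Python sketch checks an ID locally before upload. `normalize_operator_id` and `truncate` are hypothetical helpers, not part of ModelArts Studio, and the sketch assumes the character rules are applied after truncation and that you supply the IDs already used in the relevant visibility scope.

```python
import re

# Hypothetical helper, not a platform API: mirrors the documented `id` rules
# so a configuration file can be checked before it is uploaded.
MAX_LEN = 128
ID_PATTERN = re.compile(r"[A-Za-z][A-Za-z0-9_]*")


def truncate(value: str, limit: int = MAX_LEN) -> str:
    """Keep at most `limit` characters, like the platform's automatic truncation."""
    return value[:limit]


def normalize_operator_id(raw: str, existing_ids=frozenset()) -> str:
    """Return the id as it would be stored, or raise ValueError if it breaks a rule.

    Assumptions: the pattern is checked after truncation, and `existing_ids`
    holds the IDs already used in the same visibility scope.
    """
    op_id = truncate(raw)
    if not ID_PATTERN.fullmatch(op_id):
        raise ValueError(
            f"invalid id {raw!r}: must start with a letter and contain only "
            "letters, digits, and underscores"
        )
    if op_id in existing_ids:
        raise ValueError(f"id {op_id!r} is already used in this scope")
    return op_id


print(normalize_operator_id("video_clip"))    # video_clip
print(len(normalize_operator_id("a" * 200)))  # 128
```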
Table 2 tags
Parameter Name | Type | Mandatory | Description | Constraints |
---|---|---|---|---|
language | Array of strings | Yes | Languages the operator can process, for example, zh and en. | Each string can contain a maximum of 32 characters; excess characters are truncated. Use international standard language codes. |
format | Array of strings | Yes | Dataset file name extensions supported by the operator, for example, JSON, CSV, and MP4. | Each string can contain a maximum of 32 characters; excess characters are truncated. |
category | string | Yes | Operator type. | Select exactly one of: Data extraction, Data sampling, Data conversion, Data filtering, Data deduplication, Data labeling, Other. |
modal | Array of strings | Yes | Data modalities supported by the operator. | Select one or more of: TEXT, IMAGE, VIDEO, AUDIO, OTHER (including weather and prediction). |
custom | Array of strings | No | Custom operator tags. | Each string can contain a maximum of 32 characters; excess characters are truncated. |
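As a sketch, the `tags` constraints can be checked locally before upload. `normalize_tags` is a hypothetical helper, and exact, case-sensitive matching of the category and modal options is an assumption rather than documented platform behavior.

```python
# Hypothetical validator for the `tags` object; not a platform API.
CATEGORIES = {
    "Data extraction", "Data sampling", "Data conversion", "Data filtering",
    "Data deduplication", "Data labeling", "Other",
}
MODALS = {"TEXT", "IMAGE", "VIDEO", "AUDIO", "OTHER"}
MAX_TAG_LEN = 32


def normalize_tags(tags: dict) -> dict:
    """Return a copy of `tags` with string-list entries cut to 32 characters.

    Raises ValueError when a mandatory field is missing or an option is not allowed.
    Assumption: category and modal values are matched exactly (case-sensitive).
    """
    for field in ("language", "format", "category", "modal"):
        if not tags.get(field):
            raise ValueError(f"tags.{field} is mandatory")
    if tags["category"] not in CATEGORIES:
        raise ValueError(f"category must be exactly one of {sorted(CATEGORIES)}")
    unknown = set(tags["modal"]) - MODALS
    if unknown:
        raise ValueError(f"unsupported modal values: {sorted(unknown)}")
    result = dict(tags)  # shallow copy; lists are replaced, not mutated
    for field in ("language", "format", "custom"):
        if field in result:
            result[field] = [s[:MAX_TAG_LEN] for s in result[field]]
    return result
```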
Table 3 runtime
Parameter Name | Type | Mandatory | Description | Constraints |
---|---|---|---|---|
cpu-arch | Array of strings | Yes | CPU architectures supported by the operator. | Select one or more of: Arm, x86. |
xpu-devices | Array of strings | No | Device models supported by the operator. | Mandatory when the operator runs on NPUs, that is, when any resource in runtime.resources requests more than 0 NPUs. Valid value: SNT9B. |
environment | string | Yes | Operator package type. Only pure Python operator packages are supported. | Valid value: PYTHON. |
entrypoint | string | Yes | Operator startup command. | A maximum of 128 characters; excess characters are truncated. If environment is set to python, the value must be process.py and cannot be changed. |
auto-data-loading | boolean | Yes | Whether data is loaded automatically. | true: the framework handles data input and output. false: your operator code handles data input and output. |
resources | Array of resource objects | No | Resources required to run a single operator instance. | Mandatory when environment is set to python. |
Table 4 resource
Parameter Name | Type | Mandatory | Description | Constraints |
---|---|---|---|---|
cpu | integer | Yes | Default number of vCPUs for a single instance. | Set it based on the number of general computing units actually available to you. |
memory | integer | Yes | Default memory of a single instance, in MB. | Set it based on the number of general computing units actually available to you. |
npu | integer | No | Default number of NPUs (cards) for a single instance. | Set it based on the specifications and quantity of your subscribed intelligent computing units. |
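The runtime and resource tables contain cross-field rules: a resource that requests NPUs requires `xpu-devices`, and `environment: python` requires `entrypoint: process.py` and `resources`. The sketch below collects these into one check. `check_runtime` is a hypothetical helper; it compares `environment` case-insensitively because the tables write PYTHON while the YAML example writes python.

```python
# Hypothetical consistency check for `runtime` and its resources; not a platform API.
def check_runtime(runtime: dict) -> list:
    """Return human-readable problems found in a `runtime` mapping (empty if none)."""
    problems = []
    resources = runtime.get("resources") or []
    if not runtime.get("cpu-arch"):
        problems.append("cpu-arch is mandatory")
    if "auto-data-loading" not in runtime:
        problems.append("auto-data-loading is mandatory")
    # The tables write PYTHON while the YAML example writes python.
    if str(runtime.get("environment", "")).lower() != "python":
        problems.append("environment must be python")
    else:
        if runtime.get("entrypoint") != "process.py":
            problems.append("entrypoint must be process.py")
        if not resources:
            problems.append("resources is mandatory when environment is python")
    for i, res in enumerate(resources):
        for field in ("cpu", "memory"):
            if not isinstance(res.get(field), int):
                problems.append(f"resources[{i}].{field} must be an integer")
    if any(res.get("npu", 0) > 0 for res in resources) and not runtime.get("xpu-devices"):
        problems.append("xpu-devices is mandatory when a resource requests NPUs")
    return problems
```

Applied to the `runtime` section of the YAML example later on this page, this check would also report the missing `auto-data-loading` field, which the runtime table marks as mandatory.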
Table 5 argument
Parameter Name | Type | Mandatory | Description | Constraints |
---|---|---|---|---|
key | string | Yes | Parameter ID, which must be unique within the parameter list. | A maximum of 128 characters; excess characters are truncated. |
name | string | Yes | Parameter display name. | A maximum of 128 characters; excess characters are truncated. |
type | string | Yes | Parameter type. | Select exactly one of: STRING, FLOAT, INT, ENUM (radio button), LIST (check box), OBS (front-end component for selecting an OBS file), BOOLEAN. |
tips | string | No | Parameter description. | A maximum of 2,000 characters; excess characters are truncated. |
min | float | No | Minimum value of the parameter. Optional when type is INT or FLOAT. | Floating-point values keep at most four decimal places. |
max | float | No | Maximum value of the parameter. Optional when type is INT or FLOAT. | Floating-point values keep at most four decimal places. |
between | boolean | No | Whether the parameter is a numeric range. Optional when type is INT or FLOAT. Defaults to false. | - |
items | Array of item objects | No | List of enumerated values. | Mandatory when type is ENUM or LIST. At least one item is required. |
required | boolean | No | Whether the parameter is mandatory. | true or false. |
visible | boolean | No | Whether the parameter is shown on the front end. | true or false. |
default | string | No | Default parameter value. | Mandatory if visible is false and required is true. Separate multiple default values with commas (,), for example, SD,HD. For a range-type numeric parameter (between set to true), use the format min;max. |
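The `default`, `min`, and `max` rules mix several formats: a single value, comma-separated values, and `min;max`. This sketch shows one way to interpret them. `parse_default`, `_number`, and `keep_four_decimals` are hypothetical helpers; truncating (rather than rounding) extra decimals follows the comments in the YAML example, and falling back to `min;max` when no range default is set follows the same example.

```python
from decimal import Decimal, ROUND_DOWN

# Hypothetical helpers (not a platform API) showing one way to interpret the
# `default`, `min`, and `max` rules in the argument table.


def keep_four_decimals(x: float) -> float:
    """Truncate (not round) a float to four decimal places."""
    return float(Decimal(str(x)).quantize(Decimal("0.0001"), rounding=ROUND_DOWN))


def _number(text: str, arg_type: str):
    """Convert one numeric token to int (INT) or a four-decimal float (FLOAT)."""
    value = float(text)
    return int(value) if arg_type == "INT" else keep_four_decimals(value)


def parse_default(arg: dict):
    """Turn an argument's `default` into a typed Python value, or None if unset."""
    key, arg_type = arg.get("key", "?"), arg["type"]
    raw = arg.get("default")
    if raw is None:
        if arg.get("visible") is False and arg.get("required") is True:
            raise ValueError(f"{key}: a hidden, mandatory parameter must set default")
        if arg_type in ("INT", "FLOAT") and arg.get("between") and "min" in arg and "max" in arg:
            raw = f"{arg['min']};{arg['max']}"  # fall back to min;max
        else:
            return None
    raw = str(raw)
    if arg_type in ("INT", "FLOAT"):
        if arg.get("between"):
            low, high = raw.split(";")
            return (_number(low, arg_type), _number(high, arg_type))
        return _number(raw, arg_type)
    if arg_type == "LIST":
        return [v.strip() for v in raw.split(",") if v.strip()]
    if arg_type == "BOOLEAN":
        return raw.lower() == "true"
    return raw  # STRING, ENUM, and OBS defaults stay as text
```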
Table 6 item
Parameter Name | Type | Mandatory | Description | Constraints |
---|---|---|---|---|
name | string | Yes | Display name of the enumerated item. | A maximum of 128 characters; excess characters are truncated. |
value | string | Yes | Value of the enumerated item. | A maximum of 128 characters; excess characters are truncated. |
tips | string | No | Description of the enumerated item. | A maximum of 2,000 characters; excess characters are truncated. |
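For ENUM and LIST parameters, the item list and the comma-separated `default` have to agree. The sketch below checks the documented rule (at least one item, each with `name` and `value`) and, as a stated assumption not found in the table, that every default names an existing item value and that an ENUM has a single default. `check_items` is a hypothetical helper.

```python
# Hypothetical check for ENUM/LIST `items`; not a platform API.
MAX_ITEM_LEN = 128


def check_items(arg: dict) -> list:
    """Return problems with the items of an ENUM or LIST argument (empty if none)."""
    if arg.get("type") not in ("ENUM", "LIST"):
        return []
    items = arg.get("items") or []
    if not items:
        return [f"{arg.get('key')}: ENUM and LIST parameters need at least one item"]
    problems, values = [], []
    for i, item in enumerate(items):
        for field in ("name", "value"):
            if field not in item:
                problems.append(f"items[{i}].{field} is mandatory")
        values.append(str(item.get("value", ""))[:MAX_ITEM_LEN])
    # Assumption: each default should name an item value; the table does not say so.
    if "default" in arg:
        defaults = [d.strip() for d in str(arg["default"]).split(",") if d.strip()]
        if arg["type"] == "ENUM" and len(defaults) != 1:
            problems.append("an ENUM parameter takes exactly one default")
        unknown = [d for d in defaults if d not in values]
        if unknown:
            problems.append(f"defaults not among item values: {unknown}")
    return problems
```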
Table 7 label
Parameter Name | Type | Mandatory | Description | Constraints |
---|---|---|---|---|
key | string | Yes | Label key. | A maximum of 128 characters; excess characters are truncated. The key must be unique among operators. |
name | string | Yes | Label name. | A maximum of 128 characters; excess characters are truncated. |
type | string | Yes | Label type. | Select exactly one of: STRING, NUMERIC, ENUM, OBJECT. |
min | float | No | Minimum label value. | Mandatory when type is NUMERIC. |
max | float | No | Maximum label value. | Mandatory when type is NUMERIC. |
items | Array of labelItem objects | No | Enumerated label values. | Mandatory when type is ENUM. |
dimensions | Array of labelDimension objects | No | Dimensions (level-2 labels) of a level-1 label. | Mandatory when type is OBJECT. |
Table 8 labelItem
Parameter Name | Type | Mandatory | Description | Constraints |
---|---|---|---|---|
name | string | Yes | Enumerated item name. | A maximum of 128 characters; excess characters are truncated. |
value | string | Yes | Enumerated value. | A maximum of 128 characters; excess characters are truncated. |
Table 9 labelDimension
Parameter Name | Type | Mandatory | Description | Constraints |
---|---|---|---|---|
key | string | Yes | Level-2 label key. | A maximum of 128 characters; excess characters are truncated. |
name | string | Yes | Level-2 label name. | A maximum of 128 characters; excess characters are truncated. |
type | string | Yes | Level-2 label type. | Select exactly one of: STRING, NUMERIC, ENUM. |
min | float | No | Minimum value of the level-2 label. | Mandatory when type is NUMERIC. |
max | float | No | Maximum value of the level-2 label. | Mandatory when type is NUMERIC. |
items | Array of labelItem objects | No | Enumerated values of the level-2 label. | Mandatory when type is ENUM. |
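The label tables define a two-level structure: an OBJECT label holds dimensions, and a dimension may only be STRING, NUMERIC, or ENUM. A recursive check makes these type-dependent requirements explicit. `check_label` and `check_labels` are hypothetical helpers, and treating keys as unique within one operator's `labels` list is an assumption.

```python
# Hypothetical validator for `labels` and their dimensions; not a platform API.
LABEL_TYPES = {"STRING", "NUMERIC", "ENUM", "OBJECT"}
DIMENSION_TYPES = {"STRING", "NUMERIC", "ENUM"}  # level-2 labels cannot nest further


def check_label(label: dict, allowed=LABEL_TYPES, path: str = "label") -> list:
    """Return problems with one label or label dimension (empty if none)."""
    problems = [f"{path}.{f} is mandatory" for f in ("key", "name", "type") if f not in label]
    label_type = label.get("type")
    if label_type not in allowed:
        problems.append(f"{path}.type must be one of {sorted(allowed)}")
    elif label_type == "NUMERIC":
        problems += [f"{path}.{b} is mandatory for NUMERIC" for b in ("min", "max") if b not in label]
    elif label_type == "ENUM" and not label.get("items"):
        problems.append(f"{path}.items is mandatory for ENUM")
    elif label_type == "OBJECT":
        dimensions = label.get("dimensions") or []
        if not dimensions:
            problems.append(f"{path}.dimensions is mandatory for OBJECT")
        for i, dim in enumerate(dimensions):
            problems += check_label(dim, DIMENSION_TYPES, f"{path}.dimensions[{i}]")
    return problems


def check_labels(labels: list) -> list:
    """Check every label; assumption: keys must be unique within the list."""
    problems, seen = [], set()
    for i, label in enumerate(labels):
        if label.get("key") in seen:
            problems.append(f"labels[{i}].key {label.get('key')!r} is duplicated")
        seen.add(label.get("key"))
        problems += check_label(label, path=f"labels[{i}]")
    return problems
```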
An operator configuration file is a YAML file that describes the basic information, function parameters, runtime environment, and resource requirements of an operator. The following is an example configuration file:
```yaml
id: video_clip                 # (Mandatory) Operator ID. Must start with a letter and can contain only letters, digits, and underscores (_). A maximum of 128 characters; excess characters are truncated.
name: Video clip               # (Mandatory) Operator display name. A maximum of 128 characters; excess characters are truncated.
description: Splits a long video into multiple clips.   # (Optional) Operator description. A maximum of 2,000 characters.
author: "xxx Technology Co., Ltd."   # (Optional) Developer or team name, for example, Data Team. A maximum of 128 characters.
tags:                          # (Mandatory) Operator tags, used for classification and filtering.
  language:                    # (Mandatory) Languages the operator can process. Use international standard language codes, each with a maximum of 32 characters. Multiple values are allowed.
    - zh
    - en
    - ...
  format:                      # (Mandatory) Dataset file formats supported by the operator, each with a maximum of 32 characters. Multiple values are allowed.
    - JSONL
    - TXT
    - CSV
    - HTML
    - MOBI
    - EPUB
    - DOCX
    - PDF
    - MP4
    - AVI
    - ...
  category: Data conversion    # (Mandatory) Operator type, used to group operators on the GUI. Exactly one of: Data extraction, Data sampling, Data conversion, Data filtering, Data deduplication, Data labeling, Other.
  modal:                       # (Mandatory) Data modalities supported by the operator. One or more of the following:
    - TEXT
    - IMAGE
    - VIDEO
    - AUDIO
    - OTHER
  custom:                      # (Optional) Custom operator tags, each with a maximum of 32 characters. Multiple values are allowed.
    - Data augmentation
    - Pre-labeling
    - ...
runtime:
  cpu-arch:                    # (Mandatory) Supported CPU architectures
    - ARM
    - X86
  xpu-devices:                 # XPU models supported by the operator. Mandatory if any resource requests NPUs. Multiple values are allowed.
    - SNT9B
  resources:                   # Default resources of a single instance. Mandatory when environment is python.
    - cpu: 16                  # CPU-only specification
      memory: 256              # Unit: MB
    - cpu: 8                   # NPU specification
      memory: 1024             # Unit: MB
      npu: 1
  environment: python          # (Mandatory) Operator package type. Only python (pure Python operator package) is supported.
  entrypoint: process.py       # (Mandatory) Must be process.py when environment is python. Cannot be changed.
# All service parameters are operator input parameters. Supported types: STRING, FLOAT, INT, ENUM (radio button), LIST (check box), OBS, and BOOLEAN.
arguments:
  # Example of a STRING parameter
  - key: filter_keywords       # (Mandatory)
    name: Filter keywords
    type: STRING
    tips: Samples that match any of the keywords are filtered out. Separate multiple keywords with commas (,).   # Tip displayed on the GUI
    required: true
    visible: true
    default: gambling          # (Optional) Default value. Separate multiple default values with commas (,).
  # Example of an INT/FLOAT range parameter
  - key: length_of_characters  # (Mandatory)
    name: Text length range
    type: FLOAT                # (Mandatory)
    between: true              # Range-type parameter: the user enters a minimum and a maximum.
    min: 1.0                   # (Optional) Minimum value. FLOAT values keep at most four decimal places; extra digits are truncated.
    max: 500.0                 # (Optional) Maximum value.
    tips: The range includes the boundary values. The value is a float, in characters. Samples whose text length falls within the range are retained.   # Tip displayed on the GUI
    required: true
    visible: true
    default: 100.0;300.0       # (Optional) Default range in min;max format. If not set, the min and max values above are used.
  # Example of an INT/FLOAT numeric parameter
  - key: max_cropping_area_ratio
    name: Maximum cropping area ratio
    type: FLOAT
    between: false             # Single value, not a range. Defaults to false.
    min: 0.0                   # Minimum value
    max: 100.0                 # Maximum value
    tips: A float from 0.0 to 100.0, in percent (%). Samples whose cropped area ratio (cropped video area/original video area) is greater than this value are filtered out.
    visible: true
    required: true
    default: 100               # Default value
  # Example of an ENUM parameter (single choice)
  - key: font_conversion
    name: Text font conversion
    type: ENUM
    items:
      - name: Convert Simplified Chinese to Traditional Chinese
        value: traditional
      - name: Convert Traditional Chinese to Simplified Chinese
        value: simplified
    required: true
    visible: true
    default: simplified
  # Example of a LIST parameter (multiple choice)
  - key: resolution
    name: Resolution
    type: LIST
    items:                     # Mandatory when type is ENUM or LIST
      - name: Smooth           # (Mandatory)
        value: SM              # (Mandatory)
        tips: 480 > resolution ≥ 360
      - name: SD
        value: SD
        tips: 720 > resolution ≥ 480
      - name: HD
        value: HD
        tips: 1080 > resolution ≥ 720
    required: true
    visible: true
    default: SD,HD             # Separate multiple default values with commas (,).
  # Example of an OBS parameter
  - key: sensitive_word
    name: OBS path of the sensitive word dictionary file
    type: OBS
    tips: Sensitive word dictionary file
    required: true
    visible: true
    default: NLP/system_resource/sensitive_word.csv   # OBS path of the default dictionary
  # Example of a BOOLEAN parameter
  - key: parse_all
    name: Whether to parse all files
    type: BOOLEAN
    items:
      - name: "Yes"            # Quoted so YAML 1.1 parsers do not read it as a boolean
        value: true
      - name: "No"
        value: false
    visible: true
    required: true
    default: false
```
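To show what the `filter_keywords` and `length_of_characters` arguments in the example above mean for the data, here is a sketch of the filtering logic such a text operator might apply. How `process.py` actually receives its arguments and data is not specified on this page, so the sketch covers only the filtering itself; `filter_samples`, `parse_keywords`, and the `text` field name are hypothetical.

```python
from typing import Iterable, Iterator, List

# Hypothetical core logic for a text-filter operator using the example's
# filter_keywords and length_of_characters arguments. How process.py receives
# arguments and data is not documented here; the "text" field is an assumption.


def parse_keywords(filter_keywords: str) -> List[str]:
    """Split the comma-separated keyword string, dropping empty entries."""
    return [k.strip() for k in filter_keywords.split(",") if k.strip()]


def filter_samples(samples: Iterable[dict], filter_keywords: str,
                   length_range: str, text_field: str = "text") -> Iterator[dict]:
    """Yield samples that contain no keyword and whose length is within min;max."""
    keywords = parse_keywords(filter_keywords)
    low, high = (float(x) for x in length_range.split(";"))
    for sample in samples:
        text = sample.get(text_field, "")
        if any(k in text for k in keywords):
            continue  # matches a filter keyword, so it is dropped
        if low <= len(text) <= high:  # both boundaries are inclusive
            yield sample
```

Both rules come from the `tips` in the example: keyword matches are filtered out, and only samples whose length falls inside the inclusive range are retained.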
Operator Package Specifications
Python operator package
```text
+--- video_clip               # The directory name must be the same as the tar package name.
|    +--- program_package     # Python operator directory
|    |    +--- install.sh     # (Optional) Installation script
|    |    +--- process.py     # (Mandatory) Operator code
```
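Because the top-level directory name must match the tar package name, packaging is easy to script. The following sketch uses Python's standard `tarfile` module; `package_operator` is a hypothetical helper, and it assumes an uncompressed `.tar` whose top-level entry is the operator directory, which you should confirm against the downloaded samples.

```python
import tarfile
from pathlib import Path


def package_operator(operator_dir: str, output_dir: str = ".") -> Path:
    """Pack `operator_dir` into <dirname>.tar so the archive name matches the directory.

    Assumptions: an uncompressed tar is accepted, and the operator directory
    itself (for example, video_clip/) is the top-level entry in the archive.
    """
    src = Path(operator_dir).resolve()
    entry = src / "program_package" / "process.py"
    if not entry.is_file():
        raise FileNotFoundError(f"missing mandatory operator code: {entry}")
    archive = Path(output_dir) / f"{src.name}.tar"
    if src in archive.resolve().parents:
        raise ValueError("write the archive outside the operator directory")
    with tarfile.open(archive, "w") as tar:
        tar.add(src, arcname=src.name)  # keep video_clip/ as the top-level directory
    return archive
```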