Description
Data labeling is a key step in data engineering. It adds accurate labels to unlabeled datasets so that they can provide effective supervision signals for model training.
The quality of the labeled data directly affects model training results and accuracy, so an efficient and accurate labeling process is critical.
The data labeling function allows you to create annotation tasks, label datasets (annotation tasks), review labeled datasets (review tasks), and manage annotation tasks (task management). The available functions and the pages displayed differ slightly by role. For details, see Table 1.
Table 1 Data labeling task permissions supported by different roles
| Role | Labeling Task Creation | Data Labeling | Labeling Review | Labeling Task Management |
| --- | --- | --- | --- | --- |
| Super Admin | √ | √ | - | √ |
| Administrator | √ | √ | - | √ |
| Annotation administrator | √ | √ | - | √ |
| Annotation operator | - | √ | - | - |
| Annotation auditor | - | - | √ | - |
Currently, text, video, and image datasets can be labeled.
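The role permissions in Table 1 can be read as a simple lookup. The following is a minimal sketch only: the `PERMISSIONS` mapping, the action names (`create`, `label`, `review`, `manage`), and the `can` helper are illustrative assumptions, not identifiers from the platform.

```python
# Hypothetical encoding of Table 1. The action names are illustrative
# shorthand for the four table columns, not platform identifiers.
PERMISSIONS = {
    "Super Admin": {"create", "label", "manage"},
    "Administrator": {"create", "label", "manage"},
    "Annotation administrator": {"create", "label", "manage"},
    "Annotation operator": {"label"},
    "Annotation auditor": {"review"},
}


def can(role: str, action: str) -> bool:
    """Return True if Table 1 marks the action as supported for the role."""
    return action in PERMISSIONS.get(role, set())


print(can("Annotation operator", "label"))  # True
print(can("Super Admin", "review"))         # False
```

According to Table 1, only the annotation auditor can review labeled data, so the lookup returns False for every other role when asked about `review`.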
Creating a Text Labeling Task
- On the Create annotation task page, select the text dataset to be annotated, and then select annotation items as prompted. The available annotation items vary depending on the data file type.
The single-turn Q&A annotation items support AI-assisted annotation. If you enable this function, you must select a deployed NLP service as the AI-assisted annotation model.
- Optionally, enable the multiplayer job function so that multiple people can work on the task. You can also enable the review function if required. Configure labeling allocation and review by referring to Table 2.
Table 2 Labeling allocation and review configuration
| Type | Parameter | Description |
| --- | --- | --- |
| Labeling assignment | Annotator | Add annotators and specify the number of annotations. |
| Labeling review | Is reviewed | No: No review is performed after labeling. Yes: The reviewer checks the annotator's work. If any problem is found, the reviewer can specify the reason and reject the annotation data, and the annotator must label the data again. |
| Labeling review | Reviewer | Add reviewers and specify the number of reviewers. |
| Labeling review | Review Requirement | Full review: The reviewer must manually review all data records one by one. Partial review: If the data reviewed so far is of high labeling quality, the reviewer can submit the remaining data in one click. The remaining data is approved by default and the review task is completed. |
- After the configuration is complete, click Complete creation.
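The dependencies between the settings above can be sketched as a small configuration check. This is an illustration under stated assumptions, not the platform's API: the `LabelingTaskConfig` class, its field names, and the rule that an enabled review needs at least one reviewer are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class LabelingTaskConfig:
    """Hypothetical model of the text labeling task settings."""
    dataset: str
    annotation_item: str
    ai_assisted: bool = False
    nlp_service: Optional[str] = None      # deployed NLP service, if any
    annotators: List[str] = field(default_factory=list)
    is_reviewed: bool = False
    reviewers: List[str] = field(default_factory=list)
    review_requirement: str = "full"       # "full" or "partial"

    def problems(self) -> List[str]:
        """List the settings that contradict the steps described above."""
        issues = []
        if self.ai_assisted and not self.nlp_service:
            issues.append("AI-assisted annotation needs a deployed NLP service")
        # Assumption: enabling review without any reviewer is treated as an error.
        if self.is_reviewed and not self.reviewers:
            issues.append("review is enabled but no reviewer was added")
        if self.review_requirement not in ("full", "partial"):
            issues.append("review requirement must be 'full' or 'partial'")
        return issues


config = LabelingTaskConfig(
    dataset="qa-dataset",
    annotation_item="single-turn Q&A",
    ai_assisted=True,
    annotators=["annotator-1"],
)
print(config.problems())
```

Here the check reports the missing NLP service, because AI-assisted annotation was enabled without one.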
Creating a Video Labeling Task
- On the Create annotation task page, select the video dataset to be annotated, and then select annotation items as prompted. The available annotation items vary depending on the data file type.
If you select Video Caption, you can enable AI pre-annotation. AI pre-annotation automatically generates labeling content without overwriting the original dataset. Annotators can use the generated content as a reference to label more efficiently.
- Optionally, enable the multiplayer job function so that multiple people can work on the task. You can also enable the review function if required. Configure labeling allocation and review by referring to Table 3.
Table 3 Labeling allocation and review configuration
| Type | Parameter | Description |
| --- | --- | --- |
| Labeling assignment | Annotator | Add annotators and specify the number of annotations. |
| Labeling assignment | Annotation requirements | If you select Video Caption and enable AI pre-annotation, you can set the annotation requirements in either of the following ways. Full annotation: Annotators must manually label all data before submitting the annotations. Partial annotation: After confirming that the AI pre-annotation meets the requirements, you can use the AI pre-annotation results directly as the labels and submit the annotations. |
| Labeling review | Is reviewed | No: No review is performed after labeling. Yes: The reviewer checks the annotator's work. If any problem is found, the reviewer can specify the reason and reject the annotation data, and the annotator must label the data again. |
| Labeling review | Reviewer | Add reviewers and specify the number of reviewers. |
| Labeling review | Review Requirement | Full review: The reviewer must manually review all data records one by one. Partial review: If the data reviewed so far is of high labeling quality, the reviewer can submit the remaining data in one click. The remaining data is approved by default and the review task is completed. |
- After the configuration is complete, click Complete creation.
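The difference between the Full annotation and Partial annotation requirements in Table 3 can be sketched as follows. This is a hypothetical illustration, not the platform's implementation: the record layout (`manual` and `ai` keys) and the `final_labels` function are assumptions. The AI pre-annotation is kept in its own field, mirroring the statement that it does not overwrite the original dataset.

```python
from typing import Dict, List, Optional


def final_labels(records: List[Dict[str, Optional[str]]], requirement: str) -> List[str]:
    """Resolve the label to submit for each record.

    Each record holds a 'manual' label (None if not yet labeled by a person)
    and an 'ai' pre-annotation (None if none was generated).
    """
    labels = []
    for index, record in enumerate(records):
        manual = record.get("manual")
        ai = record.get("ai")
        if manual is not None:
            # A manual label always takes precedence over the pre-annotation.
            labels.append(manual)
        elif requirement == "partial" and ai is not None:
            # Partial annotation: accept the AI pre-annotation as the label.
            labels.append(ai)
        else:
            # Full annotation: every record must be labeled manually.
            raise ValueError(f"record {index} still needs a manual label")
    return labels


records = [
    {"manual": "A person opens a door.", "ai": "Someone enters a room."},
    {"manual": None, "ai": "A car drives down a street."},
]
print(final_labels(records, "partial"))
```

With the `full` requirement, the same input raises an error for the second record, because it has no manual label yet.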
Creating an Image Labeling Task
- On the Create annotation task page, select the image dataset to be annotated, and then select annotation items as prompted. The available annotation items vary depending on the data file type.
If you select Image Caption or Object Detection, you can enable AI pre-annotation. AI pre-annotation automatically generates labeling content without overwriting the original dataset. Annotators can use the generated content as a reference to label more efficiently.
- Optionally, enable the multiplayer job function so that multiple people can work on the task. You can also enable the review function if required. Configure labeling allocation and review by referring to Table 4.
Table 4 Labeling allocation and review configuration
| Type | Parameter | Description |
| --- | --- | --- |
| Labeling assignment | Annotator | Add annotators and specify the number of annotations. |
| Labeling assignment | Annotation requirements | If you select Image Caption and enable AI pre-annotation, you can set the annotation requirements in either of the following ways. Full annotation: Annotators must manually label all data before submitting the annotations. Partial annotation: After confirming that the AI pre-annotation meets the requirements, you can use the AI pre-annotation results directly as the labels and submit the annotations. |
| Labeling review | Is reviewed | No: No review is performed after labeling. Yes: The reviewer checks the annotator's work. If any problem is found, the reviewer can specify the reason and reject the annotation data, and the annotator must label the data again. |
| Labeling review | Reviewer | Add reviewers and specify the number of reviewers. |
| Labeling review | Review Requirement | Full review: The reviewer must manually review all data records one by one. Partial review: If the data reviewed so far is of high labeling quality, the reviewer can submit the remaining data in one click. The remaining data is approved by default and the review task is completed. |
- After the configuration is complete, click Complete creation.
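The review behavior described in the tables above (reject with a reason, Full review, Partial review) can be sketched as follows. This is a minimal illustration under stated assumptions: the status values (`pending`, `approved`, `rejected`), the item layout, and both function names are hypothetical and do not come from the platform.

```python
from typing import Dict, List


def complete_review(items: List[Dict[str, str]], requirement: str) -> List[Dict[str, str]]:
    """Finish a review task and return the updated items.

    Each item has a 'status' of 'pending', 'approved', or 'rejected'.
    Rejected items also carry a 'reason' given by the reviewer.
    """
    result = []
    for item in items:
        updated = dict(item)  # copy so the caller's data is left unchanged
        if updated["status"] == "pending":
            if requirement == "full":
                # Full review: nothing may be left unreviewed.
                raise ValueError("full review requires every record to be reviewed")
            # Partial review: remaining records are approved by default.
            updated["status"] = "approved"
        result.append(updated)
    return result


def to_relabel(items: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Return the rejected items that the annotator must label again."""
    return [item for item in items if item["status"] == "rejected"]


items = [
    {"id": "1", "status": "approved"},
    {"id": "2", "status": "rejected", "reason": "bounding box too loose"},
    {"id": "3", "status": "pending"},
]
reviewed = complete_review(items, "partial")
print([item["status"] for item in reviewed])
print([item["id"] for item in to_relabel(reviewed)])
```

In this sketch, Partial review turns the one pending item into an approved one, while the rejected item keeps its reason and goes back to the annotator.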