Format Requirements for Image Datasets
ModelArts Studio supports the creation of image datasets. During the creation, you can import data in various formats. Table 1 lists the format requirements.
File Content | File Format | File Requirements |
---|---|---|
Image only | TAR and image directory | |
Image + Caption | Image: TAR; Caption: JSONL | |
Image + QA Pair | Image: TAR; QA pair: JSONL | |
Object detection | PASCAL VOC | |
Image classification | Image + TXT | |
Instance segmentation | Image + XML | |
Specifications of Annotation Files in an Object Detection Dataset
The following description follows the annotation file format for object detection in Table 1.
The object detection dataset supports annotation files in ModelArts PASCAL VOC 1.0 format.
Labeled objects and their annotation files (in one-to-one relationship with the labeled objects) must be in the same directory. For example, if the name of the labeled object file is IMG_20180919_114745.jpg, the name of the annotation file must be IMG_20180919_114745.xml.
The annotation files must be in PASCAL VOC format, a standardized XML annotation format used for labeling image datasets. A PASCAL VOC file contains the image directory, image file name, image size, and object information. For details about the format, see Table 2.
Example of a file uploaded to OBS:
├─dataset-import-example
│      IMG_20180919_114732.jpg
│      IMG_20180919_114732.xml
│      IMG_20180919_114745.jpg
│      IMG_20180919_114745.xml
│      IMG_20180919_114945.jpg
│      IMG_20180919_114945.xml
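Before uploading, it can help to confirm that every image has an annotation file with the same base name. The following is a minimal sketch of such a check, not part of ModelArts Studio: the function name `find_unpaired_files` and the set of image extensions are assumptions made for illustration.

```python
from pathlib import Path

# Assumed set of image extensions; adjust it to the formats you actually import.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".bmp"}


def find_unpaired_files(dataset_dir, annotation_suffix=".xml"):
    """Return (images without an annotation, annotations without an image).

    Files are matched by base name within a single directory, for example
    IMG_20180919_114745.jpg and IMG_20180919_114745.xml.
    """
    root = Path(dataset_dir)
    images = {p.stem for p in root.iterdir() if p.suffix.lower() in IMAGE_SUFFIXES}
    annotations = {p.stem for p in root.iterdir() if p.suffix.lower() == annotation_suffix}
    return sorted(images - annotations), sorted(annotations - images)
```

Calling `find_unpaired_files("dataset-import-example")` on a correctly prepared directory should return two empty lists.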
An XML annotation file example is as follows:
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<annotation>
    <folder>NA</folder>
    <filename>bike_1_1593531469339.png</filename>
    <source>
        <database>Unknown</database>
    </source>
    <size>
        <width>554</width>
        <height>606</height>
        <depth>3</depth>
    </size>
    <segmented>0</segmented>
    <object>
        <name>Dog</name>
        <pose>Unspecified</pose>
        <truncated>0</truncated>
        <difficult>0</difficult>
        <occluded>0</occluded>
        <bndbox>
            <xmin>279</xmin>
            <ymin>52</ymin>
            <xmax>474</xmax>
            <ymax>278</ymax>
        </bndbox>
    </object>
    <object>
        <name>Cat</name>
        <pose>Unspecified</pose>
        <truncated>0</truncated>
        <difficult>0</difficult>
        <occluded>0</occluded>
        <bndbox>
            <xmin>279</xmin>
            <ymin>198</ymin>
            <xmax>456</xmax>
            <ymax>421</ymax>
        </bndbox>
    </object>
</annotation>
Field | Mandatory (Yes/No) | Description |
---|---|---|
folder | Yes | Name of the directory where the image is located |
filename | Yes | Name of the labeled file |
size | Yes | Image pixel information (width, height, and depth) |
segmented | Yes | Segmented or not. The value can be 0 or 1. The value 0 means no segmentation, and 1 means segmentation. |
object | Yes | Target object information, which includes the category, pose, truncation status, identification difficulty, and bounding box of an object. An image may contain more than one object. |

type | Shape | Labeling Information |
---|---|---|
point | Point | Coordinates of a point: <x>100</x> <y>100</y> |
line | Line | Coordinates of points: <x1>100</x1> <y1>100</y1> <x2>200</x2> <y2>200</y2> |
bndbox | Rectangle | Coordinates of the upper left and lower right points: <xmin>100</xmin> <ymin>100</ymin> <xmax>200</xmax> <ymax>200</ymax> |
polygon | Polygon | Coordinates of points: <x1>100</x1> <y1>100</y1> <x2>200</x2> <y2>100</y2> <x3>250</x3> <y3>150</y3> <x4>200</x4> <y4>200</y4> <x5>100</x5> <y5>200</y5> <x6>50</x6> <y6>150</y6> |
circle | Circle | Center coordinates and radius: <cx>100</cx> <cy>100</cy> <r>50</r> |
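The fields described above can be read with any XML parser. The following is a minimal sketch using the Python standard library. It checks that the mandatory top-level fields are present and collects the rectangle (bndbox) of each object. The function name `parse_voc_boxes` is hypothetical, and objects labeled with other shape types are simply skipped here.

```python
import xml.etree.ElementTree as ET

# Mandatory top-level fields, as listed in the field table above.
REQUIRED_FIELDS = ("folder", "filename", "size", "segmented", "object")


def parse_voc_boxes(xml_text):
    """Return (filename, [(name, xmin, ymin, xmax, ymax), ...]) for one annotation."""
    root = ET.fromstring(xml_text)
    missing = [name for name in REQUIRED_FIELDS if root.find(name) is None]
    if missing:
        raise ValueError(f"missing mandatory fields: {missing}")

    boxes = []
    for obj in root.findall("object"):
        bndbox = obj.find("bndbox")
        if bndbox is None:
            # The object uses another shape type (point, line, polygon, or circle).
            continue
        coords = [int(float(bndbox.findtext(tag)))
                  for tag in ("xmin", "ymin", "xmax", "ymax")]
        boxes.append((obj.findtext("name"), *coords))
    return root.findtext("filename"), boxes
```

For the annotation example in this section, the result would contain one entry for Dog and one for Cat, each with its four corner coordinates.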
Description of an Annotation File for an Image Classification Dataset
The following description follows the annotation file format for image classification in Table 1.
The image classification dataset supports annotation files in ModelArts image classification 1.0 format.
Each labeled image and its annotation file must be stored in the same directory, in a one-to-one relationship. The content of the TXT annotation file is used as the label of the image, and the file can contain a single label or multiple labels.
In the following example, import-dir-1 and import-dir-2 are the imported subdirectories.
dataset-import-example
├─import-dir-1
│      10.jpg
│      10.txt
│      11.jpg
│      11.txt
│      12.jpg
│      12.txt
└─import-dir-2
        1.jpg
        1.txt
        2.jpg
        2.txt
The following shows an annotation file for a single label, for example, the 1.txt file:
Cat
The following shows an annotation file for multiple labels, for example, the 2.txt file:
Cat
Dog
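To inspect the labels before import, the TXT files can be read back into a dictionary. The following is a minimal sketch under two assumptions that this page does not state: the files are UTF-8 encoded, and labels contain no whitespace, so that splitting on whitespace or line breaks separates them. The function names are hypothetical.

```python
from pathlib import Path


def read_labels(txt_path):
    """Return the labels stored in one TXT annotation file."""
    # Assumption: labels are separated by whitespace or line breaks.
    return Path(txt_path).read_text(encoding="utf-8").split()


def load_classification_labels(dataset_dir):
    """Map each annotation file (path relative to dataset_dir, without suffix) to its labels."""
    root = Path(dataset_dir)
    return {
        p.relative_to(root).with_suffix("").as_posix(): read_labels(p)
        for p in sorted(root.rglob("*.txt"))
    }
```

With the directory layout above, the keys would look like `import-dir-2/1` and `import-dir-2/2`, so files with the same name in different subdirectories do not collide.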
Specifications of Annotation Files in an Anomaly Detection Dataset
The following description follows the annotation file format for anomaly detection in Table 1.
Each image and its annotation file must be stored in the same directory. The content of the annotation file is used as the label of the image and must be either normal or abnormal.
dataset-import-example
│      IMG_20180919_114732.jpg
│      IMG_20180919_114732.txt
│      IMG_20180919_114745.jpg
│      IMG_20180919_114745.txt
The following shows an annotation file for the "abnormal" label, for example, the IMG_20180919_114732.txt file:
abnormal
The following shows an annotation file for the "normal" label, for example, the IMG_20180919_114745.txt file:
normal
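Because only two label values are expected, a quick scan can catch typos before import. The following is a minimal sketch, assuming UTF-8 files and that the labels are written exactly as lowercase normal or abnormal, as in the examples above. The function name `invalid_anomaly_labels` is hypothetical.

```python
from pathlib import Path

# Label values shown in the examples above; case sensitivity is an assumption.
ALLOWED_LABELS = {"normal", "abnormal"}


def invalid_anomaly_labels(dataset_dir):
    """Return {file name: content} for TXT files whose label is not allowed."""
    bad = {}
    for txt in sorted(Path(dataset_dir).glob("*.txt")):
        label = txt.read_text(encoding="utf-8").strip()
        if label not in ALLOWED_LABELS:
            bad[txt.name] = label
    return bad
```

An empty dictionary means every annotation file in the directory holds one of the two expected labels.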
Description of JSON Annotation Files for a Posture Estimation Dataset
The following description follows the annotation file format for posture estimation in Table 1.
Posture estimation datasets are labeled in the open-source COCO human keypoint format. The dataset must contain the annotations, train, and val folders. The annotations folder contains train.json and val.json, which hold the annotations of the training set and the validation set, respectively. The train and val folders store the images. The following is an example:
├─annotations
│      train.json
│      val.json
├─train
│      IMG_20180919_114745.jpg
├─val
│      IMG_20180919_114945.jpg
The following is an example of a JSON annotation file:
{
  "images": [
    {
      "license": 2,
      "file_name": "000000000139.jpg",
      "coco_url": "",
      "height": 426,
      "width": 640,
      "date_captured": "2013-11-21 01:34:01",
      "flickr_url": "",
      "id": 139
    }
  ],
  "annotations": [
    {
      "num_keypoints": 15,
      "area": 2913.1104,
      "iscrowd": 0,
      "keypoints": [
        427, 170, 1, 429, 169, 2, 0, 0, 0, 434, 168, 2, 0, 0, 0,
        441, 177, 2, 446, 177, 2, 437, 200, 2, 430, 206, 2, 430, 220, 2,
        420, 215, 2, 445, 226, 2, 452, 223, 2, 447, 260, 2, 454, 257, 2,
        455, 290, 2, 459, 286, 2
      ],
      "image_id": 139,
      "bbox": [412.8, 157.61, 53.05, 138.01],
      "category_id": 1,
      "id": 230831
    }
  ],
  "categories": [
    {
      "supercategory": "person",
      "id": 1,
      "name": "person",
      "keypoints": [
        "nose", "left_eye", "right_eye", "left_ear", "right_ear",
        "left_shoulder", "right_shoulder", "left_elbow", "right_elbow",
        "left_wrist", "right_wrist", "left_hip", "right_hip",
        "left_knee", "right_knee", "left_ankle", "right_ankle"
      ],
      "skeleton": [
        [16, 14], [14, 12], [17, 15], [15, 13], [12, 13], [6, 12], [7, 13],
        [6, 7], [6, 8], [7, 9], [8, 10], [9, 11], [2, 3], [1, 2], [1, 3],
        [2, 4], [3, 5], [4, 6], [5, 7]
      ]
    }
  ]
}
Field | Mandatory (Yes/No) | Description |
---|---|---|
images | Yes | Image information. |
license | No | License identifier of an image. |
file_name | Yes | Image file name. |
coco_url | No | URL of an image in the official COCO dataset. |
height | Yes | Image height (in pixels). |
width | Yes | Image width (in pixels). |
date_captured | No | Date and time when an image was captured. |
flickr_url | No | URL of an image on the Flickr website. |
id | Yes | Unique identifier of an image. |
annotations | Yes | Labeling information. |
num_keypoints | Yes | Number of labeled key points. |
area | Yes | Area of the bounding box, in square pixels. |
iscrowd | Yes | Whether the annotation covers a crowd (for example, a dense group of people). The value 0 indicates a single object, and the value 1 indicates a crowd. |
keypoints | Yes | Coordinates and visibility of labeled key points. All key points are listed in sequence. Each key point is represented by three numbers: [x, y, v]. x and y are the pixel coordinates of the key point, and v is its visibility (0: not labeled; 1: labeled but not visible; 2: labeled and visible). |
image_id | Yes | ID of the image associated with the annotation. The value must be the same as the value of id in the images field. |
bbox | Yes | Bounding box of the target object, represented by [x, y, width, height], where x and y are the coordinates of the upper left corner of the bounding box, and width and height are the width and height of the bounding box. |
category_id | Yes | ID of a label category. For human posture estimation, the value is usually 1 (indicating person). |
id | Yes | Unique identifier of an annotation. |
categories | Yes | Label type information. |
supercategory | Yes | Upper-level category of a category, which is usually person. |
id | Yes | Unique identifier of a category, usually 1 for human posture estimation. |
name | Yes | Name of a category, which is usually person. |
keypoints | Yes | List of key point names. The COCO format generally defines 17 key points: nose, left_eye, right_eye, left_ear, right_ear, left_shoulder, right_shoulder, left_elbow, right_elbow, left_wrist, right_wrist, left_hip, right_hip, left_knee, right_knee, left_ankle, and right_ankle. |
skeleton | Yes | List of skeleton connections, which indicate how key points are connected. Each connection is a pair of key point indexes that start from 1, for example, [1, 2], which connects the nose (nose) to the left eye (left_eye). |
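Several of the fields above constrain each other, so a JSON annotation file can be sanity-checked before import. The following is a minimal sketch, not an official validator. It assumes the COCO convention that num_keypoints counts the key points whose visibility value is greater than 0, which matches the example above (17 key points, two of them unlabeled, num_keypoints of 15). The function names are hypothetical.

```python
import json


def check_keypoint_annotations(coco):
    """Return a list of human-readable problems found in a COCO keypoint dict."""
    problems = []
    image_ids = {img["id"] for img in coco["images"]}
    names_by_category = {c["id"]: c["keypoints"] for c in coco["categories"]}

    for ann in coco["annotations"]:
        kps = ann["keypoints"]
        # Each key point is stored as three numbers: x, y, v.
        expected = 3 * len(names_by_category.get(ann["category_id"], []))
        if ann["image_id"] not in image_ids:
            problems.append(f"annotation {ann['id']}: unknown image_id {ann['image_id']}")
        if len(kps) != expected:
            problems.append(
                f"annotation {ann['id']}: expected {expected} keypoint values, got {len(kps)}"
            )
        labeled = sum(1 for v in kps[2::3] if v > 0)
        if labeled != ann["num_keypoints"]:
            problems.append(
                f"annotation {ann['id']}: num_keypoints is {ann['num_keypoints']}, "
                f"but {labeled} key points are labeled"
            )
    return problems


def check_annotation_file(path):
    """Load a JSON annotation file (for example, annotations/train.json) and check it."""
    with open(path, encoding="utf-8") as handle:
        return check_keypoint_annotations(json.load(handle))
```

An empty list means the checks passed. Note that a file with a trailing comma after the last annotation fails at `json.load`, because standard JSON does not allow it.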
Description of an Annotation File for an Instance Segmentation Dataset
The following description follows the annotation file format for instance segmentation in Table 1.
Labeled objects and their annotation files (in one-to-one relationship with the labeled objects) must be in the same directory. For example, if the name of the labeled object file is IMG_20180919_114745.jpg, the name of the annotation file must be IMG_20180919_114745.xml.
The annotation files must be in PASCAL VOC format, a standardized XML annotation format used for labeling image datasets. A PASCAL VOC file contains the image directory, image file name, image size, and object information. For details about the format, see Table 5.
Example of a file uploaded to OBS:
├─dataset-import-example
│      IMG_20180919_114732.jpg
│      IMG_20180919_114732.xml
│      IMG_20180919_114745.jpg
│      IMG_20180919_114745.xml
Example of an XML annotation file:
<annotation>
    <folder>NA</folder>
    <filename>0001.jpg</filename>
    <source>
        <database>Unknown</database>
    </source>
    <size>
        <width>2560</width>
        <height>1440</height>
        <depth>3</depth>
    </size>
    <segmented>1</segmented>
    <mask_source></mask_source>
    <object>
        <name>aggregate</name>
        <pose>Unspecified</pose>
        <truncated>0</truncated>
        <difficult>0</difficult>
        <mask_color>238,130,238</mask_color>
        <occluded>0</occluded>
        <polygon>
            <x1>657.0</x1>
            <y1>357.0</y1>
            <x2>645.0</x2>
            <y2>351.0</y2>
            <x3>624.0</x3>
            <y3>352.0</y3>
            <x4>616.0</x4>
            <y4>353.0</y4>
        </polygon>
    </object>
</annotation>
Field | Mandatory (Yes/No) | Description |
---|---|---|
folder | Yes | Name of the directory where the image is located |
filename | Yes | Name of the labeled file |
size | Yes | Image pixel information (width, height, and depth) |
segmented | Yes | Segmented or not. The value can be 0 or 1. The value 0 means no segmentation, and 1 means segmentation. |
object | Yes | Target object information, which includes the category, pose, truncation status, identification difficulty, and bounding box of an object. An image may contain more than one object. |

type | Shape | Labeling Information |
---|---|---|
point | Point | Coordinates of a point: <x>100</x> <y>100</y> |
line | Line | Coordinates of points: <x1>100</x1> <y1>100</y1> <x2>200</x2> <y2>200</y2> |
bndbox | Rectangle | Coordinates of the upper left and lower right points: <xmin>100</xmin> <ymin>100</ymin> <xmax>200</xmax> <ymax>200</ymax> |
polygon | Polygon | Coordinates of points: <x1>100</x1> <y1>100</y1> <x2>200</x2> <y2>100</y2> <x3>250</x3> <y3>150</y3> <x4>200</x4> <y4>200</y4> <x5>100</x5> <y5>200</y5> <x6>50</x6> <y6>150</y6> |
circle | Circle | Center coordinates and radius: <cx>100</cx> <cy>100</cy> <r>50</r> |
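In the polygon type, the vertices are stored as numbered child elements (x1, y1, x2, y2, and so on). The following is a minimal sketch that turns them into a list of (x, y) pairs per object, using the Python standard library. The function names are hypothetical, and ordering the vertices by their numeric suffix is an assumption based on the example above.

```python
import re
import xml.etree.ElementTree as ET


def polygon_points(polygon):
    """Collect (x, y) pairs from <x1>/<y1>, <x2>/<y2>, ... children in index order."""
    xs, ys = {}, {}
    for child in polygon:
        match = re.fullmatch(r"([xy])(\d+)", child.tag)
        if match:
            target = xs if match.group(1) == "x" else ys
            target[int(match.group(2))] = float(child.text)
    if xs.keys() != ys.keys():
        raise ValueError("polygon has unmatched x/y coordinates")
    return [(xs[i], ys[i]) for i in sorted(xs)]


def read_instance_polygons(xml_text):
    """Return [(object name, [(x, y), ...]), ...] for objects labeled with a polygon."""
    root = ET.fromstring(xml_text)
    return [
        (obj.findtext("name"), polygon_points(obj.find("polygon")))
        for obj in root.findall("object")
        if obj.find("polygon") is not None
    ]
```

For the annotation example in this section, the result would be a single entry named aggregate with four vertices, starting at (657.0, 357.0).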