Specifications for Importing Data from an OBS Directory
When importing data from OBS, the data storage directory and file name must comply with the ModelArts specifications.
Only the following labeling types of data can be imported by Labeling Format: image classification, object detection, image segmentation, text classification, and sound classification.
- To import data from an OBS directory, you must have the read permission on the OBS directory.
- The OBS buckets and ModelArts must be in the same region.
Image Classification
Data for image classification can be stored in two formats:
- Images with the same label must be stored in the same directory, with the label name as the directory name. If there are multiple levels of directories, the last level is used as the label name.
In the following example, Cat and Dog are label names.
dataset-import-example
├─Cat
│      10.jpg
│      11.jpg
│      12.jpg
│
└─Dog
        1.jpg
        2.jpg
        3.jpg
- The image and labeled file must be stored in the same directory, with the content in the labeled file used as label names.
In the following example, import-dir-1 and import-dir-2 are the imported subdirectories:
dataset-import-example
├─import-dir-1
│      10.jpg
│      10.txt
│      11.jpg
│      11.txt
│      12.jpg
│      12.txt
└─import-dir-2
        1.jpg
        1.txt
        2.jpg
        2.txt
The following shows a label file for a single label, for example, the 1.txt file:
Cat
The following shows a label file for multiple labels, for example, the 2.txt file:
Cat
Dog
- Only images in JPG, JPEG, PNG, and BMP formats are supported. The size of a single image cannot exceed 5 MB, and the total size of all images uploaded at a time cannot exceed 8 MB.
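The two storage formats above can be resolved for a single image roughly as follows. This is a minimal sketch, not ModelArts code: the function name labels_for_image is hypothetical, and it assumes that a label file holds one label per line and is encoded in UTF-8.

```python
from pathlib import Path

# Image formats listed as supported in this section.
SUPPORTED_SUFFIXES = {".jpg", ".jpeg", ".png", ".bmp"}


def labels_for_image(image_path):
    """Return the labels of one image under either storage format."""
    image_path = Path(image_path)
    if image_path.suffix.lower() not in SUPPORTED_SUFFIXES:
        raise ValueError(f"unsupported image format: {image_path.name}")

    label_file = image_path.with_suffix(".txt")
    if label_file.is_file():
        # Second format: labels come from the sibling .txt file
        # (assumed here: one label per line).
        text = label_file.read_text(encoding="utf-8")
        return [line.strip() for line in text.splitlines() if line.strip()]

    # First format: the last directory level is the label name.
    return [image_path.parent.name]
```

With the example layouts above, an image stored as Cat/10.jpg would resolve to the label Cat, and import-dir-2/2.jpg would resolve to whatever 2.txt lists.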
Object Detection
Data for object detection can be stored in two formats:
Format 1: ModelArts PASCAL VOC 1.0
- The simple mode of object detection requires you to store labeled objects and your label files (in one-to-one relationship with the labeled objects) in the same directory. For example, if the name of the labeled object file is IMG_20180919_114745.jpg, the name of the label file must be IMG_20180919_114745.xml.
The label files must be in PASCAL VOC format. For details about the format, see Table 6.
Example:
├─dataset-import-example
│      IMG_20180919_114732.jpg
│      IMG_20180919_114732.xml
│      IMG_20180919_114745.jpg
│      IMG_20180919_114745.xml
│      IMG_20180919_114945.jpg
│      IMG_20180919_114945.xml
A label file example is as follows:
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<annotation>
    <folder>NA</folder>
    <filename>bike_1_1593531469339.png</filename>
    <source>
        <database>Unknown</database>
    </source>
    <size>
        <width>554</width>
        <height>606</height>
        <depth>3</depth>
    </size>
    <segmented>0</segmented>
    <object>
        <name>Dog</name>
        <pose>Unspecified</pose>
        <truncated>0</truncated>
        <difficult>0</difficult>
        <occluded>0</occluded>
        <bndbox>
            <xmin>279</xmin>
            <ymin>52</ymin>
            <xmax>474</xmax>
            <ymax>278</ymax>
        </bndbox>
    </object>
    <object>
        <name>Cat</name>
        <pose>Unspecified</pose>
        <truncated>0</truncated>
        <difficult>0</difficult>
        <occluded>0</occluded>
        <bndbox>
            <xmin>279</xmin>
            <ymin>198</ymin>
            <xmax>456</xmax>
            <ymax>421</ymax>
        </bndbox>
    </object>
</annotation>
- Only images in JPG, JPEG, PNG, and BMP formats are supported. A single image cannot exceed 5 MB, and the total size of all images uploaded at a time cannot exceed 8 MB.
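To check a label file before importing it, the bounding boxes can be read back with the Python standard library. The sketch below is illustrative only: read_voc_boxes is a hypothetical helper, and it reads just the name and bndbox elements shown in the example label file.

```python
import xml.etree.ElementTree as ET


def read_voc_boxes(xml_path):
    """Return (label, xmin, ymin, xmax, ymax) tuples from a PASCAL VOC label file."""
    root = ET.parse(xml_path).getroot()
    boxes = []
    for obj in root.iter("object"):
        box = obj.find("bndbox")
        # Coordinates are stored as integer pixel values in the example file.
        coords = tuple(
            int(box.findtext(tag)) for tag in ("xmin", "ymin", "xmax", "ymax")
        )
        boxes.append((obj.findtext("name"), *coords))
    return boxes
```

Applied to the example label file, this should yield one tuple for the Dog object and one for the Cat object.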
Format 2: YOLO
- A YOLO dataset must comply with the following structure:
└─ yolo_dataset/
   │
   ├── obj.names  # Label set file
   ├── obj.data   # Files and relative paths for recording dataset information
   ├── train.txt  # Relative path of images in the training set
   ├── valid.txt  # Relative path of images in the validation set
   │
   ├── obj_train_data/  # Directory where the images in the training set and the corresponding label files are stored
   │    ├── image1.txt  # BBox label list for image 1
   │    ├── image1.jpg
   │    ├── image2.txt
   │    ├── image2.jpg
   │    ├── ...
   │
   ├── obj_valid_data/  # Directory where the images in the validation set and the corresponding label files are stored
   │    ├── image101.txt
   │    ├── image101.jpg
   │    ├── image102.txt
   │    ├── image102.jpg
   │    ├── ...
A YOLO dataset supports only training sets and validation sets. If other sets are imported, they will be invalid in the YOLO dataset.
- The obj.data file contains the following content. It must specify at least one of the train and valid subsets. All file paths are relative paths.
classes = 5 # Optional
names = <path/to/obj.names> # For example, obj.names
train = <path/to/train.txt> # For example, train.txt
valid = <path/to/valid.txt> # Optional, for example, valid.txt
backup = backup/ # Optional
- The obj.names file records the label list. The zero-based row number of each label is used as its label index.
label1 # index of label 1: 0
label2 # index of label 2: 1
label3
...
- The file paths in train.txt and valid.txt are relative paths, and the file list must be in one-to-one relationship with the files in the directories. The file structures of the two files are as follows:
<path/to/image1.jpg> # For example, obj_train_data/image.jpg
<path/to/image2.jpg> # For example, obj_train_data/image.jpg
...
- The .txt files in the obj_train_data/ and obj_valid_data/ directories contain the BBox label information of the corresponding images. Each line indicates a BBox label.
# image1.txt:
# <label_index> <x_center> <y_center> <width> <height>
0 0.250000 0.400000 0.300000 0.400000
3 0.600000 0.400000 0.400000 0.266667
x_center, y_center, width, and height indicate the normalized parameters for the target bounding box: the x-coordinate and y-coordinate of the center point, width, and height.
- Only images in JPG, JPEG, PNG, and BMP formats are supported. A single image cannot exceed 5 MB, and the total size of all images uploaded at a time cannot exceed 8 MB.
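The obj.data rules and the BBox line format described above can be sketched in a few lines of Python. Both helpers are hypothetical and rest on assumptions the text does not confirm: that "#" starts a comment in obj.data, and that the normalized values are fractions of the image width (x_center, width) and image height (y_center, height), as is conventional for YOLO labels.

```python
def parse_obj_data(text):
    """Parse the 'key = value' lines of obj.data into a dict."""
    settings = {}
    for line in text.splitlines():
        # Assumption: anything after '#' is a comment and can be dropped.
        line = line.split("#", 1)[0].strip()
        if not line:
            continue
        key, _, value = line.partition("=")
        settings[key.strip()] = value.strip()
    # At least one of the train and valid subsets must be present.
    if "train" not in settings and "valid" not in settings:
        raise ValueError("obj.data needs at least one of 'train' or 'valid'")
    return settings


def yolo_to_pixel_box(line, image_width, image_height):
    """Convert one BBox label line to (label_index, xmin, ymin, xmax, ymax) in pixels."""
    index, x_center, y_center, width, height = line.split()
    # Scale the normalized values back to pixel units.
    x_center, width = float(x_center) * image_width, float(width) * image_width
    y_center, height = float(y_center) * image_height, float(height) * image_height
    return (
        int(index),
        x_center - width / 2,
        y_center - height / 2,
        x_center + width / 2,
        y_center + height / 2,
    )
```

The label index returned by yolo_to_pixel_box refers to a row of obj.names, so a value of 0 would correspond to the first label in that file.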
Text Classification
TXT and CSV files can be imported for text classification. The text must be encoded in UTF-8 or GBK.
Labeled objects and labels for text classification can be stored in two formats:
- ModelArts text classification combine 1.0: The labeled objects and labels for text classification are in the same text file. You can specify a separator to separate the labeled objects and labels, as well as multiple labels.
The following is an example text file in which a tab character separates each labeled object from its label.
It touches good and responds quickly. I don't know how it performs in the future.	positive
Three months ago, I bought a very good phone and replaced my old one with it. It can operate longer between charges.	positive
Why does my phone heat up if I charge it for a while? The volume button stuck after being pressed down.	negative
It's a gift for Father's Day. The delivery is fast and I received it in 24 hours. I like the earphones because the bass sounds feel good and they would not fall off.	positive
- ModelArts text classification 1.0: The labeled objects and the labels are stored in separate text files that correspond to each other row by row. For example, the first row in a label file is the label of the first row in the labeled object file.
For example, the content of the labeled object COMMENTS_20180919_114745.txt is as follows:
It touches good and responds quickly. I don't know how it performs in the future.
Three months ago, I bought a very good phone and replaced my old one with it. It can operate longer between charges.
Why does my phone heat up if I charge it for a while? The volume button stuck after being pressed down.
It's a gift for Father's Day. The delivery is fast and I received it in 24 hours. I like the earphones because the bass sounds feel good and they would not fall off.
The content of the label file COMMENTS_20180919_114745_result.txt is as follows:
positive
negative
negative
positive
This data format requires you to store the labeled objects and their label files (in one-to-one relationship with the labeled objects) in the same directory. For example, if the name of the labeled object file is COMMENTS_20180919_114745.txt, the name of the label file must be COMMENTS_20180919_114745_result.txt.
Example of data files:
├─dataset-import-example
│      COMMENTS_20180919_114732.txt
│      COMMENTS_20180919_114732_result.txt
│      COMMENTS_20180919_114745.txt
│      COMMENTS_20180919_114745_result.txt
│      COMMENTS_20180919_114945.txt
│      COMMENTS_20180919_114945_result.txt
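Both text classification formats can be read with a few lines of Python. The helpers below are a hypothetical sketch: the tab separator follows the example above, while the comma used to split multiple labels is only an assumed default, because the separators are user-specified.

```python
from pathlib import Path


def split_combined_line(line, separator="\t", label_separator=","):
    """Split one 'combine 1.0' row into (text, labels)."""
    text, _, label_part = line.rstrip("\n").partition(separator)
    # Assumption: multiple labels are joined by label_separator.
    labels = label_part.split(label_separator) if label_part else []
    return text, labels


def label_file_for(object_file):
    """Return the label file expected next to a labeled object file."""
    object_file = Path(object_file)
    return object_file.with_name(object_file.stem + "_result.txt")


def pair_rows(object_rows, label_rows):
    """Pair each labeled-object row with the label row at the same position."""
    if len(object_rows) != len(label_rows):
        raise ValueError("object file and label file must have the same number of rows")
    return list(zip(object_rows, label_rows))
```

Checking the row counts before import is a cheap way to catch a label file that has drifted out of step with its labeled object file.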
Sound Classification
ModelArts audio classification dir 1.0: Sound files with the same label must be stored in the same directory, and the label name is the directory name.
Example:
dataset-import-example
├─Cat
│      10.wav
│      11.wav
│      12.wav
│
└─Dog
        1.wav
        2.wav
        3.wav
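A quick way to review which label each sound file will receive is to group the files by their parent directory. This is an illustrative sketch only: group_sounds_by_label is a hypothetical helper, and it looks only for .wav files because that is the only extension shown in the example.

```python
from collections import defaultdict
from pathlib import Path


def group_sounds_by_label(root):
    """Map each label (directory name) to the sorted .wav file names it contains."""
    groups = defaultdict(list)
    for wav in sorted(Path(root).rglob("*.wav")):
        # The directory that holds the file is its label name.
        groups[wav.parent.name].append(wav.name)
    return dict(groups)
```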
Tables
CSV files can be imported from OBS. Select the directory where the files are stored. The number of columns in the CSV file must be the same as that in the dataset schema. The schema of the CSV file can be automatically obtained.
├─dataset-import-example
│      table_import_1.csv
│      table_import_2.csv
│      table_import_3.csv
│      table_import_4.csv
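Because the column count must match the dataset schema, it can help to check each CSV file before importing it. The following sketch uses the Python standard library; rows_with_wrong_width is a hypothetical helper, and expected_columns stands for the number of columns in your dataset schema.

```python
import csv
import io


def rows_with_wrong_width(csv_text, expected_columns):
    """Return the 1-based numbers of rows whose column count differs from the schema."""
    reader = csv.reader(io.StringIO(csv_text))
    return [
        number
        for number, row in enumerate(reader, start=1)
        if len(row) != expected_columns
    ]
```

An empty result means every row has the expected number of columns.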