Preparing Image Classification Data
Before using ModelArts ExeML to build a model, upload data to an OBS bucket. The OBS bucket and ModelArts must be in the same region.
Requirements on Datasets
- Check that all images are undamaged and in a compatible format. The supported formats are JPG, JPEG, BMP, and PNG.
- Do not store data of different projects in the same dataset.
- Collect at least two classes of images with a similar number of images in each class. Make sure each class has a minimum of 20 images.
- To ensure prediction accuracy, the training samples must resemble the real-world scenarios in which the model will be used.
- To ensure that the model generalizes well, the dataset should cover all scenarios the model is expected to handle.
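Before uploading, you can pre-check a local dataset against the countable requirements above. The following is a minimal sketch, not a ModelArts tool: it assumes you already know which class each image belongs to and pass that as a dictionary of class name to file names. The function name `check_dataset_requirements` and the imbalance threshold `MAX_IMBALANCE_RATIO` are hypothetical, because the guide only asks for "a similar number of images" without giving a number. The sketch checks file extensions only; it does not open the images, so it cannot detect damaged files.

```python
from pathlib import PurePosixPath

# Formats listed in the requirements; matched case-insensitively (an assumption).
SUPPORTED_EXTENSIONS = {".jpg", ".jpeg", ".bmp", ".png"}
MIN_CLASSES = 2
MIN_IMAGES_PER_CLASS = 20
# Hypothetical threshold for "a similar number of images"; the guide gives no value.
MAX_IMBALANCE_RATIO = 2.0


def check_dataset_requirements(class_to_files: dict[str, list[str]]) -> list[str]:
    """Return human-readable problems; an empty list means no problems were found."""
    problems: list[str] = []

    # At least two classes are required.
    if len(class_to_files) < MIN_CLASSES:
        problems.append(f"need at least {MIN_CLASSES} classes, found {len(class_to_files)}")

    for label, files in class_to_files.items():
        # Each class needs a minimum number of images.
        if len(files) < MIN_IMAGES_PER_CLASS:
            problems.append(
                f"class '{label}' has {len(files)} images, need at least {MIN_IMAGES_PER_CLASS}"
            )
        # Only the supported image formats are accepted (checked by extension only).
        for name in files:
            if PurePosixPath(name).suffix.lower() not in SUPPORTED_EXTENSIONS:
                problems.append(f"unsupported format: {name}")

    # Classes should be roughly balanced.
    counts = [len(files) for files in class_to_files.values()]
    if counts and min(counts) > 0 and max(counts) / min(counts) > MAX_IMBALANCE_RATIO:
        problems.append(f"class sizes are unbalanced: smallest {min(counts)}, largest {max(counts)}")

    return problems


if __name__ == "__main__":
    dataset = {
        "cat": [f"cat_{i}.jpg" for i in range(25)],
        "dog": [f"dog_{i}.png" for i in range(12)],
    }
    for problem in check_dataset_requirements(dataset):
        print(problem)
```

Returning a list of messages rather than raising on the first failure lets you fix every issue in one pass before uploading.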
Uploading Data to OBS
In this section, the OBS console is used to upload data.
- File names cannot contain plus signs (+), spaces, or tabs.
- If you do not need to upload training data in advance, create an empty folder to store files generated in the future, for example, /bucketName/data-cat.
- If you need to upload images to be labeled in advance, create an empty folder and save the images in the folder. An example of the image directory structure is /bucketName/data-cat/cat.jpg.
- If you want to upload labeled images to the OBS bucket, upload them according to the following specifications:
- The dataset for image classification requires storing labeled objects and their label files (in one-to-one relationship with the labeled objects) in the same directory. For example, if the name of the labeled object is 10.jpg, the name of the label file must be 10.txt.
Example of data files:
├─<dataset-import-path>
│      10.jpg
│      10.txt
│      11.jpg
│      11.txt
│      12.jpg
│      12.txt
- Only images in JPG, JPEG, PNG, and BMP formats are supported. When uploading images on the OBS console, ensure that the size of an image does not exceed 5 MB and the total size of images to be uploaded in one attempt does not exceed 8 MB. If the data volume is large, use OBS Browser+ to upload images.
- A label name can contain a maximum of 32 characters. Allowed characters are letters, digits, hyphens (-), and underscores (_).
- The specifications of image classification label files (.txt) are as follows:
Each row contains only one label, for example:
flower
book
...
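The labeling specifications above can be checked locally before you upload a labeled dataset. The following is a minimal sketch, assuming a flat local copy of `<dataset-import-path>` like the example directory. The helper names `write_label_file` and `validate_labeled_dir` are hypothetical, and the label pattern assumes that "letters" means ASCII letters; adjust it if your labels use other scripts. The sketch checks file names, the one-to-one image/label-file pairing, and each label row; it does not check image size or content.

```python
import re
from pathlib import Path

IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".bmp"}
# Assumption: letters are ASCII letters; 1 to 32 characters per label.
LABEL_PATTERN = re.compile(r"[A-Za-z0-9_-]{1,32}")
# Characters that file names must not contain.
FORBIDDEN_NAME_CHARS = ("+", " ", "\t")


def write_label_file(image_path: Path, labels: list[str]) -> Path:
    """Write <stem>.txt next to the image (10.jpg -> 10.txt), one label per row."""
    label_path = image_path.with_suffix(".txt")
    label_path.write_text("\n".join(labels) + "\n", encoding="utf-8")
    return label_path


def validate_labeled_dir(dataset_dir: Path) -> list[str]:
    """Check a local copy of <dataset-import-path> before uploading it to OBS."""
    problems: list[str] = []
    for path in sorted(dataset_dir.iterdir()):
        if not path.is_file():
            continue
        if any(ch in path.name for ch in FORBIDDEN_NAME_CHARS):
            problems.append(f"{path.name}: file name contains '+', a space, or a tab")
        if path.suffix.lower() not in IMAGE_EXTENSIONS:
            continue  # label files are checked through their matching image below
        label_path = path.with_suffix(".txt")
        if not label_path.is_file():
            problems.append(f"{path.name}: missing label file {label_path.name}")
            continue
        # Each non-empty row must hold exactly one valid label; a row such as
        # "flower book" fails because the space is not an allowed character.
        rows = [row.strip() for row in label_path.read_text(encoding="utf-8").splitlines()]
        labels = [row for row in rows if row]
        if not labels:
            problems.append(f"{label_path.name}: no labels")
        for label in labels:
            if not LABEL_PATTERN.fullmatch(label):
                problems.append(f"{label_path.name}: invalid label {label!r}")
    return problems
```

For example, `write_label_file(Path("data/10.jpg"), ["flower", "book"])` creates `data/10.txt` with one label per row, and `validate_labeled_dir(Path("data"))` then returns an empty list if every image in `data` has a valid label file.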
Procedure for uploading data to OBS:
Perform the following operations to upload data to OBS for model training and building.
- Log in to the OBS console and create a bucket in the same region as ModelArts. If you already have a bucket, ensure that it is in the same region as ModelArts.
- Upload the local data to the OBS bucket. If you have a large amount of data, use OBS Browser+ to upload data or folders. The uploaded data must meet the dataset requirements of the ExeML project.
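If you upload through the OBS console rather than OBS Browser+, the limits stated for labeled images (at most 5 MB per image and 8 MB per upload attempt) mean larger sets have to be split. The following is a minimal sketch for planning those console batches. The function names are hypothetical, and it assumes 1 MB = 1024 × 1024 bytes, which the guide does not specify. It groups files greedily in the order given, which is simple but does not always produce the fewest batches.

```python
from pathlib import Path

# Console limits from this guide; 1 MB = 1024 * 1024 bytes is an assumption.
MAX_IMAGE_BYTES = 5 * 1024 * 1024   # per image
MAX_BATCH_BYTES = 8 * 1024 * 1024   # total per upload attempt


def sizes_in_dir(directory: Path) -> dict[str, int]:
    """Map each file name in a local directory to its size in bytes."""
    return {p.name: p.stat().st_size for p in sorted(directory.iterdir()) if p.is_file()}


def plan_console_batches(sizes: dict[str, int]) -> tuple[list[list[str]], list[str]]:
    """Greedily group files into batches whose total stays within MAX_BATCH_BYTES.

    Returns (batches, too_large); files over MAX_IMAGE_BYTES are reported in
    too_large instead of being batched.
    """
    batches: list[list[str]] = []
    too_large: list[str] = []
    current: list[str] = []
    current_size = 0
    for name, size in sizes.items():
        if size > MAX_IMAGE_BYTES:
            too_large.append(name)
            continue
        # Start a new batch when adding this file would exceed the per-attempt limit.
        if current and current_size + size > MAX_BATCH_BYTES:
            batches.append(current)
            current, current_size = [], 0
        current.append(name)
        current_size += size
    if current:
        batches.append(current)
    return batches, too_large
```

For example, `plan_console_batches(sizes_in_dir(Path("data-cat")))` lists which files to select in each console upload. If many batches result, or any file is reported as too large, OBS Browser+ is the better choice, as recommended above.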
Use an unencrypted bucket. If the data is stored in an encrypted bucket, training will fail because the data cannot be decrypted.
Creating a Dataset
After data is prepared, create a dataset of the type supported by the project. For details, see Creating a Dataset.