Preparing Data
Before using ModelArts ExeML to build a model, upload data to an OBS bucket. The OBS bucket and ModelArts must be in the same region.
Uploading Data to OBS
The OBS Console places many restrictions on uploads, so this section uses the OBS client to upload data. For details about how to create a bucket and upload files, see Creating a Bucket and Uploading an Object.
Perform the following operations to import data to the dataset for model training and building.
- Log in to OBS Console and create a bucket in the same region as ModelArts. If an available bucket exists, ensure that the OBS bucket and ModelArts are in the same region.
- Upload the local data to the OBS bucket. If you have a large amount of data, you are advised to use OBS Browser+ to upload data or folders. The uploaded data must meet the dataset requirements of the ExeML project.
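Before a large upload, it can help to list which local files will end up under which object keys. The following is a minimal sketch of that planning step only. The function name plan_object_keys and the default prefix "data" are hypothetical choices made for illustration, and the upload itself is still left to OBS Browser+ or another OBS client.

```python
from pathlib import Path


def plan_object_keys(local_dir, prefix="data"):
    """Map every file under local_dir to an object key below prefix.

    Returns a list of (local_path, object_key) tuples. Nothing is uploaded;
    the result is only a plan to review before using an OBS client.
    """
    local_dir = Path(local_dir)
    plan = []
    for path in sorted(local_dir.rglob("*")):
        if path.is_file():
            # Object keys use forward slashes regardless of the local OS.
            relative = path.relative_to(local_dir).as_posix()
            plan.append((path, f"{prefix.strip('/')}/{relative}"))
    return plan
```

For example, a local file sub/b.csv with the prefix "data" is planned as the object key data/sub/b.csv.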
Requirements on Datasets
- Files must be in TXT or CSV format, and cannot exceed 8 MB.
- Use line feed characters to separate rows in files, and each row of data represents a labeled object.
- Currently, text classification supports only Chinese.
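The format and size requirements above can be checked locally before anything is uploaded. The sketch below is one possible pre-upload check, not part of ModelArts. The name check_dataset_file is hypothetical, "8 MB" is assumed to mean 8 x 1024 x 1024 bytes, and file extensions are assumed to be case-insensitive.

```python
from pathlib import Path

# Assumption: the documented "8 MB" limit is read as 8 * 1024 * 1024 bytes.
MAX_BYTES = 8 * 1024 * 1024
ALLOWED_SUFFIXES = {".txt", ".csv"}


def check_dataset_file(path):
    """Return a list of problems found in one dataset file (empty if none)."""
    path = Path(path)
    problems = []
    # Assumption: extensions are compared case-insensitively.
    if path.suffix.lower() not in ALLOWED_SUFFIXES:
        problems.append(f"{path.name}: not a TXT or CSV file")
    size = path.stat().st_size
    if size > MAX_BYTES:
        problems.append(f"{path.name}: {size} bytes exceeds the 8 MB limit")
    return problems
```

An empty list means the file passed both checks. The language requirement is not checked here.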
Requirements for Files Uploaded to OBS
- If you do not need to upload training data in advance, create an empty folder to store files generated in the future.
- If you need to upload files to be labeled in advance, create an empty folder and save the files in the folder. An example of the file directory structure is /bucketName/data/text.csv.
- A label name can contain a maximum of 32 characters, including Chinese characters, uppercase and lowercase letters, digits, hyphens (-), and underscores (_).
- If you want to upload labeled text files to the OBS bucket, upload them according to the following specifications:
- The dataset requires storing labeled objects and their label files (in one-to-one relationship with the labeled objects) in the same directory. For example, if the name of the labeled object file is COMMENTS_20180919_114745.txt, the name of the label file must be COMMENTS_20180919_114745_result.txt.
Example of data files:
├─<dataset-import-path>
│      COMMENTS_20180919_114732.txt
│      COMMENTS_20180919_114732_result.txt
│      COMMENTS_20180919_114745.txt
│      COMMENTS_20180919_114745_result.txt
│      COMMENTS_20180919_114945.txt
│      COMMENTS_20180919_114945_result.txt
- The labeled objects and label files for text classification are text files, and correspond to each other based on rows. For example, the first row in a label file indicates the label of the first row in the labeled object.
For example, the content of labeled object COMMENTS_20180919_114745.txt is as follows:
It touches good and responds quickly. I don't know how it performs in the future.
Three months ago, I bought a very good phone and replaced my old one with it. It can operate longer between charges.
Why does my phone heat up if I charge it for a while? The volume button stuck after being pressed down.
It's a gift for Father's Day. The logistics is fast and I received it in 24 hours. I like the earphones because the bass sounds feel good and they would not fall off.
The content of label file COMMENTS_20180919_114745_result.txt is as follows:
positive
negative
negative
positive
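The naming, row-correspondence, and label-name rules above can be verified locally before the labeled files are uploaded. The following is a minimal sketch under stated assumptions: the name check_labeled_pair is hypothetical, files are assumed to be UTF-8 encoded, each label row is assumed to hold exactly one label, and "Chinese characters" is approximated by the Unicode range U+4E00 to U+9FFF.

```python
import re
from pathlib import Path

# Assumption: Chinese characters are approximated by U+4E00..U+9FFF.
# A label is 1 to 32 characters: Chinese characters, letters, digits, - or _.
LABEL_RE = re.compile(r"[\u4e00-\u9fffA-Za-z0-9_-]{1,32}")


def read_rows(path):
    """Read a text file (assumed UTF-8) and return its rows without line endings."""
    with open(path, encoding="utf-8") as handle:
        return [line.rstrip("\r\n") for line in handle]


def check_labeled_pair(object_file):
    """Check one labeled object file against its <name>_result.txt label file.

    Returns a list of problems (empty if none were found).
    """
    object_file = Path(object_file)
    # The label file sits in the same directory and adds the _result suffix.
    label_file = object_file.with_name(object_file.stem + "_result.txt")
    if not label_file.exists():
        return [f"missing label file {label_file.name}"]
    texts = read_rows(object_file)
    labels = read_rows(label_file)
    problems = []
    # Row N of the label file labels row N of the labeled object.
    if len(texts) != len(labels):
        problems.append(f"{len(texts)} text rows but {len(labels)} label rows")
    for number, label in enumerate(labels, start=1):
        if not LABEL_RE.fullmatch(label):
            problems.append(f"row {number}: invalid label name {label!r}")
    return problems
```

Calling check_labeled_pair on COMMENTS_20180919_114745.txt looks for COMMENTS_20180919_114745_result.txt in the same directory and reports a missing file, a row-count mismatch, or any label name that breaks the naming rule.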