Preparing Data

Before using ModelArts ExeML to build a model, upload data to an OBS bucket. The OBS bucket and ModelArts must be in the same region.

Uploading Data to OBS

This procedure uses OBS Console to upload data. For details about how to create a bucket and upload files, see Creating a Bucket and Uploading an Object.

Perform the following steps to upload the data that will be used for model training and building.

  1. Log in to OBS Console and create a bucket in the same region as ModelArts. If you use an existing bucket, ensure that it is in the same region as ModelArts.
  2. Upload the local data to the OBS bucket. For large amounts of data, use OBS Browser+ to upload files or folders. The uploaded data must meet the dataset requirements of the ExeML project.

Requirements for Sound Classification Data

  • Only WAV files with a 16-bit sample depth are supported. All WAV subformats are supported.
  • Each sound file must be longer than 1 second and no larger than 4 MB.
  • Adding more sound files to the training set improves model precision. Prepare at least 50 sound files for each class, with a total duration of at least 5 minutes per class.
  • Ensure that the sound files are authentic, and that each class of sound files covers all application scenarios in the real world.
  • The quality of the training set has a great impact on the precision of the model. It is recommended that all sound files in the training set use the same sampling rate and sample precision.
  • The labeling quality has a great impact on the model precision. Do not mislabel sound files.
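
The numeric requirements above can be checked locally before uploading. The following is a minimal sketch, not part of ModelArts, that uses the Python standard library wave module. The function names check_wav and check_class are hypothetical. Note that the wave module reads only uncompressed PCM data, so this sketch reports other WAV subformats as unreadable even though the service may accept them.

```python
import os
import wave

MAX_BYTES = 4 * 1024 * 1024   # maximum file size: 4 MB
MIN_SECONDS = 1.0             # duration must be longer than 1 second
MIN_FILES_PER_CLASS = 50      # at least 50 sound files per class
MIN_CLASS_SECONDS = 5 * 60    # at least 5 minutes of audio per class


def check_wav(path):
    """Return (duration_seconds, problems) for one sound file."""
    problems = []
    if os.path.getsize(path) > MAX_BYTES:
        problems.append("larger than 4 MB")
    try:
        with wave.open(path, "rb") as wav:
            # A sample width of 2 bytes corresponds to 16-bit audio.
            if wav.getsampwidth() != 2:
                problems.append("not 16-bit")
            rate = wav.getframerate()
            duration = wav.getnframes() / rate if rate else 0.0
    except (wave.Error, EOFError) as exc:
        # Assumption: anything the wave module cannot parse is flagged here.
        return 0.0, problems + [f"not a readable PCM WAV file: {exc}"]
    if duration <= MIN_SECONDS:
        problems.append("not longer than 1 second")
    return duration, problems


def check_class(paths):
    """Check the per-class minimums, counting only files that pass check_wav."""
    durations = []
    for path in paths:
        duration, problems = check_wav(path)
        if not problems:
            durations.append(duration)
    issues = []
    if len(durations) < MIN_FILES_PER_CLASS:
        issues.append(f"only {len(durations)} valid files, need at least 50")
    if sum(durations) < MIN_CLASS_SECONDS:
        issues.append("total duration is under 5 minutes")
    return issues
```

Calling check_class with the list of file paths for one class returns an empty list when both per-class minimums are met. The sketch does not assess authenticity, scenario coverage, or labeling quality, which still need manual review.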

Requirements for Files Uploaded to OBS

  • If you do not need to upload training data in advance, create an empty folder to store files generated in the future, for example, /bucketName/data-cat.
  • If you need to upload sound files to be labeled in advance, create an empty folder and save the sound files in the folder. An example of the file directory structure is /bucketName/data-cat/cat.wav.
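
As an illustration of the directory structure above, the following minimal sketch maps local sound files to object names under a single OBS folder. It only plans the names and does not upload anything; the upload itself is done through OBS Console, OBS Browser+, or another OBS tool. The function name plan_object_keys is hypothetical, and the sketch assumes that an object name is the path inside the bucket without the bucket name or a leading slash.

```python
from pathlib import Path, PurePosixPath


def plan_object_keys(local_dir, folder):
    """Map each local .wav file to an object name inside the given OBS folder."""
    plan = {}
    for path in sorted(Path(local_dir).iterdir()):
        # Only sound files directly in local_dir are planned; other files are skipped.
        if path.is_file() and path.suffix.lower() == ".wav":
            plan[str(path)] = str(PurePosixPath(folder, path.name))
    return plan
```

For a local directory that contains cat.wav, plan_object_keys(local_dir, "data-cat") maps that file to data-cat/cat.wav, which corresponds to /bucketName/data-cat/cat.wav in the example above.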