Updated on 2024-04-01 GMT+08:00

Preparing Data

Before using ModelArts ExeML to build a model, upload data to an OBS bucket. The OBS bucket and ModelArts must be in the same region.

Uploading Data to OBS

This operation uses the OBS console to upload data. For more information about how to create a bucket and upload files, see Creating a Bucket and Uploading an Object.

Perform the following operations to import data to the dataset for model training and building.

  1. Log in to OBS Console and create a bucket in the same region as ModelArts. To use an existing bucket, confirm that it is in the same region as ModelArts.
  2. Upload the files to the OBS bucket. For a large amount of data, use OBS Browser+ to upload files or folders. The uploaded data must meet the dataset requirements of the ExeML project.

    Store the data in an unencrypted bucket. If the bucket is encrypted, training fails because the data cannot be decrypted.

Requirements for Sound Classification Data

  • Only 16-bit WAV files are supported. All sub-formats of WAV are supported.
  • The duration of a sound file must be longer than 1 second, and the maximum size of a sound file is 4 MB.
  • Adding more sound files to the training set improves model precision. Prepare at least 50 sound files for each class, and make sure the total length of the sound files in each class is at least 5 minutes.
  • Ensure that the sound files are authentic, and that each class of sound files covers all application scenarios in the real world.
  • The quality of the training set has a great impact on the precision of the model. It is recommended that all sound files in the training set use the same sampling rate and sampling precision.
  • The labeling quality has a great impact on the model precision. Do not mislabel sound files.
  • Only Chinese and English are supported for audio labeling.
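
The numeric requirements above can be checked locally before uploading. The following is a minimal sketch, not part of ModelArts: the function names check_wav and check_class are hypothetical, the 4 MB limit is assumed to mean 4 × 1024 × 1024 bytes, and each class is assumed to sit in its own local folder. It uses Python's standard wave module, which reads only uncompressed PCM WAV data, so a file it rejects as unreadable is not necessarily rejected by ExeML.

```python
import wave
from pathlib import Path

# Thresholds taken from the requirements list; the byte value of "4 MB" is an assumption.
MAX_BYTES = 4 * 1024 * 1024
MIN_SECONDS = 1.0
MIN_FILES_PER_CLASS = 50
MIN_CLASS_SECONDS = 5 * 60


def check_wav(path):
    """Return (duration_in_seconds, list_of_problems) for one WAV file."""
    path = Path(path)
    problems = []
    if path.stat().st_size > MAX_BYTES:
        problems.append("larger than 4 MB")
    try:
        with wave.open(str(path), "rb") as wav:
            sample_width = wav.getsampwidth()  # bytes per sample
            rate = wav.getframerate()
            duration = wav.getnframes() / rate if rate else 0.0
    except (wave.Error, EOFError) as exc:
        # The standard library could not parse the file (for example, non-PCM data).
        return 0.0, problems + [f"not readable as PCM WAV: {exc}"]
    if sample_width != 2:
        problems.append(f"{sample_width * 8}-bit audio, expected 16-bit")
    if duration <= MIN_SECONDS:
        problems.append(f"duration {duration:.2f} s is not longer than 1 s")
    return duration, problems


def check_class(folder):
    """Check all .wav files of one class; return (per_file_problems, class_problems)."""
    files = sorted(Path(folder).glob("*.wav"))
    per_file = {}
    total_seconds = 0.0
    for wav_path in files:
        duration, problems = check_wav(wav_path)
        total_seconds += duration
        if problems:
            per_file[wav_path.name] = problems
    class_problems = []
    if len(files) < MIN_FILES_PER_CLASS:
        class_problems.append(f"only {len(files)} files, need at least {MIN_FILES_PER_CLASS}")
    if total_seconds < MIN_CLASS_SECONDS:
        class_problems.append(f"only {total_seconds:.0f} s of audio, need at least {MIN_CLASS_SECONDS} s")
    return per_file, class_problems
```

For example, calling check_class on a local folder that holds the sound files of one class returns the files that break a per-file rule and any per-class minimum that is not yet met. Scene coverage and labeling quality cannot be checked this way and still need manual review.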

Requirements for Files Uploaded to OBS

  • If you do not need to upload training data in advance, create an empty folder to store files generated in the future, for example, /bucketName/data-cat.
  • If you need to upload sound files to be labeled in advance, create an empty folder and save the sound files in the folder. An example of the file directory structure is /bucketName/data-cat/cat.wav.
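
The directory structure above can be planned before any upload. The sketch below is illustrative only: plan_object_keys and display_path are hypothetical helper names, and it assumes all sound files sit directly in one local folder. It maps each local .wav file to the object key it would have inside the OBS folder. In OBS the object key does not include the bucket name, so the /bucketName/data-cat/cat.wav notation corresponds to bucket bucketName and key data-cat/cat.wav.

```python
from pathlib import Path, PurePosixPath


def plan_object_keys(local_dir, obs_folder):
    """Map each local .wav file to the object key it would get under obs_folder."""
    # Object keys use forward slashes and carry no leading slash.
    prefix = PurePosixPath(obs_folder.strip("/"))
    return {
        str(path): str(prefix / path.name)
        for path in sorted(Path(local_dir).glob("*.wav"))
    }


def display_path(bucket, key):
    """Render bucket and key in the /bucketName/folder/file notation used above."""
    return f"/{bucket}/{key}"
```

The resulting mapping can then be passed to whichever upload tool is in use, such as OBS Console or OBS Browser+, or used to verify that the uploaded objects ended up in the expected folder.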