Updated on 2025-07-02 GMT+08:00

Format Requirements for Audio Datasets

ModelArts Studio supports the creation of audio datasets. When you create a dataset, you can import data in several formats. Table 1 lists the format requirements.

Table 1 Format requirements for audio datasets

File Content: Audio Only
File Format: Audio
Requirement:
  • Supported formats: mp3, flac, wav, opus, aac, and m4a. Audio files can be spread across multiple folders, and each folder can contain audio files in different formats.
  • Import from OBS: A single file cannot exceed 50 GB. The number of files is not limited.

File Content: Audio + Annotation
File Format: Audio + JSONL
Requirement:
  • Supported audio formats: mp3, flac, wav, opus, aac, and m4a. Annotation file format: JSONL.
  • Import from OBS: A single file cannot exceed 50 GB. The number of files is not limited.
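
The limits in Table 1 can be checked locally before data is uploaded to OBS. The following is a minimal sketch, not part of ModelArts Studio: the function name check_audio_files is hypothetical, 50 GB is assumed to mean 50 × 1024³ bytes, file extensions are assumed to be matched case-insensitively, and .jsonl annotation files are skipped.

```python
from pathlib import Path

# Audio formats listed in Table 1.
SUPPORTED_AUDIO_EXTENSIONS = {".mp3", ".flac", ".wav", ".opus", ".aac", ".m4a"}
# Assumption: the 50 GB limit is interpreted as 50 * 1024**3 bytes.
MAX_FILE_SIZE_BYTES = 50 * 1024**3
# Annotation files are allowed alongside audio files, so they are skipped.
ANNOTATION_EXTENSION = ".jsonl"


def check_audio_files(root):
    """Return (path, problem) pairs for files under root that break the Table 1 limits."""
    problems = []
    # rglob("*") walks every subfolder, since audio may be spread across folders.
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file():
            continue
        suffix = path.suffix.lower()  # assumption: case-insensitive matching
        if suffix == ANNOTATION_EXTENSION:
            continue
        if suffix not in SUPPORTED_AUDIO_EXTENSIONS:
            problems.append((str(path), "unsupported format"))
        elif path.stat().st_size > MAX_FILE_SIZE_BYTES:
            problems.append((str(path), "larger than 50 GB"))
    return problems
```

Calling check_audio_files on a local dataset folder returns an empty list when every file has a supported extension and is within the size limit.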

The following is an example of an annotation file in JSONL format. Each line is one JSON object in which audio_name gives the path of an audio file and caption gives its annotation:

{"audio_name":"dir/16k_16bit_1channel_2s.flac","caption":"1"}
{"audio_name":"dir/16k_16bit_1channel_2s.mp3","caption":"2"}
{"audio_name":"dir/16k_16bit_1channel_2s.opus","caption":"3"}
{"audio_name":"dir/16k_16bit_1channel_2s.wav","caption":"4"}
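
An annotation file in this format can be checked line by line before import. The following is a minimal sketch, not an official validator: the function name validate_annotation_lines is hypothetical, and it assumes that both audio_name and caption are required in every record, that blank lines can be ignored, and that the extension of audio_name must be one of the supported audio formats.

```python
import json
from pathlib import PurePosixPath

# Audio formats listed in Table 1.
SUPPORTED_AUDIO_EXTENSIONS = {".mp3", ".flac", ".wav", ".opus", ".aac", ".m4a"}
# Assumption: both fields shown in the example are required.
REQUIRED_KEYS = ("audio_name", "caption")


def validate_annotation_lines(lines):
    """Return (line_number, message) pairs for records that look malformed."""
    errors = []
    for number, raw in enumerate(lines, start=1):
        text = raw.strip()
        if not text:
            continue  # assumption: blank lines are ignored
        try:
            record = json.loads(text)
        except json.JSONDecodeError as exc:
            errors.append((number, f"invalid JSON: {exc.msg}"))
            continue
        if not isinstance(record, dict):
            errors.append((number, "record is not a JSON object"))
            continue
        missing = [key for key in REQUIRED_KEYS if key not in record]
        if missing:
            errors.append((number, "missing field(s): " + ", ".join(missing)))
            continue
        # Check the file extension of the referenced audio file.
        suffix = PurePosixPath(str(record["audio_name"])).suffix.lower()
        if suffix not in SUPPORTED_AUDIO_EXTENSIONS:
            errors.append((number, f"unsupported audio format: {suffix or '(none)'}"))
    return errors


sample = [
    '{"audio_name":"dir/16k_16bit_1channel_2s.flac","caption":"1"}',
    '{"audio_name":"dir/16k_16bit_1channel_2s.mp3","caption":"2"}',
]
print(validate_annotation_lines(sample))  # prints []
```

To check a file on disk, pass an open file handle instead of a list, for example validate_annotation_lines(open("annotations.jsonl", encoding="utf-8")), where annotations.jsonl is a placeholder file name.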