Format Requirements for Video Datasets
ModelArts Studio supports the creation of video datasets. When creating a video dataset, you can import data in several formats. Table 1 lists the format requirements.
Table 1 Format requirements for video datasets

File Content | File Format | Requirement
---|---|---
Video | MP4 or AVI |
Video + Annotation | Video + JSON | Annotation files are in JSONL format. See the example following this table.
Event detection | Video + JSON | Data source samples must be in AVI or MP4 format, and annotation files must be in JSON format. The data source must contain two or more AVI or MP4 files, and one video file can correspond to one or more annotation files (Video + JSON, many-to-one or one-to-one). The duration of each video must be greater than 128s, the FPS must be greater than or equal to 10, and both the training set and the test set must contain videos. The annotation.json file is the annotation file. When importing from OBS, a single file cannot exceed 50 GB, and the number of files is not limited. See the example following this table.
Video classification | Video + TXT |

The following is an example of an annotation file in JSONL format for video + annotation data:

{
    "video_fn": "13/ad098173-af09-48fe-95c3-e72fd629688e.mp4",  // Relative path of the video.
    "prompt": "A person pours a clear liquid from a bottle into a shot glass, then lifts the glass to their mouth and drinks the shot. The background includes a red coat and other indistinct background elements.",  // Video synopsis (simplified).
    "long_prompt": "A person is seen pouring a clear liquid from a green glass bottle into a small glass. The individual is wearing a white shirt with a lace collar and a beige cardigan. The background appears to be a cozy indoor setting, possibly a cafe or a restaurant, with red and white elements visible, such as a red coat hanging on the wall and a white table. The person carefully pours the liquid, ensuring it is filled to the brim of the glass. The liquid is clear and has some green leaves floating in it. The person then holds the glass up, possibly to show the contents or to prepare for a drink."  // Video synopsis (detailed).
}
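If you want to sanity-check a JSONL annotation file before import, a script like the following can help. This is a minimal sketch, not part of ModelArts Studio; the file name annotations.jsonl, the data_root argument, and the check_jsonl_annotations helper are illustrative assumptions.

```python
# Minimal sketch (not a ModelArts Studio tool) for checking a JSONL
# annotation file before import. Paths and names are examples only.
import json
import os

REQUIRED_KEYS = {"video_fn", "prompt", "long_prompt"}
ALLOWED_EXTENSIONS = {".mp4", ".avi"}

def check_jsonl_annotations(jsonl_path: str, data_root: str) -> None:
    """Parse each JSONL record and verify its keys and the referenced video file."""
    with open(jsonl_path, encoding="utf-8") as f:
        for line_no, line in enumerate(f, start=1):
            line = line.strip()
            if not line:
                continue
            record = json.loads(line)  # each line is one JSON object
            missing = REQUIRED_KEYS - record.keys()
            if missing:
                print(f"Line {line_no}: missing keys {sorted(missing)}")
                continue
            video_path = os.path.join(data_root, record["video_fn"])
            ext = os.path.splitext(video_path)[1].lower()
            if ext not in ALLOWED_EXTENSIONS:
                print(f"Line {line_no}: unsupported video format {ext!r}")
            if not os.path.isfile(video_path):
                print(f"Line {line_no}: video not found: {video_path}")

if __name__ == "__main__":
    # Example invocation; replace with your own dataset location.
    check_jsonl_annotations("annotations.jsonl", data_root=".")
```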
The following is an example of the annotation file in JSON format for event detection:

{
    "version": "dataset_name_v.x.x",              // Dataset version information.
    "classes": ["category1", "category2", ...],   // List of all category names. Each category corresponds to a label, which is used to mark events or actions in the video.
    "database": {
        "video_name": {                           // Video file name, excluding the file name extension.
            "subset": "train",                    // Training set: train; test set: test.
            "duration": 1660.3,                   // Total video duration, in seconds.
            "fps": 30.0,                          // Video frame rate.
            "width": 720,                         // Video width, in pixels.
            "height": 1280,                       // Video height, in pixels.
            "ext": "mp4",                         // Video file name extension.
            // In each segment, the two values (for example, 34.5 and 42.4) indicate the start time and end time, in seconds.
            // label indicates the category. It must be an element in the classes list, indicating the event or action type of the video clip.
            "annotations": [
                {"label": "category1", "segment": [34.5, 42.4]},
                {"label": "category1", "segment": [124.4, 142.9]},
                ...
            ]
        },
        "video_name": {
            "subset": xxx,
            "duration": xxx,
            "fps": xxx,
            "width": xxx,
            "height": xxx,
            "ext": xxx,
            "annotations": [
                {"label": xxx, "segment": xxx},
                {"label": xxx, "segment": xxx},
                ...
            ]
        },
        ...
    }
}
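Before import, an annotation.json file can be checked against the constraints above (labels drawn from classes, duration greater than 128s, FPS of at least 10, and videos in both the training set and the test set). The sketch below is illustrative only and is not a ModelArts Studio tool; the function name check_event_annotations and the local file path are assumptions.

```python
# Minimal sketch (not a ModelArts Studio tool) for checking an
# event-detection annotation.json against the stated constraints.
import json

def check_event_annotations(json_path: str) -> None:
    """Verify classes, subsets, duration, FPS, extensions, and segment ranges."""
    with open(json_path, encoding="utf-8") as f:
        data = json.load(f)

    classes = set(data["classes"])
    subsets_seen = set()

    for name, video in data["database"].items():
        subsets_seen.add(video["subset"])
        if video["subset"] not in {"train", "test"}:
            print(f"{name}: unexpected subset {video['subset']!r}")
        if video["duration"] <= 128:
            print(f"{name}: duration must be greater than 128s")
        if video["fps"] < 10:
            print(f"{name}: FPS must be greater than or equal to 10")
        if video["ext"].lower() not in {"mp4", "avi"}:
            print(f"{name}: unsupported extension {video['ext']!r}")
        for ann in video["annotations"]:
            if ann["label"] not in classes:
                print(f"{name}: label {ann['label']!r} is not in classes")
            start, end = ann["segment"]
            if not (0 <= start < end <= video["duration"]):
                print(f"{name}: segment {ann['segment']} is out of range")

    if not {"train", "test"} <= subsets_seen:
        print("Both the training set and the test set must contain videos.")

if __name__ == "__main__":
    check_event_annotations("annotation.json")  # example path
```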