Building Datasets for Video Generation Models with Data Engineering
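The Pangu video generation model accepts video datasets. Table 1 (Dataset type requirements for training video generation models) lists the data required for each training scenario. For dataset format requirements, see Video Dataset Format Requirements.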
Base model | Training scenario | File type | File content | File format | File example |
---|---|---|---|---|---|
Multimodal video generation model | Pre-training | Video | Video + prompt | Video + jsonl file | Videos are stored in a folder at the same level as the jsonl file, and the video description text is in jsonl format. Example: `{"video_fn": "dir/001.mp4", "prompt": "A person pours a clear liquid from a bottle into a shot glass, then lifts the glass to their mouth and drinks the shot. The background includes a red coat and other indistinct background elements.", "long_prompt": "A person is seen pouring a clear liquid from a green glass bottle into a small glass. The individual is wearing a white shirt with a lace collar and a beige cardigan. The background appears to be a cozy indoor setting, possibly a cafe or a restaurant, with red and white elements visible, such as a red coat hanging on the wall and a white table. The person carefully pours the liquid, ensuring it is filled to the brim of the glass. The liquid is clear and has some green leaves floating in it. The person then holds the glass up, possibly to show the contents or to prepare for a drink.", "4_vae_feature_shape": [16, 32, 90, 160], "4_vae_feature_length": 16}` |
Multimodal video generation model | Fine-tuning | Video | Video + prompt | Video + jsonl file | Videos are stored as tar packages, and the video description text is in jsonl format. Example: `{"video_fn": "dir/001.mp4", "prompt": "A person pours a clear liquid from a bottle into a shot glass, then lifts the glass to their mouth and drinks the shot. The background includes a red coat and other indistinct background elements.", "long_prompt": "A person is seen pouring a clear liquid from a green glass bottle into a small glass. The individual is wearing a white shirt with a lace collar and a beige cardigan. The background appears to be a cozy indoor setting, possibly a cafe or a restaurant, with red and white elements visible, such as a red coat hanging on the wall and a white table. The person carefully pours the liquid, ensuring it is filled to the brim of the glass. The liquid is clear and has some green leaves floating in it. The person then holds the glass up, possibly to show the contents or to prepare for a drink.", "4_vae_feature_shape": [16, 32, 90, 160], "4_vae_feature_length": 16}` |
Multimodal video generation model | Distillation | Binary | Binary + jsonl | Binary + jsonl file | Binary files are stored in bin format, and the description text is in jsonl format. Example: `{"vae_fn": "vae_feat/video_latent_720p_5mhengping/13/ad098173-af09-48fe-95c3-e72fd629688e.bin", "t5_fn": "t5_feat/13/ad098173-af09-48fe-95c3-e72fd629688e.bin", "4_vae_feature_shape": [16, 32, 90, 160], "4_vae_feature_length": 16, "image_fn": "vae_feat/first_frame_latent_720p_5mhengping/13/ad098173-af09-48fe-95c3-e72fd629688e.bin", "image2_fn": "vae_feat/last_frame_latent_720p_5mhengping/13/ad098173-af09-48fe-95c3-e72fd629688e.bin", "subvideo_fn": "vae_feat/continue_latent_720p_5mhengping/13/ad098173-af09-48fe-95c3-e72fd629688e.bin"}` Fields: `vae_fn`: path to the video feature file; `t5_fn`: path to the text feature file; `image_fn`: path to the first-frame feature file; `image2_fn`: path to the last-frame feature file; `subvideo_fn`: path to the video continuation feature file. |
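
Before uploading a dataset, it can help to check each jsonl line locally. The following is a minimal sketch, not the platform's own validation. The function name `check_record` and the sets of required keys are assumptions inferred from the sample records above, since the source does not state which fields are mandatory.

```python
import json

# Assumed required keys, inferred from the sample records in the table.
# The platform's actual validation rules are not stated in the source.
VIDEO_KEYS = {"video_fn", "prompt"}    # pre-training and fine-tuning
FEATURE_KEYS = {"vae_fn", "t5_fn"}     # distillation


def check_record(line: str, scenario: str) -> list[str]:
    """Return a list of problems found in one jsonl line (empty if none)."""
    try:
        record = json.loads(line)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc.msg}"]
    if not isinstance(record, dict):
        return ["record is not a JSON object"]

    required = FEATURE_KEYS if scenario == "distillation" else VIDEO_KEYS
    missing = sorted(required - set(record))
    problems = [f"missing key: {key}" for key in missing]

    # In the samples, 4_vae_feature_shape is a list of four integers.
    shape = record.get("4_vae_feature_shape")
    if shape is not None and not (
        isinstance(shape, list)
        and len(shape) == 4
        and all(isinstance(n, int) for n in shape)
    ):
        problems.append("4_vae_feature_shape should be a list of four integers")
    return problems
```

Calling `check_record(line, "distillation")` on every line of a jsonl file and collecting the non-empty results gives a quick report of malformed records. The check does not verify that the referenced video or bin files exist.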