Updated on 2025-07-02 GMT+08:00

Video Dataset Processing Operators

The data processing operators provide multiple data operation capabilities, including data extraction, filtering, conversion, and labeling. These operators help you extract useful information from massive data and perform deep processing to generate high-quality training data.

The platform supports processing of video datasets, including data extraction, data conversion, data filtering, and data labeling. Table 1 lists the capabilities of video dataset processing operators.

Table 1 Video dataset processing operator capabilities

| Category | Operator Name | Operator Description |
|---|---|---|
| Data extraction | Scene Splitting | Splits a long video into short clips at scene changes. If a clip is longer than the specified time threshold, the clip is further split by duration. |
| Data conversion | Adding Watermarks | Adds a full-screen text watermark to a video. |
| Data conversion | Video Cropping | Crops unnecessary elements out of a video, such as subtitles, logos, watermarks, borders, and dense text, and filters out video files whose area ratio after cropping exceeds the preset threshold. Before using this operator, you need to run the subtitle, logo, watermark, border, and dense text identification operators. |
| Data filtering | Video Metadata Filtering | Filters videos based on video metadata (frame rate, resolution, and duration) and retains only the videos that meet the specified conditions. Note: The standard frame rate of a movie is 24 FPS or 30 FPS. |
| Data filtering | Video Aspect Ratio Filtering | Filters videos based on the aspect ratio, which is the ratio of the width to the height of a video image. |
| Data labeling | Pornographic Video Detection | Labels pornographic content. |
| Data labeling | Violent and Terrorism Video Detection | Labels violent and terrorism content. |
| Data labeling | Political Video Content Detection and Scoring | Labels political content. |
| Data labeling | Motion Range Scoring | Calculates and scores the motion range of each pixel in each frame, and identifies videos whose motion is too fast (for example, > 100 optical flows) or too slow (for example, ≤ 2 optical flows). A larger value indicates faster motion. |
| Data labeling | Basic Quality Scoring | Scores the basic video quality (definition, brightness, blur, image shaking and ghosting, overexposure in low light, and glitch). The value range is (0, 1). A higher value indicates better quality. A video whose score is greater than 0.05 is considered to have high basic quality. |
| Data labeling | Aesthetics Scoring | Scores the aesthetics of a video from the following dimensions: content (attractive and clear), composition (good object position), color (vital and pleasant), light (obvious contrast), and track (continuous and stable). The value range is (0, 1). A higher value indicates better aesthetics. A video whose score is greater than 0.95 is considered to have high aesthetics. |
| Data labeling | Watermark Identification | Identifies whether a video contains watermarks. |
| Data labeling | Subtitle Identification | Identifies whether a video contains subtitles. |
| Data labeling | Logo Identification | Identifies whether a video contains a logo. |
| Data labeling | Black Bar Identification | Identifies whether a video contains black bars. |
| Data labeling | Dense Text Identification | Identifies whether a video contains dense text. A video in which the proportion of dense text area exceeds the specified proportion is a video with dense text. By default, a video with a cropping area proportion greater than or equal to 7% is a video with dense text. |
| Data labeling | Video Classification (InterVideo2) | Returns category labels for a video. There are seven categories at level L1 and 700 categories at level L4. |
| Data labeling | Video Synopsis Generation (Simplified) | Extracts frames from a video and generates a simplified video synopsis through model inference. |
| Data labeling | Video Synopsis Generation (Detailed) | Extracts frames from a video and generates a detailed video synopsis through model inference. |

Scene Splitting

  • Applicable file format: video > mp4/avi.
  • Parameter description:

    Video to be split: Videos that meet the resolution, duration, and frame rate criteria are split.

    Specifications after video splitting: Maximum duration of a single video clip, which you can customize. If a clip produced by scene-based splitting is longer than this value, it is split further by duration, so that every final clip is no longer than the specified threshold.
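
The duration-based second pass described above can be sketched as follows. This is only an illustration under stated assumptions, not the platform's implementation: the function name split_by_duration, the representation of clips as (start, end) pairs in seconds, and the choice to cut an over-long clip into equal parts are all hypothetical.

```python
import math

def split_by_duration(clips, max_seconds):
    """Split every clip longer than max_seconds into equal parts that fit the limit.

    clips: list of (start, end) times in seconds, for example from scene detection.
    """
    if max_seconds <= 0:
        raise ValueError("max_seconds must be positive")
    result = []
    for start, end in clips:
        length = end - start
        # Smallest number of parts such that each part is <= max_seconds.
        parts = max(1, math.ceil(length / max_seconds))
        step = length / parts
        for i in range(parts):
            # Use the exact clip end for the last part to avoid rounding drift.
            part_end = end if i == parts - 1 else start + (i + 1) * step
            result.append((start + i * step, part_end))
    return result

# Two scene-based clips: 8 s (kept as is) and 25 s (split into three parts).
scene_clips = [(0.0, 8.0), (8.0, 33.0)]
final_clips = split_by_duration(scene_clips, max_seconds=10.0)
for clip in final_clips:
    print(clip)
```

With a 10-second limit, the 8-second clip is unchanged and the 25-second clip becomes three clips of roughly 8.3 seconds each.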

Adding Watermarks

  • Applicable file format: video > mp4/avi.
  • Parameter description:

    Watermark text: Text string to be added as a full-screen watermark.

Video Cropping

  • Applicable file format: video > mp4/avi.
  • Parameter description:

    Items to be cropped: Unnecessary elements to remove from videos, such as subtitles, logos, watermarks, borders, and dense text.

    Cropping area ratio: Ratio of the cropped video area to the original video area. Videos whose cropping area ratio exceeds the preset threshold are filtered out. The default threshold is 30%.
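
A rough sketch of the threshold check is shown below. It assumes that "cropped video area" means the area removed by cropping, so that a video is filtered out when more than 30% of the frame is cut away. The source does not confirm this reading, and the function names are hypothetical.

```python
def cropping_area_ratio(orig_w, orig_h, kept_w, kept_h):
    """Fraction of the original frame area removed by cropping (assumed interpretation)."""
    original = orig_w * orig_h
    kept = kept_w * kept_h
    return (original - kept) / original

def keep_after_cropping(orig_size, kept_size, threshold=0.30):
    """Return True if the video stays in the dataset, False if it is filtered out."""
    ratio = cropping_area_ratio(*orig_size, *kept_size)
    # A video is filtered out only when the ratio exceeds the threshold.
    return ratio <= threshold

# Removing 120 rows of black bars from a 1920x1080 frame cuts about 11% of the area.
print(keep_after_cropping((1920, 1080), (1920, 960)))
# Cropping down to 1280x720 cuts about 56% of the area.
print(keep_after_cropping((1920, 1080), (1280, 720)))
```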

Video Metadata Filtering

  • Applicable file format: video > mp4/avi.
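
Table 1 describes this operator as retaining only the videos whose frame rate, resolution, and duration meet the specified conditions. The sketch below illustrates that idea. The VideoMeta type, the meets_conditions function, and every bound used here are hypothetical, since this section does not list the operator's actual parameters.

```python
from dataclasses import dataclass

@dataclass
class VideoMeta:
    fps: float
    width: int
    height: int
    duration_s: float

def meets_conditions(meta, fps_range=(24.0, 30.0), min_size=(1280, 720),
                     duration_range=(2.0, 60.0)):
    """Check frame rate, resolution, and duration against illustrative bounds."""
    fps_ok = fps_range[0] <= meta.fps <= fps_range[1]
    size_ok = meta.width >= min_size[0] and meta.height >= min_size[1]
    duration_ok = duration_range[0] <= meta.duration_s <= duration_range[1]
    return fps_ok and size_ok and duration_ok

# Made-up metadata for four videos.
videos = {
    "a.mp4": VideoMeta(fps=24.0, width=1920, height=1080, duration_s=12.0),
    "b.mp4": VideoMeta(fps=15.0, width=1920, height=1080, duration_s=12.0),
    "c.mp4": VideoMeta(fps=30.0, width=640, height=360, duration_s=12.0),
    "d.mp4": VideoMeta(fps=30.0, width=1280, height=720, duration_s=90.0),
}
retained = [name for name, meta in videos.items() if meets_conditions(meta)]
print(retained)
```

Only a.mp4 satisfies all three conditions. The others fail on frame rate, resolution, and duration respectively.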

Video Aspect Ratio Filtering

  • Applicable file format: video > mp4/avi.
  • Parameter description:

    Aspect ratio threshold: Videos whose aspect ratio exceeds the threshold will be filtered out. The threshold range is (1, 10). You can enter one decimal place.

  • Example:

    Original video dataset:

    There are two videos, and their respective aspect ratios are 1.77 and 1.79.

    Set the aspect ratio threshold to 1.78. After operator processing, the result is as follows.

    Only the video whose aspect ratio is 1.77 is retained, because 1.79 exceeds the threshold.

Pornographic Video Detection

  • Applicable file format: video > mp4/avi.
  • Operator description: Labels pornographic content.
  • Parameter configuration example:

    No parameters need to be set.

  • Detection example:

    The results are stored in the annotation file as the video_anti_porn object.

    suggestion: Whether the file passes the check. pass means that the file passes the check and no problem is found. review means that manual review is required; you can allow or block the file based on your review policy. block means that the file contains problematic content.

    confidence: Confidence of the model in its suggestion. If suggestion is pass, the value is 0. If suggestion is review or block, the value ranges from 0 to 1.

    label: label of the pornographic content detected by the model. If no pornographic content is detected, the value is empty.
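
As an illustration of how these fields might be consumed downstream, the sketch below maps each suggestion value to a dataset action. The exact layout of the annotation file is not shown in this section, so the record structure (a video_anti_porn object holding suggestion, confidence, and label), the sample values, and the action names are assumptions.

```python
import json

def route_by_suggestion(record, key="video_anti_porn"):
    """Map a detection result to a dataset action based on its suggestion field."""
    result = record[key]
    suggestion = result["suggestion"]
    if suggestion == "pass":
        return "keep"
    if suggestion == "review":
        return "manual_review"
    if suggestion == "block":
        return "drop"
    raise ValueError(f"unexpected suggestion: {suggestion!r}")

# Hypothetical annotation records; the values are made up for illustration.
annotation_lines = [
    '{"video_anti_porn": {"suggestion": "pass", "confidence": 0, "label": ""}}',
    '{"video_anti_porn": {"suggestion": "review", "confidence": 0.62, "label": "example_label"}}',
    '{"video_anti_porn": {"suggestion": "block", "confidence": 0.97, "label": "example_label"}}',
]
actions = [route_by_suggestion(json.loads(line)) for line in annotation_lines]
print(actions)
```

The key argument lets the same function read other detection objects with the same fields, such as video_anti_terrorism.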

Violent and Terrorism Video Detection

  • Applicable file format: video > mp4/avi.
  • Operator description: Labels violent and terrorism content.
  • Parameter configuration example:

    No parameters need to be set.

  • Detection example: The results are stored in the annotation file as the video_anti_terrorism object.

    suggestion: Whether the file passes the check. pass means that the file passes the check and no problem is found. review means that manual review is required; you can allow or block the file based on your review policy. block means that the file contains problematic content.

    confidence: Confidence of the model in its suggestion. If suggestion is pass, the value is 0. If suggestion is review or block, the value ranges from 0 to 1.

    label: label of the violent and terrorism content detected by the model. If no violent or terrorism content is detected, the value is empty.

Political Video Content Detection and Scoring

  • Applicable file format: video > mp4/avi.
  • Operator description:

    Labels political content.

  • Parameter configuration example:

    No parameters need to be set.

  • Detection example:

    The results are stored in the annotation file as the video_anti_politics object.

    suggestion: Whether the file passes the check. pass means that the file passes the check and no problem is found. review means that manual review is required; you can allow or block the file based on your review policy. block means that the file contains problematic content.

    result: Result returned by the model after file detection. Each record includes a suggestion, a confidence, and a label. One or more records can be returned.

    confidence: Confidence of the model in its suggestion. If suggestion is pass, the value is 0. If suggestion is review or block, the value ranges from 0 to 1.

    label: label of the political content detected by the model. If no political content is detected, the value is empty.
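
Because result can hold several records, a consumer may want to reduce them to the single most severe one. The sketch below shows one possible way to do that. The nesting of result inside the detection object, the severity ordering, the tie-break on confidence, and the sample values are assumptions made for illustration.

```python
# Assumed ordering from least to most severe.
SEVERITY = {"pass": 0, "review": 1, "block": 2}

def strictest_record(records):
    """Return the record with the most severe suggestion; ties go to higher confidence."""
    if not records:
        raise ValueError("records must not be empty")
    return max(records, key=lambda r: (SEVERITY[r["suggestion"]], r["confidence"]))

# Hypothetical video_anti_politics object with three result records.
detection = {
    "suggestion": "block",
    "result": [
        {"suggestion": "review", "confidence": 0.55, "label": "example_label_a"},
        {"suggestion": "block", "confidence": 0.91, "label": "example_label_b"},
        {"suggestion": "block", "confidence": 0.73, "label": "example_label_c"},
    ],
}
worst = strictest_record(detection["result"])
print(worst["suggestion"], worst["confidence"], worst["label"])
```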

Motion Range Scoring

  • Applicable file format: video > mp4/avi.
  • Scoring description:

    Identifies videos whose motion is too fast or too slow. A larger value indicates faster motion. Motion is too fast if the motion range is greater than 100 optical flows, and too slow if it is less than or equal to 2 optical flows.

  • Scoring example: The motion range score is displayed in the JSONL file.
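
The two thresholds above translate directly into a small classifier. This is a minimal sketch: the function name classify_motion and the "normal" label for scores between the two thresholds are assumptions, and the field that carries the score in the JSONL file is not named in this section.

```python
def classify_motion(score, slow_max=2.0, fast_min=100.0):
    """Classify a motion range score using the thresholds from the scoring description."""
    if score > fast_min:
        return "too_fast"
    if score <= slow_max:
        return "too_slow"
    return "normal"

for value in (1.5, 2.0, 2.1, 100.0, 130.0):
    print(value, classify_motion(value))
```

Note that the boundaries are asymmetric: a score of exactly 2 counts as too slow, while a score of exactly 100 does not count as too fast.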

Basic Quality Scoring

  • Applicable file format: video > mp4/avi.
  • Scoring description:

    Scores the basic video quality (definition, brightness, blur, image shaking and ghosting, overexposure in low light, and glitch). The value range is (0, 1). A higher value indicates better quality. A video whose score is greater than 0.05 is considered to have high basic quality.

  • Scoring example: The quality scores are stored in a JSONL file as the clip_quality_value object.

Aesthetics Scoring

  • Applicable file format: video > mp4/avi.
  • Scoring description:

    Scores the aesthetics of a video from the following dimensions: content (attractive and clear), composition (good object position), color (vital and pleasant), light (obvious contrast), and track (continuous and stable). The value range is (0, 1). A higher value indicates better aesthetics. A video whose score is greater than 0.95 is considered to have high aesthetics.

  • Scoring example: The aesthetics scores are stored in a JSONL file as the clip_esthetics_value object.
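
The following sketch shows one way to select records that pass both the basic quality threshold (0.05) and the aesthetics threshold (0.95) from JSONL output. It assumes that clip_quality_value and clip_esthetics_value are numeric top-level fields of each record. The file field and the sample scores are hypothetical.

```python
import json

QUALITY_MIN = 0.05      # from the Basic Quality Scoring description
AESTHETICS_MIN = 0.95   # from the Aesthetics Scoring description

def high_scoring(jsonl_text):
    """Return records whose quality and aesthetics scores both exceed the thresholds."""
    selected = []
    for line in jsonl_text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        record = json.loads(line)
        quality = record.get("clip_quality_value")
        aesthetics = record.get("clip_esthetics_value")
        if quality is None or aesthetics is None:
            continue  # skip records that were not scored by both operators
        if quality > QUALITY_MIN and aesthetics > AESTHETICS_MIN:
            selected.append(record)
    return selected

# Made-up records for illustration.
sample = "\n".join(json.dumps(r) for r in [
    {"file": "a.mp4", "clip_quality_value": 0.40, "clip_esthetics_value": 0.97},
    {"file": "b.mp4", "clip_quality_value": 0.03, "clip_esthetics_value": 0.99},
    {"file": "c.mp4", "clip_quality_value": 0.60, "clip_esthetics_value": 0.80},
])
selected_files = [r["file"] for r in high_scoring(sample)]
print(selected_files)
```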

Watermark Identification

  • Applicable file format: video > mp4/avi.
  • Operator description:

    Identifies whether a video contains watermarks.

  • Example: The JSONL file shows whether a watermark is identified. If the value of consist_watermark is 1, a watermark is identified. If the value is 0, no watermark is identified.

Subtitle Identification

  • Applicable file format: video > mp4/avi.
  • Operator description:

    Identifies whether a video contains subtitles.

  • Example: The JSONL file shows whether subtitles are identified. If the value of consist_subtitle is 1, subtitles are identified. If the value is 0, no subtitles are identified.

Logo Identification

  • Applicable file format: video > mp4/avi.
  • Operator description:

    Identifies whether a video contains a logo.

  • Example: The JSONL file shows whether a logo is identified. If the value of consist_logo is 1, a logo is identified. If the value is 0, no logo is identified.

Black Bar Identification

  • Applicable file format: video > mp4/avi.
  • Operator description:

    Identifies whether a video contains black bars.

  • Example: If border_value is 1, black bars are identified. If border_value is 0, black bars are not identified.

Dense Text Identification

  • Applicable file format: video > mp4/avi.
  • Parameter description:

    Proportion of dense text area: A video in which the proportion of dense text area exceeds the specified proportion is treated as a video with dense text. By default, a video whose cropping area proportion is greater than or equal to 7% is treated as a video with dense text.

  • Example: In the JSONL file, if the value of consist_densetext is 1, dense text is identified. If the value is 0, dense text is not identified.
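
The five identification operators each write a 0/1 flag. The sketch below collects the flags that are set in a single record, which can help when deciding what the Video Cropping operator has to remove. It assumes the flags are top-level fields of a JSONL record. The function name, the display names, and the sample record are hypothetical.

```python
# Flag fields named in the identification operator sections, with display names.
FLAG_FIELDS = {
    "consist_watermark": "watermark",
    "consist_subtitle": "subtitle",
    "consist_logo": "logo",
    "border_value": "black bars",
    "consist_densetext": "dense text",
}

def detected_elements(record):
    """List the elements whose identification flag equals 1 in this record."""
    return [name for field, name in FLAG_FIELDS.items() if record.get(field) == 1]

# Made-up record: a subtitle and black bars were identified.
record = {
    "consist_watermark": 0,
    "consist_subtitle": 1,
    "consist_logo": 0,
    "border_value": 1,
    "consist_densetext": 0,
}
print(detected_elements(record))
```

A flag that is missing from the record is treated the same as 0, so the function also works when only some of the operators have been run.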

Video Classification (InterVideo2)

  • Applicable file format: video > mp4/avi.
  • Operator description:

    Automatically classifies short video content and generates corresponding tags.

  • Parameter configuration example:

    No parameter configuration is required.

  • Example of category labeling:

    The following information is displayed in the description:

    category_L1_cn: first-level category

    category_L4_cn: fourth-level category

Video Synopsis Generation (Detailed)

  • Applicable file format: video > mp4/avi.
  • Operator description:

    Extracts frames from a video and generates a detailed video synopsis through model inference.

  • Parameter configuration example:

    No parameter configuration is required.

  • Example: The long_prompt field in the description indicates the detailed video synopsis.

Video Synopsis Generation (Simplified)

  • Applicable file format: video > mp4/avi.
  • Operator description:

    Extracts frames from a video and generates a simplified video synopsis through model inference.

  • Parameter configuration example:

    No parameter configuration is required.

  • Example: The prompt field in the description indicates the simplified video synopsis.
    Figure 1 Example