Updated on 2025-02-10 GMT+08:00

Creating a Video Script

Function

Creates a video script.

Calling Method

For details, see Calling APIs.

URI

POST /v1/{project_id}/digital-human-video-scripts

Table 1 Path Parameters

Parameter

Mandatory

Type

Description

project_id

Yes

String

Project ID. For details about how to obtain the project ID, see Obtaining a Project ID.
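
As a minimal sketch of how the URI above is assembled, the following Python helper substitutes the project ID into the path. The helper name build_script_url and the host name in the usage note are hypothetical; use the endpoint of your own region.

```python
from urllib.parse import quote


def build_script_url(endpoint: str, project_id: str) -> str:
    """Return the full URL for POST /v1/{project_id}/digital-human-video-scripts.

    `endpoint` is assumed to be a host name without a scheme.
    """
    if not project_id:
        raise ValueError("project_id is mandatory")
    # Percent-encode the project ID so it is safe to use as a path segment.
    return f"https://{endpoint}/v1/{quote(project_id, safe='')}/digital-human-video-scripts"
```

For example, build_script_url("example.invalid", "abc123") yields a URL that ends in /v1/abc123/digital-human-video-scripts.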

Request Parameters

Table 2 Request header parameters

Parameter

Mandatory

Type

Description

X-Auth-Token

No

String

User token. This parameter is mandatory when token authentication is used.

To obtain a token, call the IAM API used to obtain a user token. The token is the value of X-Subject-Token in the response header.

Authorization

No

String

Authentication information. This parameter is mandatory for AK/SK authentication.

X-Sdk-Date

No

String

Time when the request is sent. This parameter is mandatory for AK/SK authentication.

The format is YYYYMMDD'T'HHMMSS'Z'.

X-Project-Id

No

String

Project ID. This parameter is mandatory for AK/SK authentication.

X-App-UserId

No

String

Third-party user ID. Chinese characters are not allowed.
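
The header rules in Table 2 can be sketched in Python as follows. The function names are hypothetical, the Content-Type header is an assumption for a JSON request body, and the Chinese-character check is an approximation that covers only the basic CJK Unified Ideographs block. AK/SK signing (the Authorization header) is not shown.

```python
import re
from datetime import datetime, timezone
from typing import Dict, Optional

# Approximation: matches only the basic CJK Unified Ideographs block.
_CHINESE = re.compile(r"[\u4e00-\u9fff]")


def format_sdk_date(moment: datetime) -> str:
    """Format a timezone-aware datetime as YYYYMMDD'T'HHMMSS'Z' in UTC (X-Sdk-Date)."""
    return moment.astimezone(timezone.utc).strftime("%Y%m%dT%H%M%SZ")


def token_auth_headers(token: str, app_user_id: Optional[str] = None) -> Dict[str, str]:
    """Build request headers for token authentication."""
    if not token:
        raise ValueError("X-Auth-Token is mandatory when token authentication is used")
    headers = {
        "Content-Type": "application/json",  # assumption: the body is sent as JSON
        "X-Auth-Token": token,
    }
    if app_user_id is not None:
        if _CHINESE.search(app_user_id):
            raise ValueError("X-App-UserId must not contain Chinese characters")
        headers["X-App-UserId"] = app_user_id
    return headers
```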

Table 3 Request body parameters

Parameter

Mandatory

Type

Description

script_name

Yes

String

Details:

Script name.

Constraints:

N/A

Options:

The value contains 1 to 256 characters.

Default value:

N/A

script_description

No

String

Details:

Script description.

Constraints:

N/A

Options:

The value contains 0 to 1,024 characters.

Default value:

N/A

view_mode

No

String

Details:

Landscape or portrait mode.

Constraints:

N/A

Options:

  • LANDSCAPE: landscape
  • VERTICAL: portrait

Default value:

LANDSCAPE

model_asset_id

No

String

Details:

Virtual human model asset ID.

Constraints:

N/A

Options:

The value contains 0 to 64 characters.

Default value:

N/A

model_asset_type

No

String

Details:

Virtual human model type.

Constraints:

N/A

Options:

  • HUMAN_MODEL_2D: virtual avatar
  • HUMAN_MODEL_3D: 3D virtual human

Default value:

N/A

voice_config

No

VoiceConfig object

Voice configuration parameter.

video_config

No

VideoConfig object

Video output configuration.

scene_asset_id

No

String

Details:

Scene asset ID.

Constraints:

This parameter is not required for virtual avatar video production.

Options:

The value contains 0 to 64 characters.

Default value:

N/A

priv_data

No

String

Details:

Private data. The content supplied by the user is returned unchanged.

Constraints:

N/A

Options:

The value contains 0 to 8,192 characters.

Default value:

N/A

background_music_config

No

BackgroundMusicConfig object

Background music configuration.

NOTE:

Background music can be set only for virtual avatar video production, but not for 3D virtual human video production.

review_config

No

ReviewConfig object

Content review configuration.

shoot_scripts

Yes

Array of ShootScriptItem objects

Video shooting scripts.
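
The top-level constraints in Table 3 can be checked on the client before the request is sent. The following Python sketch is illustrative only: build_script_body is a hypothetical helper, and it covers only the length and option constraints of the simple fields, not the nested objects.

```python
from typing import Any, Dict, List

VIEW_MODES = {"LANDSCAPE", "VERTICAL"}
MODEL_ASSET_TYPES = {"HUMAN_MODEL_2D", "HUMAN_MODEL_3D"}
# Maximum lengths of optional string fields, taken from Table 3.
MAX_LENGTHS = {
    "script_description": 1024,
    "model_asset_id": 64,
    "scene_asset_id": 64,
    "priv_data": 8192,
}


def build_script_body(script_name: str,
                      shoot_scripts: List[Dict[str, Any]],
                      **optional: Any) -> Dict[str, Any]:
    """Assemble the request body, rejecting values that violate Table 3."""
    if not 1 <= len(script_name) <= 256:
        raise ValueError("script_name must contain 1 to 256 characters")
    if not isinstance(shoot_scripts, list):
        raise ValueError("shoot_scripts must be an array of ShootScriptItem objects")
    view_mode = optional.get("view_mode")
    if view_mode is not None and view_mode not in VIEW_MODES:
        raise ValueError("view_mode must be LANDSCAPE or VERTICAL")
    model_type = optional.get("model_asset_type")
    if model_type is not None and model_type not in MODEL_ASSET_TYPES:
        raise ValueError("model_asset_type must be HUMAN_MODEL_2D or HUMAN_MODEL_3D")
    for field, limit in MAX_LENGTHS.items():
        value = optional.get(field)
        if value is not None and len(value) > limit:
            raise ValueError(f"{field} must contain at most {limit} characters")
    body: Dict[str, Any] = {"script_name": script_name, "shoot_scripts": shoot_scripts}
    # Omit unset optional fields so that the service defaults apply.
    body.update({key: value for key, value in optional.items() if value is not None})
    return body
```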

Table 4 VoiceConfig

Parameter

Mandatory

Type

Description

voice_asset_id

Yes

String

Details:

Timbre asset ID, which can be queried from the asset library.

Constraints:

N/A

Options:

The value contains 1 to 256 characters.

Default value:

N/A

speed

No

Integer

Details:

Speaking speed. 50 indicates 0.5x speaking speed, 100 indicates normal speaking speed, and 200 indicates 2x speaking speed.

The value 100 indicates the normal speaking speed of an adult, which is about 150 words per minute.

Constraints:

N/A

Value range:

50-200

Default value:

100

pitch

No

Integer

Details:

Pitch.

Constraints:

N/A

Value range:

50-200

Default value:

100

volume

No

Integer

Details:

Volume.

Constraints:

N/A

Value range:

90-240

Default value:

140
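
The value ranges in Table 4 can be sketched as a small validator. The names build_voice_config and speed_multiplier are hypothetical. Parameters left unset are omitted so that the documented defaults (100, 100, and 140) apply.

```python
from typing import Dict, Optional, Union

# Inclusive value ranges from Table 4.
RANGES = {"speed": (50, 200), "pitch": (50, 200), "volume": (90, 240)}


def build_voice_config(voice_asset_id: str,
                       speed: Optional[int] = None,
                       pitch: Optional[int] = None,
                       volume: Optional[int] = None) -> Dict[str, Union[str, int]]:
    """Return a VoiceConfig object, rejecting out-of-range values."""
    if not 1 <= len(voice_asset_id) <= 256:
        raise ValueError("voice_asset_id must contain 1 to 256 characters")
    config: Dict[str, Union[str, int]] = {"voice_asset_id": voice_asset_id}
    for name, value in (("speed", speed), ("pitch", pitch), ("volume", volume)):
        if value is None:
            continue  # omitted: the service default is used
        low, high = RANGES[name]
        if not low <= value <= high:
            raise ValueError(f"{name} must be in the range {low}-{high}")
        config[name] = value
    return config


def speed_multiplier(speed: int) -> float:
    """Convert the speed parameter to a playback multiplier (100 is normal speed)."""
    return speed / 100
```

With this sketch, speed_multiplier(50) returns 0.5 and speed_multiplier(200) returns 2.0, matching the description of the speed parameter.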

Table 5 VideoConfig

Parameter

Mandatory

Type

Description

clip_mode

No

String

Details:

Clipping mode of the output video.

Constraints:

N/A

Options:

  • RESIZE: video scaling
  • CROP: video cropping

Default value:

RESIZE

codec

Yes

String

Details:

Video encoding format and video file format.

Constraints:

Only virtual avatar video production supports VP8 encoding.

Options:

  • H264: H.264 encoding, MP4 file output
  • VP8: VP8 encoding, WebM file output

Default value:

N/A

bitrate

Yes

Integer

Details:

Average output bitrate. Unit: kbit/s

Constraints:

  • Virtual avatar video production prioritizes quality, so the actual bitrate may exceed the configured bitrate.
  • Bitrate range for virtual avatar video production: [1000, 8000].

Default value:

N/A

Value range:

40-30000

width

Yes

Integer

Details:

Video width. Unit: pixel.

Constraints:

  • When clip_mode is set to RESIZE, the following resolutions are supported: 1920 x 1080, 1080 x 1920, 1280 x 720, 720 x 1280, 3840 x 2160, and 2160 x 3840. 4K is available only when the virtual avatar model supports 4K.
  • When clip_mode is set to CROP, (dx, dy) is the origin, and the width is the actual width of the reserved video.
  • Currently, only 1080 x 1920 and 1920 x 1080 are supported for virtual avatar livestreaming.

Default value:

N/A

Value range:

0-3840

height

Yes

Integer

Details:

Video height.

Unit: pixel.

Constraints:

  • When clip_mode is set to RESIZE, the following resolutions are supported: 1920 x 1080, 1080 x 1920, 1280 x 720, 720 x 1280, 3840 x 2160, and 2160 x 3840.
  • When clip_mode is set to CROP, (dx, dy) is the origin, and the height is the actual height of the reserved video.
  • Currently, only 1080 x 1920 and 1920 x 1080 are supported for virtual avatar livestreaming.

Default value:

N/A

Value range:

0-3840

frame_rate

No

String

Details:

Frame rate. Unit: FPS

Constraints:

The virtual avatar video frame rate is fixed at 25 FPS.

Default value:

25

is_subtitle_enable

No

Boolean

Details:

Whether the output video is subtitled.

Constraints:

Subtitles are not supported for virtual avatar livestreaming.

Options:

  • true: enable subtitling
  • false: disable subtitling

Default value:

false

subtitle_config

No

SubtitleConfig object

Subtitle configuration.

dx

No

Integer

Details:

Horizontal coordinate of the pixel in the upper left corner of the cropped video.

NOTE:

The image layout size is based on the model resolution. For example, for a model with the resolution of 1920 x 1080, the value of dx ranges from 0 to 1920.

Constraints:

This parameter takes effect when clip_mode is set to CROP.

Default value:

N/A

Value range:

-1920 to 3840

dy

No

Integer

Details:

Vertical coordinate of the pixel in the upper left corner of the cropped video.

NOTE:

The image layout size is based on the model resolution. For example, for a model with the resolution of 1920 x 1080, the value of dy ranges from 0 to 1080.

Constraints:

This parameter takes effect when clip_mode is set to CROP.

Default value:

N/A

Value range:

-1920 to 3840

is_enable_super_resolution

No

Boolean

Details:

Whether super resolution is enabled for a video.

Constraints:

This parameter is available only for virtual avatar video production.

Options:

  • true: enable
  • false: do not enable

Default value:

false
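
Several VideoConfig constraints depend on one another (clip_mode and resolution, codec and production type, bitrate and production type). The following Python sketch collects them in one place. check_video_config and its is_virtual_avatar flag are hypothetical, and the check is not exhaustive.

```python
from typing import Any, Dict, List

# Resolutions supported when clip_mode is RESIZE (width, height).
RESIZE_RESOLUTIONS = {
    (1920, 1080), (1080, 1920),
    (1280, 720), (720, 1280),
    (3840, 2160), (2160, 3840),
}
CODECS = {"H264", "VP8"}


def check_video_config(config: Dict[str, Any], is_virtual_avatar: bool = True) -> List[str]:
    """Return a list of problems found; an empty list means these checks passed."""
    problems: List[str] = []
    codec = config.get("codec")
    if codec not in CODECS:
        problems.append("codec must be H264 or VP8")
    elif codec == "VP8" and not is_virtual_avatar:
        problems.append("VP8 is supported only for virtual avatar video production")
    bitrate = config.get("bitrate")
    if not isinstance(bitrate, int) or not 40 <= bitrate <= 30000:
        problems.append("bitrate must be an integer in the range 40-30000")
    elif is_virtual_avatar and not 1000 <= bitrate <= 8000:
        problems.append("bitrate must be in [1000, 8000] for virtual avatar video production")
    # clip_mode defaults to RESIZE when it is not set.
    if config.get("clip_mode", "RESIZE") == "RESIZE":
        if (config.get("width"), config.get("height")) not in RESIZE_RESOLUTIONS:
            problems.append("unsupported resolution for clip_mode RESIZE")
    return problems
```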

Table 6 SubtitleConfig

Parameter

Mandatory

Type

Description

dx

No

Integer

Details:

Horizontal coordinate of the pixel in the lower left corner of the subtitle box.

Constraints:

N/A

Default value:

N/A

Value range:

0-1920

dy

No

Integer

Details:

Vertical coordinate of the pixel in the lower left corner of the subtitle box.

Constraints:

N/A

Default value:

N/A

Value range:

0-1920

font_name

No

String

Details:

Font. The following fonts are supported:

  • HarmonyOS_Sans_SC_Black: HarmonyOS bold
  • HarmonyOS_Sans_SC_Regular: HarmonyOS normal
  • HarmonyOS_Sans_SC_Thin: HarmonyOS light

Constraints:

N/A

Options:

The value contains 0 to 64 characters.

Default value:

HarmonyOS_Sans_SC_Black

font_size

No

Integer

Details:

Font size. The API accepts values from 0 to 120, but the effective range is 4 to 120. Use a value within the effective range.

Constraints:

N/A

Value range:

0-120

Default value:

54

h

No

Integer

Details:

Subtitle box height.

Constraints:

The parameter h is provided only to help calculate the coordinates of the upper left corner of the subtitle box. The backend does not use it.

Value range:

0-1920

w

No

Integer

Details:

Subtitle box width.

Constraints:

  • The subtitle box width is fixed at 80% of the screen width.
  • The parameter w is provided only to help calculate the coordinates of the upper left corner of the subtitle box. The backend does not use it.

Value range:

0-1920
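
The relationship between dx, dy, h, and w can be sketched as follows. This is an illustration under a stated assumption: the y axis is assumed to increase downward from an origin in the upper left corner, as in the layer layout. The function name subtitle_box is hypothetical.

```python
from typing import Dict, Tuple, Union


def subtitle_box(screen_width: int, dx: int, dy: int, h: int) -> Dict[str, Union[int, Tuple[int, int]]]:
    """Derive the subtitle box width and its upper left corner.

    (dx, dy) is the lower left corner of the subtitle box and h is its height.
    Assumption: y grows downward, so the upper left corner is h pixels above dy.
    """
    w = screen_width * 80 // 100  # the width is fixed at 80% of the screen width
    return {"w": w, "upper_left": (dx, dy - h)}
```

For a 1920-pixel-wide screen, the computed subtitle box width is 1536 pixels.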

Table 7 BackgroundMusicConfig

Parameter

Mandatory

Type

Description

music_asset_id

No

String

Details:

Music asset ID.

Constraints:

N/A

Options:

The value contains 0 to 64 characters.

Default value:

N/A

volume

No

Integer

Details:

Music volume. For example, 100 indicates that the volume is 100%, and 50 indicates that the volume is 50%.

Constraints:

N/A

Value range:

0-100

Default value:

100

Table 8 ReviewConfig

Parameter

Mandatory

Type

Description

no_need_review

No

Boolean

Whether to skip content review. This option is available only to users in the content review whitelist. The automatic review policies apply to all other users.

Table 9 ShootScriptItem

Parameter

Mandatory

Type

Description

sequence_no

No

Integer

Details:

Script sequence number.

Constraints:

The sequence number of a script must be unique.

Default value:

N/A

Value range:

0-2147483647

shoot_script

Yes

ShootScript object

Performance script.

subtitle_file_info

No

SubtitleFiles object

Subtitle file information.

Table 10 ShootScript

Parameter

Mandatory

Type

Description

script_type

No

String

Details:

Script type, that is, the control mode of video production.

Constraints:

N/A

Options:

  • TEXT: text control, that is, using TTS
  • AUDIO: speech control

Default value:

TEXT

text_config

No

TextConfig object

Commentary configuration.

audio_drive_action_config

No

Array of AudioDriveActionConfig objects

Action configuration for speech control.

background_config

No

Array of BackgroundConfigInfo objects

Background configuration.

layer_config

No

Array of LayerConfig objects

Layer configuration.

Table 11 TextConfig

Parameter

Mandatory

Type

Description

text

Yes

String

Details:

Script. Two modes are supported: plain text mode and tag mode.

  • Plain text mode, for example, "Hello, everyone, I'm a virtual streamer."
  • Tag mode: For details about the definition of SSML tags, see SSML Definition of Text Control.

Constraints:

The value can contain a maximum of 10,000 characters, excluding SSML tags.

Options:

The value contains 0 to 131,072 characters.

Default value:

N/A
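
The two limits on text (10,000 characters excluding SSML tags, and 131,072 characters in total) can be checked before submission. The sketch below assumes that SSML tags are XML-style elements enclosed in angle brackets. The tag in the usage note is a placeholder, not a tag defined by the service.

```python
import re

# Assumption: an SSML tag is any run of text enclosed in angle brackets.
_TAG = re.compile(r"<[^>]+>")
MAX_SPOKEN_CHARS = 10_000
MAX_TOTAL_CHARS = 131_072


def spoken_length(text: str) -> int:
    """Count the characters of text after removing anything that looks like a tag."""
    return len(_TAG.sub("", text))


def check_text(text: str) -> None:
    """Raise ValueError if text exceeds either documented limit."""
    if len(text) > MAX_TOTAL_CHARS:
        raise ValueError("text must contain at most 131,072 characters")
    if spoken_length(text) > MAX_SPOKEN_CHARS:
        raise ValueError("text must contain at most 10,000 characters, excluding SSML tags")
```

For example, spoken_length("Hello<tag/>world") counts only the ten characters outside the placeholder tag.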

Table 12 AudioDriveActionConfig

Parameter

Mandatory

Type

Description

action_tag

Yes

String

Action tag

action_name

No

String

Action name

action_start_time

Yes

Float

Action start time

Value range:

0-2592000

Table 13 BackgroundConfigInfo

Parameter

Mandatory

Type

Description

background_type

Yes

String

Details:

Background type.

Constraints:

N/A

Options:

  • IMAGE: image background, which is used as the virtual avatar video background
  • COLOR: solid color background. The RGB value of the specified color is used as the virtual avatar video background.

Default value:

N/A

background_config

No

String

Details:

Background file URL.

Constraints:

  • External URLs are allowed only for livestreaming. For other services, obtain a URL from the asset library.
  • This parameter is mandatory when background_type is set to IMAGE.

Options:

The value contains 1 to 2,048 characters.

Default value:

N/A

background_color_config

No

String

Details:

RGB color value of a solid color background.

Constraints:

This parameter is mandatory when background_type is set to COLOR.

Options:

The value contains 0 to 16 characters.

Default value:

#FFFFFF

background_asset_id

No

String

Details:

Background asset ID.

NOTE:

If a background image is used, enter the image asset ID.

Constraints:

N/A

Options:

The value contains 0 to 64 characters.

Default value:

N/A
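
The conditional rules in Table 13 (background_config is mandatory for IMAGE, background_color_config is mandatory for COLOR) can be sketched as follows. build_background is a hypothetical helper, and the URL in the test data is a placeholder.

```python
from typing import Dict, Optional


def build_background(background_type: str,
                     background_config: Optional[str] = None,
                     background_color_config: Optional[str] = None,
                     background_asset_id: Optional[str] = None) -> Dict[str, str]:
    """Return a BackgroundConfigInfo object that satisfies the conditional constraints."""
    if background_type not in {"IMAGE", "COLOR"}:
        raise ValueError("background_type must be IMAGE or COLOR")
    if background_type == "IMAGE" and not background_config:
        raise ValueError("background_config is mandatory when background_type is IMAGE")
    if background_type == "COLOR" and not background_color_config:
        raise ValueError("background_color_config is mandatory when background_type is COLOR")
    if background_config is not None and not 1 <= len(background_config) <= 2048:
        raise ValueError("background_config must contain 1 to 2,048 characters")
    if background_color_config is not None and len(background_color_config) > 16:
        raise ValueError("background_color_config must contain at most 16 characters")
    if background_asset_id is not None and len(background_asset_id) > 64:
        raise ValueError("background_asset_id must contain at most 64 characters")
    item = {"background_type": background_type}
    # Include only the optional fields that were actually supplied.
    for key, value in (("background_config", background_config),
                       ("background_color_config", background_color_config),
                       ("background_asset_id", background_asset_id)):
        if value is not None:
            item[key] = value
    return item
```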

Table 14 LayerConfig

Parameter

Mandatory

Type

Description

layer_type

Yes

String

Details:

Layer type.

Constraints:

N/A

Options:

  • HUMAN: person layer
  • IMAGE: image layer
  • VIDEO: video layer
  • TEXT: text layer

Default value:

N/A

asset_id

No

String

Details:

ID of the asset overlaid on a video. You do not need to set this parameter for external assets.

Constraints:

N/A

Options:

The value contains 0 to 64 characters.

Default value:

N/A

group_id

No

String

Details:

Groups materials in multiple scenes. Materials with the same group_id share location information when they are applied globally.

Constraints:

N/A

Options:

The value contains 0 to 64 characters.

Default value:

N/A

position

No

LayerPositionConfig object

Layer position configuration.

size

No

LayerSizeConfig object

Layer size configuration.

image_config

No

ImageLayerConfig object

Material image layer configuration.

video_config

No

VideoLayerConfig object

Material video layer configuration.

text_config

No

TextLayerConfig object

Material text layer configuration.

Table 15 LayerPositionConfig

Parameter

Mandatory

Type

Description

dx

Yes

Integer

Details:

X-axis position of the pixel in the upper left corner of the image. The upper left corner of the image layout is the origin (0, 0).

The image layout resolution is 1920 x 1080 in landscape mode (16:9) and 1080 x 1920 in portrait mode (9:16).

Constraints:

The value is the pixel value relative to the image layout. It indicates only the layout position relationship and is irrelevant to the resolution of the output image.

Value range:

-1920 to 3840

Default value:

0

dy

Yes

Integer

Details:

Y-axis position of the pixel in the upper left corner of the image. The upper left corner of the image layout is the origin (0, 0).

The image layout resolution is 1920 x 1080 in landscape mode (16:9) and 1080 x 1920 in portrait mode (9:16).

Constraints:

The value is the pixel value relative to the image layout. It indicates only the layout position relationship and is irrelevant to the resolution of the output image.

Value range:

-1920 to 3840

Default value:

0

layer_index

Yes

Integer

Details:

Layer sequence of an image, video, or person image.

NOTE:

The layer sequence is an integer starting from 1 and incremented by 1.

Constraints:

If multiple layers have the same layer_index, the overlay order among those layers is random.

Value range:

1-100

Default value:

100
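
Because layers that share a layer_index are overlaid in random order, it can help to detect duplicates before the request is sent. The following Python sketch assumes that each entry of layer_config carries its position object as described in Table 14. duplicate_layer_indexes is a hypothetical helper.

```python
from collections import Counter
from typing import Any, Dict, List


def duplicate_layer_indexes(layers: List[Dict[str, Any]]) -> List[int]:
    """Return the layer_index values used by more than one layer, in ascending order."""
    counts = Counter(
        layer["position"]["layer_index"]
        for layer in layers
        if "layer_index" in layer.get("position", {})
    )
    return sorted(index for index, count in counts.items() if count > 1)
```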

Table 16 LayerSizeConfig

Parameter

Mandatory

Type

Description

width

No

Integer

Details:

Width (in pixels) of the layer image, relative to the image layout size.

The image layout resolution is 1920 x 1080 in landscape mode (16:9) and 1080 x 1920 in portrait mode (9:16).

Constraints:

The value is the pixel value relative to the image layout. It indicates only the layout position relationship and is irrelevant to the resolution of the output image.

Value range:

1-7680

height

No

Integer

Details:

Height (in pixels) of the layer image, relative to the image layout size.

The image layout resolution is 1920 x 1080 in landscape mode (16:9) and 1080 x 1920 in portrait mode (9:16).

Constraints:

The value is the pixel value relative to the image layout. It indicates only the layout position relationship and is irrelevant to the resolution of the output image.

Value range:

1-7680
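
Layer positions and sizes are expressed in layout pixels (1920 x 1080 in landscape mode, 1080 x 1920 in portrait mode), not in output pixels. As an illustration only, the following Python sketch shows how a layout value would correspond to an output pixel if the layout is scaled proportionally to the output resolution. That proportional mapping is an assumption of this sketch, not a statement about the service implementation.

```python
from typing import Tuple

# Image layout resolution per view_mode, from the parameter descriptions.
LAYOUTS = {"LANDSCAPE": (1920, 1080), "VERTICAL": (1080, 1920)}


def layout_to_output(x: int, y: int, view_mode: str,
                     out_width: int, out_height: int) -> Tuple[int, int]:
    """Scale a layout-relative point or size to the output resolution (assumed proportional)."""
    layout_width, layout_height = LAYOUTS[view_mode]
    return (round(x * out_width / layout_width),
            round(y * out_height / layout_height))
```

Under this assumption, the layout point (960, 540) in landscape mode corresponds to (640, 360) in a 1280 x 720 output.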

Table 17 ImageLayerConfig

Parameter

Mandatory

Type

Description

image_url

No

String

Details:

Image file URL.

Constraints:

N/A

Options:

The value contains 1 to 2,048 characters.

Default value:

N/A

Table 18 VideoLayerConfig

Parameter

Mandatory

Type

Description

video_url

No

String

Details:

Video file URL.

Constraints:

N/A

Options:

The value contains 1 to 2,048 characters.

Default value:

N/A

video_cover_url

No

String

Details:

Video thumbnail file URL.

Constraints:

N/A

Options:

The value contains 1 to 2,048 characters.

Default value:

N/A

loop_count

No

Integer

Details:

Number of times the video is played in a loop.

Options:

  • 0: The video is not looped.
  • -1: The video is looped.

Constraints:

N/A

Value range:

-1 to 100

Default value:

-1

Table 19 TextLayerConfig

Parameter

Mandatory

Type

Description

text_context

No

String

Details:

Text of the text layer. The content must be encoded using Base64.

For example, to add the text watermark "测试文字水印" ("Test text watermark" in Chinese), set text_context to 5rWL6K+V5paH5a2X5rC05Y2w.

Constraints:

N/A

Options:

The value contains 0 to 1,024 characters.

Default value:

N/A

font_name

No

String

Details:

Font.

Constraints:

N/A

Options:

For details about the supported fonts, see Supported Fonts.

Default value:

HarmonyOS_Sans_SC_Black

font_size

No

Integer

Details:

Font size (in pixels). The API accepts values from 0 to 120, but the effective range is 4 to 120. Use a value within the effective range.

Constraints:

N/A

Value range:

0-120

Default value:

16

font_color

No

String

Details:

Font color, specified as an RGB color value.

Constraints:

N/A

Options:

The value contains 0 to 16 characters.

Default value:

#FFFFFF
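
The Base64 encoding required for text_context can be sketched in Python as follows. The sample value 5rWL6K+V5paH5a2X5rC05Y2w in the parameter description is the Base64 form of the UTF-8 bytes of 测试文字水印, so UTF-8 is used here. Applying the 1,024-character limit to the encoded value is an assumption. encode_text_context is a hypothetical helper.

```python
import base64


def encode_text_context(text: str) -> str:
    """Base64-encode the UTF-8 bytes of the text-layer content."""
    encoded = base64.b64encode(text.encode("utf-8")).decode("ascii")
    # Assumption: the documented length limit applies to the encoded value.
    if len(encoded) > 1024:
        raise ValueError("text_context must contain at most 1,024 characters")
    return encoded
```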

Table 20 SubtitleFiles

Parameter

Mandatory

Type

Description

text_subtitle_file

No

SubtitleFileInfo object

  

audio_subtitle_file

No

SubtitleFileInfo object

  
Table 21 SubtitleFileInfo

Parameter

Mandatory

Type

Description

subtitle_file_download_url

No

String

URL for downloading subtitle files.

subtitle_file_upload_url

No

String

URL for uploading subtitle files.

subtitle_file_state

No

String

Subtitle file generation status.

  • GENERATING: The subtitle file is being generated.
  • GENERATE_SUCCEED: The subtitle file has been generated.
  • GENERATE_FAILED: Subtitle file generation failed.

job_id

No

String

Subtitle file generation task ID.

Response Parameters

Status code: 200

Table 22 Response header parameters

Parameter

Type

Description

X-Request-Id

String

Request ID.

Table 23 Response body parameters

Parameter

Type

Description

script_id

String

Script ID.

audio_files

ShootScriptAudioFiles object

URLs for uploading audio files for speech control

Table 24 ShootScriptAudioFiles

Parameter

Type

Description

audio_file_url

Array of ShootScriptAudioFileItem objects

URLs for uploading audio files for speech control

Table 25 ShootScriptAudioFileItem

Parameter

Type

Description

sequence_no

Integer

Script sequence number.

Value range:

0-2147483647

audio_file_upload_url

String

URL for uploading the audio file for speech control. It is returned when a script is created or updated. The maximum size of a single file is 100 MB. MP3/WAV/M4A files can be uploaded.

audio_file_download_url

String

URL for downloading the audio file for speech control. It is returned when script details are queried.
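
Before uploading an audio file to audio_file_upload_url, the documented file limits (100 MB maximum, MP3/WAV/M4A formats) can be checked locally. In the following Python sketch, check_audio_file is a hypothetical helper, 1 MB is assumed to be 1024 x 1024 bytes, and the format is inferred from the file name extension only. The upload call itself is not shown because this section does not describe the upload method.

```python
import os

MAX_AUDIO_BYTES = 100 * 1024 * 1024  # assumption: 1 MB = 1024 * 1024 bytes
ALLOWED_EXTENSIONS = {".mp3", ".wav", ".m4a"}


def check_audio_file(file_name: str, size_bytes: int) -> None:
    """Raise ValueError if the file does not meet the documented upload limits."""
    extension = os.path.splitext(file_name)[1].lower()
    if extension not in ALLOWED_EXTENSIONS:
        raise ValueError("only MP3, WAV, and M4A files can be uploaded")
    if size_bytes > MAX_AUDIO_BYTES:
        raise ValueError("the maximum size of a single file is 100 MB")
```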

Status code: 400

Table 26 Response body parameters

Parameter

Type

Description

error_code

String

Error code.

error_msg

String

Error description.

Status code: 401

Table 27 Response body parameters

Parameter

Type

Description

error_code

String

Error code.

error_msg

String

Error description.

Status code: 500

Table 28 Response body parameters

Parameter

Type

Description

error_code

String

Error code.

error_msg

String

Error description.

Example Requests

POST https://{endpoint}/v1/70b76xxxxxx34253880af501cdxxxxxx/digital-human-video-scripts

{
  "script_name" : "The Legend of Nature",
  "script_description" : "Courseware",
  "model_asset_id" : "a5d295cdb345c11bd9f36bc22ced3a7a",
  "scene_asset_id" : "7ad01cf66f6cc54e45a2021558b7fbb0",
  "voice_config" : {
    "voice_asset_id" : "a5d295cdb345c11bd9f36bc22ced3a7a"
  },
  "video_config" : {
    "codec" : "H264",
    "bitrate" : 4000,
    "frame_rate" : "25",
    "width" : 1920,
    "height" : 1080
  },
  "shoot_scripts" : [ {
    "sequence_no" : 0,
    "shoot_script" : {
      "text_config" : {
        "text" : "Hello, everyone. I'm Sara."
      },
      "animation_config" : [ {
        "animation" : "7affc1c9d10b9779957fce7d4aecbd35"
      } ],
      "background_config" : [ {
        "background_type" : "IMAGE",
        "background_config" : "978f893e1de4553c183b7a805e6290f5"
      } ]
    }
  } ]
}
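
As a rough sketch, a request like the one above can be prepared with the Python standard library using token authentication. The function name build_create_script_request, the host name, and the token value are placeholders, and the Content-Type header is an assumption for a JSON body.

```python
import json
import urllib.request
from typing import Any, Dict


def build_create_script_request(endpoint: str, project_id: str, token: str,
                                body: Dict[str, Any]) -> urllib.request.Request:
    """Prepare (but do not send) the POST request that creates a video script."""
    url = f"https://{endpoint}/v1/{project_id}/digital-human-video-scripts"
    headers = {
        "Content-Type": "application/json",  # assumption: the body is sent as JSON
        "X-Auth-Token": token,
    }
    data = json.dumps(body).encode("utf-8")
    return urllib.request.Request(url, data=data, headers=headers, method="POST")
```

Passing the returned object to urllib.request.urlopen sends the request. The response body can then be read and parsed as JSON.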

Example Responses

Status code: 200

Succeeded.

{
  "script_id" : "26f06524-4f75-4b3a-a853-b649a21aaf66"
}

Status code: 400

{
  "error_code" : "MSS.00000003",
  "error_msg" : "Invalid parameter"
}

Status code: 401

{
  "error_code" : "MSS.00000001",
  "error_msg" : "Unauthorized"
}

Status code: 500

{
  "error_code" : "MSS.00000004",
  "error_msg" : "Internal Error"
}

Status Codes

Status Code

Description

200

Succeeded.

400

Parameter error. The response includes the error code and its description.

401

Authentication is not performed or fails.

500

Internal service error.
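
The status codes above can be handled on the client along the following lines. This Python sketch uses a hypothetical exception class and function name. It treats 200 as success and any other status as an error that carries error_code and error_msg.

```python
import json


class CreateScriptError(Exception):
    """Raised when the API returns a status code other than 200."""

    def __init__(self, status: int, error_code: str, error_msg: str) -> None:
        super().__init__(f"HTTP {status}: {error_code} {error_msg}")
        self.status = status
        self.error_code = error_code
        self.error_msg = error_msg


def parse_create_script_response(status: int, body_text: str) -> str:
    """Return script_id on success; raise CreateScriptError otherwise."""
    payload = json.loads(body_text) if body_text else {}
    if status == 200:
        return payload["script_id"]
    raise CreateScriptError(status,
                            payload.get("error_code", ""),
                            payload.get("error_msg", ""))
```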

Error Codes

See Error Codes.