Creating a Virtual Avatar Video Production Task
Function
Creates a virtual avatar video production task.
Calling Method
For details, see Calling APIs.
Authorization Information
Each account has all the permissions required to call all APIs, but IAM users must be assigned the required permissions. For details about the required permissions, see Permissions Policies and Supported Actions.
URI
POST /v1/{project_id}/2d-digital-human-videos
| Parameter | Mandatory | Type | Description |
|---|---|---|---|
| project_id | Yes | String | Project ID. For details about how to obtain the project ID, see Obtaining a Project ID. |
Request Parameters
| Parameter | Mandatory | Type | Description |
|---|---|---|---|
| X-Auth-Token | No | String | User token. This parameter is mandatory when token authentication is used. You can obtain the token by calling the IAM API used to obtain a user token. The token is the value of X-Subject-Token in the response header. |
| Authorization | No | String | Authentication information. This parameter is mandatory for AK/SK authentication. |
| X-Sdk-Date | No | String | Time when the request is sent. This parameter is mandatory for AK/SK authentication. The format is YYYYMMDD'T'HHMMSS'Z'. |
| X-Project-Id | No | String | Project ID. This parameter is mandatory for AK/SK authentication. |
| X-App-UserId | No | String | Third-party user ID. Chinese characters are not allowed. |
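
The X-Sdk-Date format above can be produced with the Python standard library. The following is a minimal sketch, not an official client: the helper name `sdk_date` is hypothetical, and it assumes the trailing `Z` means the timestamp is expressed in UTC.

```python
from datetime import datetime, timezone


def sdk_date(moment=None):
    """Format a timestamp as YYYYMMDD'T'HHMMSS'Z' (assumed to be UTC)."""
    if moment is None:
        moment = datetime.now(timezone.utc)
    # Convert to UTC first so the literal 'Z' suffix is accurate.
    return moment.astimezone(timezone.utc).strftime("%Y%m%dT%H%M%SZ")


print(sdk_date(datetime(2023, 6, 15, 8, 30, 5, tzinfo=timezone.utc)))
# prints 20230615T083005Z
```

Pass a timezone-aware `datetime`; a naive value is interpreted as local time by `astimezone`.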

| Parameter | Mandatory | Type | Description |
|---|---|---|---|
| script_id | No | String | Script ID. |
| model_asset_id | No | String | Virtual avatar model asset ID, which can be queried from the asset library. |
| voice_config | No | VoiceConfig object | Timbre configuration. |
| video_config | No | VideoConfig object | Video output configuration. |
| shoot_scripts | No | Array of ShootScriptItem objects | Video shooting scripts. |
| output_asset_config | No | OutputAssetConfig object | Output asset information configuration. |
| background_music_config | No | BackgroundMusicConfig object | Background music configuration. |
| review_config | No | ReviewConfig object | Content review configuration. |
| callback_config | No | CallBackConfig object | Callback settings. |
| action_config | No | ActionConfig object | Choreography configuration. |

| Parameter | Mandatory | Type | Description |
|---|---|---|---|
| voice_asset_id | Yes | String | Definition: Timbre asset ID, which can be queried from the asset library. For details about how to query voice IDs, see Querying Preset Voice IDs. Constraints: N/A. Range: 1 to 256 characters. Default value: N/A. |
| speed | No | Integer | Definition: Speaking speed. 50 indicates 0.5x speed, 100 indicates normal speed, and 200 indicates 2x speed. Normal speed is that of an adult speaking about 250 words per minute. Constraints: N/A. Value range: 50~200. Default value: 100. |
| pitch | No | Integer | Definition: Pitch. Constraints: N/A. Value range: 50~200. Default value: 100. |
| volume | No | Integer | Definition: Volume. Constraints: N/A. Value range: 90~240. Default value: 140. |
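
The ranges and defaults in the table above can be checked on the client before a request is sent. This is a minimal sketch under that assumption; `build_voice_config` and `speed_multiplier` are hypothetical helper names, not part of the API, and the service may apply further validation.

```python
# Ranges and defaults copied from the VoiceConfig table.
VOICE_RANGES = {"speed": (50, 200), "pitch": (50, 200), "volume": (90, 240)}
VOICE_DEFAULTS = {"speed": 100, "pitch": 100, "volume": 140}


def build_voice_config(voice_asset_id, **overrides):
    """Return a voice_config dict, rejecting out-of-range values."""
    if not 1 <= len(voice_asset_id) <= 256:
        raise ValueError("voice_asset_id must contain 1 to 256 characters")
    config = {"voice_asset_id": voice_asset_id, **VOICE_DEFAULTS}
    for name, value in overrides.items():
        if name not in VOICE_RANGES:
            raise KeyError(f"unknown voice parameter: {name}")
        low, high = VOICE_RANGES[name]
        if not low <= value <= high:
            raise ValueError(f"{name} must be between {low} and {high}")
        config[name] = value
    return config


def speed_multiplier(speed):
    """Convert the integer speed value to a playback multiplier (100 -> 1.0)."""
    return speed / 100


print(build_voice_config("example-voice-id", speed=150))
print(speed_multiplier(50))  # prints 0.5
```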

| Parameter | Mandatory | Type | Description |
|---|---|---|---|
| clip_mode | No | String | Definition: Clipping mode of the output video. Constraints: N/A. Default value: RESIZE. |
| codec | No | String | Definition: Video encoding format and video file format. Constraints: Only virtual avatar video production supports VP8 and QTRLE encoding. With QTRLE encoding, text-based control is limited to fewer than 1,500 characters and audio-based control to audio shorter than 5 minutes. QTRLE encoding is available only to whitelisted users. Default value: H264. |
| bitrate | Yes | Integer | Definition: Average output bitrate. Unit: kbit/s. Value range: 40~30000. Default value: N/A. |
| width | Yes | Integer | Definition: Video width. Unit: pixel. Value range: 0~3840. Default value: N/A. |
| height | Yes | Integer | Definition: Video height. Unit: pixel. Value range: 0~3840. Default value: N/A. |
| frame_rate | No | String | Definition: Frame rate. Unit: FPS. Constraints: The virtual avatar video frame rate is fixed at 25 FPS. Default value: 25. |
| is_subtitle_enable | No | Boolean | Definition: Whether the output video is subtitled. Constraints: Subtitles are not supported for virtual avatar livestreaming. Default value: false. |
| subtitle_config | No | SubtitleConfig object | Subtitle configuration. |
| dx | No | Integer | Definition: Horizontal coordinate of the pixel in the upper left corner of the cropped video. NOTE: The image layout size is based on the model resolution. For example, for a model with a resolution of 1920 x 1080, dx ranges from 0 to 1920. Constraints: This parameter takes effect when clip_mode is set to CROP. Value range: -1920~3840. Default value: N/A. |
| dy | No | Integer | Definition: Vertical coordinate of the pixel in the upper left corner of the cropped video. NOTE: The image layout size is based on the model resolution. For example, for a model with a resolution of 1920 x 1080, dy ranges from 0 to 1080. Constraints: This parameter takes effect when clip_mode is set to CROP. Value range: -1920~3840. Default value: N/A. |
| is_enable_super_resolution | No | Boolean | Definition: Whether super resolution is enabled for a video. Constraints: This parameter is available only for virtual avatar video production. Default value: false. |
| is_end_at_first_frame | No | Boolean | Definition: Whether the end frame of a video is the same as the start frame. Set this parameter to true if multiple virtual avatar videos need to be seamlessly merged. Constraints: This parameter is supported only for virtual avatar video production. The setting becomes invalid after an action tag is inserted during video production. Default value: false. |
| output_external_url | No | String | External URL to which a video file is uploaded. |
| is_vocabulary_config_enable | No | Boolean | Definition: Whether to apply the pronunciation configuration of the current tenant. Constraints: This parameter is available only for virtual avatar video production. Default value: true. |

| Parameter | Mandatory | Type | Description |
|---|---|---|---|
| dx | No | Integer | Definition: X coordinate of the pixel in the lower left corner of the subtitle box. Constraints: N/A. Value range: 0~1920. Default value: N/A. |
| dy | No | Integer | Definition: Y coordinate of the pixel in the lower left corner of the subtitle box. Constraints: N/A. Value range: 0~1920. Default value: N/A. |
| h | No | Integer | Definition: Subtitle box height. Constraints: h is used only to help calculate the coordinates of the upper left corner of the subtitle box. It is not used in the background. Value range: 0~1920. |
| w | No | Integer | Definition: Subtitle box width. Value range: 0~1920. |
| font_name | No | String | Definition: Font. For details about the supported fonts, see Supported Fonts. Constraints: N/A. Range: 0 to 64 characters. Default value: HarmonyOS_Sans_SC_Black. |
| font_size | No | Integer | Definition: Font size. The API accepts values from 0 to 120, but the effective range is 24 to 120. Use a value within the effective range. Constraints: N/A. Value range: 0~120. Default value: 54. |
| font_color | No | String | Definition: RGB color value of the subtitle font. Constraints: None. Range: 0 to 7 characters. Default value: #FFFFFF. |
| stroke_color | No | String | Definition: RGB color value of the subtitle font stroke. Constraints: None. Range: 0 to 7 characters. |
| stroke_thickness | No | Float | Definition: Thickness of the subtitle font stroke, in pixels. Constraints: None. Value range: 0~50. |
| opacity | No | Float | Definition: Subtitle font opacity. 0 indicates 100% transparency and 1 indicates 100% opacity. Constraints: None. Value range: 0~1. Default value: 1. |

| Parameter | Mandatory | Type | Description |
|---|---|---|---|
| sequence_no | No | Integer | Definition: Script sequence number. Constraints: The sequence number of a script must be unique. Value range: 0~2147483647. Default value: N/A. |
| shoot_script | Yes | ShootScript object | Performance script. |
| subtitle_file_info | No | SubtitleFiles object | Subtitle file information. |

| Parameter | Mandatory | Type | Description |
|---|---|---|---|
| script_type | No | String | Definition: Script type, that is, the control mode of video production. Constraints: N/A. Range: TEXT (text control, that is, using TTS) or AUDIO (speech control). Default value: TEXT. |
| text_config | No | TextConfig object | Commentary configuration. |
| audio_duration | No | Float | Duration of the audio for audio-based control, in seconds. Value range: 0~36000. |
| audio_drive_action_config | No | Array of AudioDriveActionConfig objects | Action configuration for speech control. |
| audio_drive_file_external_url | No | String | External URL for downloading the audio file for speech control. |
| background_config | No | Array of BackgroundConfigInfo objects | Background configuration. |
| layer_config | No | Array of LayerConfig objects | Overlay configuration. NOTE: This parameter is mandatory when VP8 encoding is used and the resolution of the virtual avatar model differs from that of the output video. |
| audio_config | No | AudioInfo object | Audio file information. |

| Parameter | Mandatory | Type | Description |
|---|---|---|---|
| text | Yes | String | Definition: Script. Two modes are supported: plain text mode and tag mode. Constraints: The value can contain a maximum of 10,000 characters, excluding SSML tags. Range: 0 to 131,072 characters. Default value: N/A. |

| Parameter | Mandatory | Type | Description |
|---|---|---|---|
| action_tag | Yes | String | Action tag. |
| action_name | No | String | Action name. |
| action_start_time | Yes | Float | Action start time. Value range: 0~2592000. |

| Parameter | Mandatory | Type | Description |
|---|---|---|---|
| background_type | Yes | String | Definition: Background type. Constraints: N/A. Default value: N/A. |
| background_config | No | String | Definition: Background file URL. Range: 1 to 2,048 characters. Default value: N/A. |
| background_color_config | No | String | Definition: RGB color value of a solid color background. Constraints: This parameter is mandatory when background_type is set to COLOR. Range: 0 to 16 characters. Default value: #FFFFFF. |
| background_asset_id | No | String | Definition: Background asset ID. NOTE: If a background image is used, enter the image asset ID. Constraints: N/A. Range: 0 to 64 characters. Default value: N/A. |
| background_image_config | No | BackgroundImageConfig object | Background image size and position settings. |

| Parameter | Mandatory | Type | Description |
|---|---|---|---|
| dx | Yes | Integer | Definition: X-axis position of the pixel in the upper left corner of the background image. The upper left corner of the preview area is at (0, 0). The video resolution is 1920 x 1080 in landscape mode (16:9) and 1080 x 1920 in portrait mode (9:16). Constraints: The background image must cover the entire preview area. That is, dx ≤ 0, dx + width ≥ 1920 in landscape mode, and dx + width ≥ 1080 in portrait mode. Value range: -5760~0. Default value: 0. |
| dy | Yes | Integer | Definition: Y-axis position of the pixel in the upper left corner of the background image. The upper left corner of the preview area is at (0, 0). The video resolution is 1920 x 1080 in landscape mode (16:9) and 1080 x 1920 in portrait mode (9:16). Constraints: The background image must cover the entire preview area. That is, dy ≤ 0, dy + height ≥ 1080 in landscape mode, and dy + height ≥ 1920 in portrait mode. Value range: -5760~0. Default value: 0. |
| width | Yes | Integer | Definition: Width (in pixels) of the background image, relative to the preview area size. The video resolution is 1920 x 1080 in landscape mode (16:9) and 1080 x 1920 in portrait mode (9:16). Constraints: The background image must cover the entire preview area. That is, width > 1080, dx + width ≥ 1920 in landscape mode, and dx + width ≥ 1080 in portrait mode. Value range: 1~7680. |
| height | Yes | Integer | Definition: Height (in pixels) of the background image, relative to the preview area size. The video resolution is 1920 x 1080 in landscape mode (16:9) and 1080 x 1920 in portrait mode (9:16). Constraints: The background image must cover the entire preview area. That is, height > 1080, dy + height ≥ 1080 in landscape mode, and dy + height ≥ 1920 in portrait mode. Value range: 1~7680. |
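
The coverage constraint in the table above reduces to four inequalities. The sketch below checks only those inequalities for a candidate placement; `covers_preview` and `PREVIEW_SIZES` are hypothetical names, and the service may enforce additional limits (such as the value ranges and the minimum width and height listed in the table).

```python
# Preview area sizes stated in the table: landscape 16:9 and portrait 9:16.
PREVIEW_SIZES = {"landscape": (1920, 1080), "portrait": (1080, 1920)}


def covers_preview(dx, dy, width, height, orientation="landscape"):
    """Return True if the background image covers the whole preview area."""
    preview_w, preview_h = PREVIEW_SIZES[orientation]
    return (
        dx <= 0
        and dy <= 0
        and dx + width >= preview_w
        and dy + height >= preview_h
    )


print(covers_preview(0, 0, 1920, 1080))     # prints True
print(covers_preview(-100, 0, 1920, 1080))  # prints False: right edge stops at 1820
print(covers_preview(-100, -50, 2200, 1300))  # prints True
```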

| Parameter | Mandatory | Type | Description |
|---|---|---|---|
| layer_type | Yes | String | Definition: Layer type. Constraints: N/A. Default value: N/A. |
| asset_id | No | String | Definition: ID of the asset overlaid on a video. You do not need to set this parameter for external assets. Constraints: N/A. Range: 0 to 64 characters. Default value: N/A. |
| group_id | No | String | Definition: Groups materials in multiple scenes. Materials with the same group_id share location information when they are applied globally. Constraints: N/A. Range: 0 to 64 characters. Default value: N/A. |
| sequence_no | No | Integer | Definition: Overlay of the paragraph currently being shown. This field is forward compatible and optional. This parameter is valid only for livestreaming. Constraints: The paragraph is subject to sequence_no. Value range: 0~2147483647. Default value: N/A. |
| position | No | LayerPositionConfig object | Layer position configuration. |
| size | No | LayerSizeConfig object | Layer size configuration. |
| rotation | No | LayerRotationConfig object | Overlay rotation configuration. |
| image_config | No | ImageLayerConfig object | Image layer configuration. |
| video_config | No | VideoLayerConfig object | Video overlay configuration. |
| text_config | No | TextLayerConfig object | Material text layer configuration. |

| Parameter | Mandatory | Type | Description |
|---|---|---|---|
| dx | Yes | Integer | Definition: X-axis position of the pixel in the upper left corner of the image. The upper left corner of the image layout is at (0, 0). The video resolution is 1920 x 1080 in landscape mode (16:9) and 1080 x 1920 in portrait mode (9:16). Constraints: The value is a pixel value relative to the image layout. It indicates only the layout position and is independent of the resolution of the output image. Value range: -1920~3840. Default value: 0. |
| dy | Yes | Integer | Definition: Y-axis position of the pixel in the upper left corner of the image. The upper left corner of the image layout is at (0, 0). The video resolution is 1920 x 1080 in landscape mode (16:9) and 1080 x 1920 in portrait mode (9:16). Constraints: The value is a pixel value relative to the image layout. It indicates only the layout position and is independent of the resolution of the output image. Value range: -1920~3840. Default value: 0. |
| layer_index | Yes | Integer | Definition: Overlay sequence of an image, video, or person image. NOTE: The overlay sequence is an integer starting from 1 and incremented by 1. Constraints: If several overlays have the same sequence value, their relative order is random. Value range: 1~100. Default value: 100. |

| Parameter | Mandatory | Type | Description |
|---|---|---|---|
| width | No | Integer | Definition: Width (in pixels) of the image overlay, relative to the preview area size. The video resolution is 1920 x 1080 in landscape mode (16:9) and 1080 x 1920 in portrait mode (9:16). Constraints: The value is a pixel value relative to the image layout. It indicates only the layout position and is independent of the resolution of the output image. Value range: 1~7680. |
| height | No | Integer | Definition: Height (in pixels) of the image overlay, relative to the preview area size. The video resolution is 1920 x 1080 in landscape mode (16:9) and 1080 x 1920 in portrait mode (9:16). Constraints: The value is a pixel value relative to the image layout. It indicates only the layout position and is independent of the resolution of the output image. Value range: 1~7680. |

| Parameter | Mandatory | Type | Description |
|---|---|---|---|
| angle | No | Integer | Definition: Rotation angle, in degrees. Constraints: The material is rotated around its center point. Video materials cannot be rotated. Value range: 0~360. Default value: 0. |

| Parameter | Mandatory | Type | Description |
|---|---|---|---|
| image_url | No | String | Definition: Image file URL. Range: 1 to 2,048 characters. Default value: N/A. |

| Parameter | Mandatory | Type | Description |
|---|---|---|---|
| video_url | No | String | Definition: Video file URL. Range: 1 to 2,048 characters. Default value: N/A. |
| video_cover_url | No | String | Definition: Video thumbnail file URL. Range: 1 to 2,048 characters. Default value: N/A. |
| loop_count | No | Integer | Definition: Number of times that a video is played in a loop. Constraints: N/A. Value range: -1~100. Default value: -1. |
| video_sound | No | Integer | Definition: Volume of the video overlay, as a percentage. The default value 0 means the audio is muted. Constraints: N/A. Value range: 0~100. |
| is_play_the_entire_video | No | Boolean | Definition: Whether to play the entire video. true indicates that the entire video is played. false indicates that the video stops playing when the inserted scene text or audio ends. Constraints: N/A. Default value: false. |

| Parameter | Mandatory | Type | Description |
|---|---|---|---|
| text_context | No | String | Definition: Text of the text layer. The content must be encoded using Base64. For example, to add the text watermark "测试文字水印" (Chinese for "Test text watermark"), set text_context to 5rWL6K+V5paH5a2X5rC05Y2w. Constraints: N/A. Range: 0 to 1,024 characters. Default value: N/A. |
| font_name | No | String | Definition: Font. For details about the supported fonts, see Supported Fonts. Constraints: N/A. Range: 0 to 64 characters. Default value: HarmonyOS_Sans_SC_Black. |
| font_size | No | Integer | Definition: Font size (in pixels). The API accepts values from 0 to 120, but the effective range is 4 to 120. Use a value within the effective range. Constraints: N/A. Value range: 0~120. Default value: 16. |
| font_color | No | String | Definition: Font color, as an RGB color value. Constraints: N/A. Range: 0 to 16 characters. Default value: #FFFFFF. |
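
The Base64 requirement for text_context can be met with the standard library. This is a minimal sketch: `encode_text_context` is a hypothetical helper name, and UTF-8 is an assumption, chosen because the documented sample value 5rWL6K+V5paH5a2X5rC05Y2w decodes as UTF-8 to the Chinese phrase "测试文字水印".

```python
import base64


def encode_text_context(text):
    """Base64-encode layer text, assuming the service expects UTF-8 bytes."""
    return base64.b64encode(text.encode("utf-8")).decode("ascii")


def decode_text_context(value):
    """Reverse of encode_text_context, useful for checking a stored value."""
    return base64.b64decode(value).decode("utf-8")


print(encode_text_context("测试文字水印"))
# prints 5rWL6K+V5paH5a2X5rC05Y2w
```

Because the 1,024-character limit applies to the parameter value, it is the encoded string, not the original text, that should be measured against it.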

| Parameter | Mandatory | Type | Description |
|---|---|---|---|
| audio_id | No | Integer | Definition: Audio ID. Constraints: N/A. Value range: 0~10000. Default value: N/A. |

| Parameter | Mandatory | Type | Description |
|---|---|---|---|
| text_subtitle_file | No | SubtitleFileInfo object | |
| audio_subtitle_file | No | SubtitleFileInfo object | |

| Parameter | Mandatory | Type | Description |
|---|---|---|---|
| subtitle_file_download_url | No | String | URL for downloading subtitle files. |
| subtitle_file_upload_url | No | String | URL for uploading subtitle files. |
| subtitle_file_state | No | String | Status of subtitle file generation. |
| job_id | No | String | ID of the subtitle file generation task. |

| Parameter | Mandatory | Type | Description |
|---|---|---|---|
| asset_name | Yes | String | Definition: Output video asset name. Constraints: N/A. Value range: 0 to 256 characters. Default value: N/A. |

| Parameter | Mandatory | Type | Description |
|---|---|---|---|
| music_asset_id | No | String | Definition: Music asset ID. Constraints: N/A. Range: 0 to 64 characters. Default value: N/A. |
| volume | No | Integer | Definition: Music volume. For example, 100 indicates that the volume is 100%, and 50 indicates that the volume is 50%. Constraints: N/A. Value range: 0~100. Default value: 100. |

| Parameter | Mandatory | Type | Description |
|---|---|---|---|
| no_need_review | No | Boolean | Whether content review is skipped. Only whitelisted users can use this feature. Automatic review policies apply to all other users. |

| Parameter | Mandatory | Type | Description |
|---|---|---|---|
| callback_url | Yes | String | Callback URL. The callback request body is in JSON format and contains the following parameters: result (SUCCEED or FAILED), asset_id (asset ID), and job_id (task ID). |
| auth_type | No | String | Authentication type. Default value: NONE. |
| key | No | String | Key. |
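
A receiver at callback_url needs to read the three fields listed above. The sketch below parses such a body; `parse_callback` is a hypothetical helper name, the sample payload values are invented for illustration, and any authentication implied by auth_type and key is not covered here.

```python
import json


def parse_callback(raw_body):
    """Parse a callback body with the documented result, asset_id, and job_id fields."""
    payload = json.loads(raw_body)
    result = payload.get("result")
    if result not in ("SUCCEED", "FAILED"):
        raise ValueError(f"unexpected result value: {result!r}")
    return {
        "succeeded": result == "SUCCEED",
        "asset_id": payload.get("asset_id"),
        "job_id": payload.get("job_id"),
    }


# Illustrative payload only; real IDs come from the service.
sample = '{"result": "SUCCEED", "asset_id": "asset-123", "job_id": "job-456"}'
print(parse_callback(sample))
```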

| Parameter | Mandatory | Type | Description |
|---|---|---|---|
| action_interval | No | Float | Interval at which a non-semantic action is automatically inserted. If this parameter is set to 0 or left empty, the interval is 4 seconds by default. If this parameter is set to 255, non-semantic actions are not automatically inserted. Value range: 0~255. Default value: 0. |
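
The special values of action_interval can be summarized in a small function. This is a sketch of the rule as described in the table; `effective_action_interval` is a hypothetical name, and treating every other value as an interval in seconds is an assumption based on the 4-second default.

```python
def effective_action_interval(action_interval=None):
    """Return the interval in seconds, or None when automatic insertion is disabled."""
    if action_interval is None or action_interval == 0:
        return 4.0  # documented default when the value is 0 or left empty
    if not 0 <= action_interval <= 255:
        raise ValueError("action_interval must be between 0 and 255")
    if action_interval == 255:
        return None  # non-semantic actions are not inserted automatically
    return float(action_interval)  # assumption: the value is in seconds


print(effective_action_interval())     # prints 4.0
print(effective_action_interval(255))  # prints None
print(effective_action_interval(2.5))  # prints 2.5
```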
Response Parameters
Status code: 200
| Parameter | Type | Description |
|---|---|---|
| X-Request-Id | String | Request ID. |

| Parameter | Type | Description |
|---|---|---|
| job_id | String | Task ID. |
Status code: 400
| Parameter | Type | Description |
|---|---|---|
| error_code | String | Error code. |
| error_msg | String | Error description. |
Status code: 401
| Parameter | Type | Description |
|---|---|---|
| error_code | String | Error code. |
| error_msg | String | Error description. |
Status code: 500
| Parameter | Type | Description |
|---|---|---|
| error_code | String | Error code. |
| error_msg | String | Error description. |
Example Requests
POST https://{endpoint}/v1/0d697589d98091f12f92c0073501cd79/2d-digital-human-videos
{
  "model_asset_id" : "0c7798664ee**************57c32e7",
  "voice_config" : {
    "voice_asset_id" : "394f3a27cd**************b1edf6c",
    "speed" : 100,
    "pitch" : 100,
    "volume" : 140
  },
  "video_config" : {
    "codec" : "H264",
    "bitrate" : 5000,
    "width" : 1920,
    "height" : 1080,
    "frame_rate" : "25"
  },
  "shoot_scripts" : [ {
    "sequence_no" : 0,
    "shoot_script" : {
      "text_config" : {
        "text" : "Hello, everyone. I'm Yunling."
      },
      "background_config" : [ {
        "background_type" : "IMAGE",
        "background_config" : "https://{endpoint}/0d697589d98091f12f92c0073501cd79/c7885ffdfb347337a890208ca7fd07e3/34534f0262813a6838bdcfb8bc949af6.jpg?AccessKeyId=WTEZCVDFUF3XHXCTPIJ8&Expires=1686872878&Signature=zXGOEQlrgZ4yAUziwlGcdbXLPIM%3D"
      } ],
      "layer_config" : [ {
        "layer_type" : "HUMAN",
        "position" : {
          "dx" : 656,
          "dy" : 0,
          "layer_index" : 1
        },
        "size" : {
          "width" : 607,
          "height" : 1080
        }
      } ],
      "script_type" : "TEXT"
    }
  } ],
  "output_asset_config" : {
    "asset_name" : "Yunling's self-introduction."
  }
}
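
The example request above can be assembled in Python with the standard library. This sketch only builds the request object and does not send it; `build_create_request` is a hypothetical helper, the endpoint, project ID, and token values are placeholders, token authentication is assumed, and the `Content-Type: application/json` header is an assumption that is not stated on this page.

```python
import json
import urllib.request


def build_create_request(endpoint, project_id, token, body):
    """Build (but do not send) a POST request for the video production task."""
    url = f"https://{endpoint}/v1/{project_id}/2d-digital-human-videos"
    headers = {
        "Content-Type": "application/json",  # assumption, see lead-in
        "X-Auth-Token": token,               # token authentication
    }
    data = json.dumps(body).encode("utf-8")
    return urllib.request.Request(url, data=data, headers=headers, method="POST")


# Placeholder values for illustration only.
body = {
    "video_config": {"codec": "H264", "bitrate": 5000, "width": 1920, "height": 1080},
    "output_asset_config": {"asset_name": "demo"},
}
request = build_create_request("example.invalid", "my-project-id", "my-token", body)
print(request.get_method(), request.full_url)
```

Passing the resulting object to `urllib.request.urlopen` would send it; that step is left out so the sketch runs without network access or credentials.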
Example Responses
Status code: 200
Returned when the request succeeded.
{
  "job_id" : "26f06524-4f75-4b3a-a853-b649a21aaf66"
}
Status code: 400
Parameter error. The response includes the error code and its description.
{
  "error_code" : "MSS.00000003",
  "error_msg" : "Invalid parameter"
}
Status code: 401
Authentication is not performed or fails.
{
  "error_code" : "MSS.00000001",
  "error_msg" : "Unauthorized"
}
Status code: 500
Internal service error.
{
  "error_code" : "MSS.00000004",
  "error_msg" : "Internal Error"
}
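
The example responses above share a simple shape: a job_id on success, and error_code plus error_msg otherwise. The following sketch turns a status code and response body into either a task ID or an exception; `parse_create_response` and `TaskCreationError` are hypothetical names, not part of any SDK.

```python
import json


class TaskCreationError(Exception):
    """Raised when the API returns a non-200 status."""

    def __init__(self, status_code, error_code, error_msg):
        super().__init__(f"HTTP {status_code} {error_code}: {error_msg}")
        self.status_code = status_code
        self.error_code = error_code
        self.error_msg = error_msg


def parse_create_response(status_code, body_text):
    """Return job_id for a 200 response, otherwise raise TaskCreationError."""
    payload = json.loads(body_text)
    if status_code == 200:
        return payload["job_id"]
    raise TaskCreationError(
        status_code, payload.get("error_code"), payload.get("error_msg")
    )


print(parse_create_response(200, '{"job_id": "26f06524-4f75-4b3a-a853-b649a21aaf66"}'))
# prints 26f06524-4f75-4b3a-a853-b649a21aaf66
```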
Status Codes
| Status Code | Description |
|---|---|
| 200 | Returned when the request succeeded. |
| 400 | Parameter error. The response includes the error code and its description. |
| 401 | Authentication is not performed or fails. |
| 500 | Internal service error. |
Error Codes
See Error Codes.