Updated on 2025-04-22 GMT+08:00

HTTP Commands for Controlling Livestreaming

Virtual avatar livestreaming can be taken over by humans using HTTP APIs. After the takeover, the virtual avatar can be instructed to speak in either of the following ways:

  • By input text. Run INSERT_PLAY_SCRIPT to insert a script for the virtual avatar to read out as its answer. The virtual avatar and its background remain unchanged. The parameter structure of this command is defined as PlayTextInfo.
  • By input audio. Run INSERT_PLAY_AUDIO to insert an audio clip for the virtual avatar to play as its answer. The virtual avatar and its background remain unchanged. The parameter structure of this command is defined as PlayAudioInfo.

The API for controlling virtual avatar intelligent livestreaming also supports the following commands:

  • When the command is GET_CURRENT_PLAYING_SCRIPTS, the scripts of the current round are queried. Its response structure is defined as LivePlayingScriptList.
  • When the command is CLEAN_UP_INSERT_COMMAND, unplayed insertion commands will be cleared. Its parameter structure is defined as CleanUpInsertCommand.
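The commands above are carried in a JSON request body. The sketch below shows how an INSERT_PLAY_SCRIPT command body could be assembled; the `command`/`params` envelope field names are assumptions for illustration (the exact wrapper depends on the channel used to deliver commands), while the PlayTextInfo fields themselves follow Table 1.

```python
import json

# Hedged sketch of an INSERT_PLAY_SCRIPT command body.
# "command" and "params" are assumed envelope names; the fields inside
# "params" (text_config, play_type, play_role) come from Table 1.
command = {
    "command": "INSERT_PLAY_SCRIPT",
    "params": {
        "text_config": {"text": "Hello, everyone. I'm Sara, a virtual streamer."},
        "play_type": "PLAY_NOW",   # APPEND | INSERT | PLAY_NOW (default: PLAY_NOW)
        "play_role": "STREAMER",   # STREAMER | CO_STREAMER (default: STREAMER)
    },
}
payload = json.dumps(command, ensure_ascii=False)
print(payload)
```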

PlayTextInfo

Table 1 PlayTextInfo

  • text_config: Optional. TextConfig object (see Table 2). Script configuration.
  • play_type: Optional. String, 0 to 32 characters. Playback mode. Options:
    • APPEND: adds the playback request to the end of the playback queue.
    • INSERT: inserts the playback request between two audio clips, or at the end of the text that is being played.
    • PLAY_NOW: plays the request immediately after the instruction is received; that is, playback can start before the current text playback is completed.
    Default value: PLAY_NOW.
  • play_role: Optional. String, 0 to 32 characters. Playback role. Options: STREAMER (streamer) or CO_STREAMER (co-streamer). Default value: STREAMER.
  • rule_index: Optional. String, 0 to 64 characters. When an interaction callback triggers the insertion of an audio reply, this field carries the index of the triggered interaction rule.

Table 2 TextConfig

  • text: Mandatory. String, 1 to 131,072 characters. Script.

    The following modes are supported:

    • Text-only mode

      Example: Hello, everyone. I'm Sara, a virtual streamer.

    • Tag mode

      The tag mode uses Speech Synthesis Markup Language (SSML). The following tags are supported:

      • <speak>: root node of all text. Any text that uses SSML tags must be enclosed in a <speak></speak> tag pair.
      • <emotion>: emotion tag, which takes effect for one or more complete sentences, from the beginning of a sentence to its end. The format is <emotion type="Emotion tag">...</emotion>. The value of type can be HAPPY, SAD, CALM, or ANGER.
      • <insert-action>: action tag, which inserts an action at a specified position in the text. The format is <insert-action id="Action asset ID" name="Action name" tag="Action ID"/>. The action asset information is obtained using the asset library API.
      • <break>: pause tag, which inserts a pause at a specified position in the text. The format is <break time="Pause duration"/>, where the duration is in milliseconds. The minimum value is 200 ms.
      • <phoneme>: pronunciation tag, which specifies the pronunciation of a single Chinese character. Only one character is allowed between the start and end tags. The ph property is set to Chinese Pinyin, with the tone represented by 1, 2, 3, or 4. The format is <phoneme ph="pinyin">character</phoneme>.
NOTE:
  • Example: <speak><emotion type="HAPPY"><insert-action id="2692ea5d095caaafcfed21dc4590b701" name="fingertips of both hands touched" tag="system_female_animation_0026"/>Hello.<break time="200ms"/>I'm a MetaStudio AI virtual avatar.</emotion> I'll show you <phoneme ph="liao3">how</phoneme> MetaStudio works.</speak>
  • Only the <break> and <phoneme> tags take effect for virtual avatar video production.
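The tag rules above can be checked before a script is submitted. The helper below is an illustrative sketch, not part of the API: the function name and its validation choices are assumptions, while the tag formats, emotion values, 200 ms minimum pause, and 131,072-character limit come from Table 2.

```python
# Hedged sketch: assemble an SSML value for the "text" field of TextConfig.
# The helper itself is illustrative; tag formats and limits follow Table 2.
def build_ssml(sentence: str, emotion: str = "CALM", pause_ms: int = 200) -> str:
    if emotion not in {"HAPPY", "SAD", "CALM", "ANGER"}:
        raise ValueError("unsupported emotion type")
    if pause_ms < 200:  # documented minimum pause duration
        raise ValueError("minimum pause is 200 ms")
    ssml = (f'<speak><emotion type="{emotion}">{sentence}'
            f'<break time="{pause_ms}ms"/></emotion></speak>')
    if not 1 <= len(ssml) <= 131072:  # documented length limit for "text"
        raise ValueError("text must contain 1 to 131,072 characters")
    return ssml

print(build_ssml("Hello. I'm a MetaStudio AI virtual avatar.", emotion="HAPPY"))
```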

PlayAudioInfo

Table 3 PlayAudioInfo

  • audio_url: Optional. String, 0 to 2,048 characters. Audio URL.
  • play_type: Optional. String, 0 to 32 characters. Playback mode. Options:
    • APPEND: adds the playback request to the end of the playback queue.
    • INSERT: inserts the playback request between two audio clips, or at the end of the text that is being played.
    • PLAY_NOW: plays the request immediately after the instruction is received; that is, playback can start before the current text playback is completed.
    Default value: APPEND.
  • play_role: Optional. String, 0 to 32 characters. Playback role. Options: STREAMER (streamer) or CO_STREAMER (co-streamer). Default value: STREAMER.
  • rule_index: Optional. String, 0 to 64 characters. When an interaction callback triggers the insertion of an audio reply, this field carries the index of the triggered interaction rule.
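An INSERT_PLAY_AUDIO command body can be built the same way. As before, the `command`/`params` envelope names and the URL are placeholders; only the PlayAudioInfo fields follow Table 3.

```python
import json

# Hedged sketch of an INSERT_PLAY_AUDIO command body. The envelope field
# names and the URL are illustrative placeholders; audio_url, play_type,
# and play_role are the documented PlayAudioInfo fields (Table 3).
audio_command = {
    "command": "INSERT_PLAY_AUDIO",
    "params": {
        "audio_url": "https://example.com/replies/greeting.wav",  # placeholder
        "play_type": "APPEND",       # default for PlayAudioInfo is APPEND
        "play_role": "CO_STREAMER",  # let the co-streamer play the reply
    },
}
print(json.dumps(audio_command, ensure_ascii=False))
```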

LivePlayingScriptList

Table 4 LivePlayingScriptList

  • scene_scripts: Optional. Array of LivePlayingScriptInfo objects (see Table 5); array length: 1 to 100. Scripts.

Table 5 LivePlayingScriptInfo

  • script_name: Optional. String, 1 to 256 characters. Script name.
  • model_asset_id: Optional. String, 0 to 64 characters. Virtual avatar model asset ID.
  • shoot_scripts: Optional. Array of LivePlayingShootScriptItem objects (see Table 6); array length: 0 to 100. Video shooting scripts.

Table 6 LivePlayingShootScriptItem

  • sequence_no: Optional. Integer, 0 to 2,147,483,647. Script number.
  • title: Optional. String, 0 to 256 characters. Paragraph title.
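A GET_CURRENT_PLAYING_SCRIPTS response can be walked as nested objects. The sample data below is fabricated for illustration; the field names follow Tables 4 to 6.

```python
# Hedged sketch: reading a LivePlayingScriptList response. The values are
# made up; scene_scripts, shoot_scripts, sequence_no, and title follow
# Tables 4 to 6 of this reference.
response = {
    "scene_scripts": [
        {
            "script_name": "Evening sale, round 1",
            "model_asset_id": "a1b2c3",
            "shoot_scripts": [
                {"sequence_no": 0, "title": "Opening"},
                {"sequence_no": 1, "title": "Product intro"},
            ],
        }
    ]
}
# Collect the paragraph titles of the current round, in sequence order.
titles = [item["title"]
          for script in response.get("scene_scripts", [])
          for item in script.get("shoot_scripts", [])]
print(titles)  # ['Opening', 'Product intro']
```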

CleanUpInsertCommand

Table 7 CleanUpInsertCommand

  • command_ids: Optional. Array of strings; array length: 0 to 100; each command ID contains 1 to 64 characters. IDs of the insertion commands to clear. If this parameter is left blank, all unplayed insertion commands are cleared.
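A CLEAN_UP_INSERT_COMMAND body is minimal. As with the earlier sketches, the `command`/`params` envelope names are assumptions; only `command_ids` is documented (Table 7).

```python
import json

# Hedged sketch of a CLEAN_UP_INSERT_COMMAND body. An empty (or omitted)
# command_ids array clears all unplayed insertion commands per Table 7.
cleanup = {
    "command": "CLEAN_UP_INSERT_COMMAND",
    "params": {"command_ids": []},  # or a list of specific command IDs
}
print(json.dumps(cleanup))
```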