Updated on 2024-03-21 GMT+08:00

HTTP Commands for Controlling Livestreaming

A human operator can take over a virtual avatar livestream through the HTTP API. After the takeover, the virtual avatar can speak in either of the following ways:

  • Text control: The corresponding command is INSERT_PLAY_SCRIPT, and the params structure is PlayTextInfo.
  • Audio control: The corresponding command is INSERT_PLAY_AUDIO, and the params structure is PlayAudioInfo.

When the command sent to the API for controlling virtual human livestreaming is GET_CURRENT_PLAYING_SCRIPTS, the response structure is LivePlayingScriptList.
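The pairing of commands and params structures can be sketched as a small request-body builder. This is only an illustration: the envelope keys "command" and "params" and the helper name build_command_body are assumptions, and the endpoint and authentication are not covered in this appendix.

```python
import json

# Command names taken from this section.
TEXT_COMMAND = "INSERT_PLAY_SCRIPT"            # params: PlayTextInfo
AUDIO_COMMAND = "INSERT_PLAY_AUDIO"            # params: PlayAudioInfo
QUERY_COMMAND = "GET_CURRENT_PLAYING_SCRIPTS"  # response: LivePlayingScriptList


def build_command_body(command, params=None):
    """Return a JSON-serializable body for a livestream control command.

    The keys "command" and "params" are assumed for illustration.
    """
    if command not in (TEXT_COMMAND, AUDIO_COMMAND, QUERY_COMMAND):
        raise ValueError(f"unsupported command: {command}")
    body = {"command": command}
    if params is not None:
        body["params"] = params
    return body


text_body = build_command_body(
    TEXT_COMMAND, {"text_config": {"text": "Hello, everyone."}}
)
print(json.dumps(text_body))
```

The query command carries no params in this sketch, so build_command_body(QUERY_COMMAND) returns a body with only the command name.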

PlayTextInfo

Table 1 PlayTextInfo

Parameter | Mandatory (Yes/No) | Type | Description
text_config | No | TextConfig object (Table 2) | Script configuration.
play_type | No | String | Playback mode. Options: APPEND (adds the playback request to the end of the playback queue); INSERT (inserts the playback request between two audio files or at the end of the text that is being played); PLAY_NOW (plays the request immediately after the command is received, without waiting for the current text playback to complete). Default value: PLAY_NOW. The value contains 0 to 32 characters.
play_role | No | String | Playback role. Options: STREAMER (streamer); CO_STREAMER (assistant). Default value: STREAMER. The value contains 0 to 32 characters.
rule_index | No | String | Index of the triggered interaction rule, carried when an interaction callback triggers the insertion of an audio reply. The value contains 0 to 64 characters.
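As a rough sketch of how a client might assemble PlayTextInfo before sending INSERT_PLAY_SCRIPT, the helper below applies the defaults and limits listed in Table 1, plus the text length limit from Table 2. The function name build_play_text_info is hypothetical, and client-side validation is a convenience rather than something the API requires.

```python
PLAY_TYPES = ("APPEND", "INSERT", "PLAY_NOW")
PLAY_ROLES = ("STREAMER", "CO_STREAMER")


def build_play_text_info(text, play_type="PLAY_NOW", play_role="STREAMER",
                         rule_index=None):
    """Build a PlayTextInfo dict, checking values against Tables 1 and 2."""
    if not 1 <= len(text) <= 131072:
        raise ValueError("text must contain 1 to 131,072 characters")
    if play_type not in PLAY_TYPES:
        raise ValueError(f"play_type must be one of {PLAY_TYPES}")
    if play_role not in PLAY_ROLES:
        raise ValueError(f"play_role must be one of {PLAY_ROLES}")
    info = {
        "text_config": {"text": text},
        "play_type": play_type,
        "play_role": play_role,
    }
    # rule_index is only relevant when an interaction rule triggered the reply.
    if rule_index is not None:
        if len(rule_index) > 64:
            raise ValueError("rule_index must contain 0 to 64 characters")
        info["rule_index"] = rule_index
    return info


params = build_play_text_info("Welcome to the livestream.", play_type="APPEND")
print(params)
```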

Table 2 TextConfig

Parameter | Mandatory (Yes/No) | Type | Description
text | Yes | String | Script. The value contains 1 to 131,072 characters.

The following modes are supported:

  • Text-only mode

    Example: Hello, everyone. I'm Sara, a virtual streamer.

  • Tag mode

    The tag mode uses Speech Synthesis Markup Language (SSML).

    Tags to be used are as follows:

    • <speak>: This tag is the root node of all text. All text that needs to call SSML tags must be contained in the <speak> </speak> tag pair.
    • <emotion>: emotion tag, which takes effect for one or more specified sentences. The tag starts at the beginning of a sentence and ends at the end of the sentence. The format is <emotion type="emotion tag">. The value of type can be HAPPY, SAD, CALM, or ANGER.
    • <insert-action>: action tag. You can insert an action at a specified position of the text. The format is <insert-action id="Action asset ID" name="Action name" tag="Action ID"/>. The action asset information is obtained using the asset library API.
    • <break>: pause tag. You can insert a pause at a specified position of the text. The format is <break time="Pause duration"/>, in milliseconds. The minimum value is 200 ms.
    • <phoneme>: polyphone tag, which specifies the pronunciation of a single Chinese character. Only one Chinese character is allowed between the start and end tags. Set the ph property to the Chinese Pinyin of the character, with the tone represented by 1, 2, 3, or 4. The format is <phoneme ph="pinyin">character</phoneme>.
NOTE:
  • Example: <speak> <emotion type="HAPPY"><insert-action id="2692ea5d095caaafcfed21dc4590b701" name="fingertips of both hands touched" tag="system_female_animation_0026"/>Hello.<break time="200ms"/>I'm a MetaStudio AI virtual human. </emotion>I'll show you<phoneme ph="liao3">how</phoneme>MetaStudio works. </speak>
  • Only the <break> and <phoneme> tags take effect for virtual avatar video production.
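The tag mode described above can be sketched with a few string helpers that assemble a script for the text field. The helper names (plain, pause, emotion, speak) are hypothetical, and escaping plain text with xml.sax.saxutils.escape assumes the service parses the script as XML, which this appendix does not state.

```python
from xml.sax.saxutils import escape

EMOTIONS = ("HAPPY", "SAD", "CALM", "ANGER")
MIN_BREAK_MS = 200  # minimum pause duration stated for <break>


def plain(text):
    """Escape &, < and > in plain text (assumes XML parsing by the service)."""
    return escape(text)


def pause(ms):
    """Return a <break> tag; durations below 200 ms are rejected."""
    if ms < MIN_BREAK_MS:
        raise ValueError("the minimum pause duration is 200 ms")
    return f'<break time="{ms}ms"/>'


def emotion(emotion_type, *parts):
    """Wrap one or more sentences in an <emotion> tag."""
    if emotion_type not in EMOTIONS:
        raise ValueError(f"type must be one of {EMOTIONS}")
    return f'<emotion type="{emotion_type}">{"".join(parts)}</emotion>'


def speak(*parts):
    """Wrap the whole script in the <speak> root tag."""
    return "<speak>" + "".join(parts) + "</speak>"


script = speak(
    emotion("HAPPY", plain("Hello."), pause(200),
            plain("I'm a MetaStudio AI virtual human.")),
    plain("I'll show you how MetaStudio works."),
)
print(script)
```

The resulting string would be passed as the text value of TextConfig. The <insert-action> and <phoneme> tags are left out of the sketch because they depend on asset IDs and Chinese characters that vary by script.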

PlayAudioInfo

Table 3 PlayAudioInfo

Parameter | Mandatory (Yes/No) | Type | Description
audio_url | No | String | Audio URL. The value contains 0 to 2,048 characters.
play_type | No | String | Playback mode. Options: APPEND (adds the playback request to the end of the playback queue); INSERT (inserts the playback request between two audio files or at the end of the text that is being played); PLAY_NOW (plays the request immediately after the command is received, without waiting for the current text playback to complete). Default value: APPEND. The value contains 0 to 32 characters.
play_role | No | String | Playback role. Options: STREAMER (streamer); CO_STREAMER (assistant). Default value: STREAMER. The value contains 0 to 32 characters.
rule_index | No | String | Index of the triggered interaction rule, carried when an interaction callback triggers the insertion of an audio reply. The value contains 0 to 64 characters.
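PlayAudioInfo differs from PlayTextInfo mainly in carrying audio_url and in defaulting play_type to APPEND. A minimal dataclass sketch of that structure follows; the class layout, the to_params method, and the example URL are illustrative assumptions. Table 3 marks audio_url as optional, but the sketch requires it as a design choice, since an audio command without a URL has nothing to play.

```python
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class PlayAudioInfo:
    audio_url: str
    play_type: str = "APPEND"      # default per Table 3
    play_role: str = "STREAMER"    # default per Table 3
    rule_index: Optional[str] = None

    def __post_init__(self):
        if len(self.audio_url) > 2048:
            raise ValueError("audio_url must contain 0 to 2,048 characters")
        if self.play_type not in ("APPEND", "INSERT", "PLAY_NOW"):
            raise ValueError("unknown play_type")
        if self.play_role not in ("STREAMER", "CO_STREAMER"):
            raise ValueError("unknown play_role")
        if self.rule_index is not None and len(self.rule_index) > 64:
            raise ValueError("rule_index must contain 0 to 64 characters")

    def to_params(self):
        """Return a dict for the params field, omitting unset values."""
        return {k: v for k, v in asdict(self).items() if v is not None}


# Hypothetical URL, for illustration only.
audio = PlayAudioInfo(audio_url="https://example.com/reply.wav")
print(audio.to_params())
```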

LivePlayingScriptList

Table 4 LivePlayingScriptList

Parameter | Mandatory (Yes/No) | Type | Description
scene_scripts | No | Array of LivePlayingScriptInfo objects (Table 5) | Scripts. Array length: 1–100

Table 5 LivePlayingScriptInfo

Parameter | Mandatory (Yes/No) | Type | Description
script_name | No | String | Script name. The value contains 1 to 256 characters.
model_asset_id | No | String | Virtual human model asset ID. The value contains 0 to 64 characters.
shoot_scripts | No | Array of LivePlayingShootScriptItem objects (Table 6) | Video shooting scripts. Array length: 0–100

Table 6 LivePlayingShootScriptItem

Parameter | Mandatory (Yes/No) | Type | Description
sequence_no | No | Integer | Script sequence number. The value ranges from 0 to 2,147,483,647.
title | No | String | Paragraph title. The value contains 0 to 256 characters.
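The nesting of Tables 4 to 6 can be illustrated by parsing a GET_CURRENT_PLAYING_SCRIPTS response into typed objects. The sketch below assumes the response has already been decoded from JSON into a dict; the class and function names are hypothetical, and the sample payload is invented for illustration rather than taken from a real response.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ShootScriptItem:          # Table 6
    sequence_no: Optional[int] = None
    title: Optional[str] = None


@dataclass
class ScriptInfo:               # Table 5
    script_name: Optional[str] = None
    model_asset_id: Optional[str] = None
    shoot_scripts: List[ShootScriptItem] = field(default_factory=list)


def parse_script_list(payload):
    """Convert a LivePlayingScriptList dict (Table 4) into ScriptInfo objects."""
    scenes = []
    # Every field is optional, so missing keys fall back to None or [].
    for scene in payload.get("scene_scripts") or []:
        items = [
            ShootScriptItem(item.get("sequence_no"), item.get("title"))
            for item in scene.get("shoot_scripts") or []
        ]
        scenes.append(
            ScriptInfo(scene.get("script_name"), scene.get("model_asset_id"), items)
        )
    return scenes


# Invented sample payload, for illustration only.
sample = {
    "scene_scripts": [
        {
            "script_name": "Product launch",
            "model_asset_id": "example-model-asset-id",
            "shoot_scripts": [
                {"sequence_no": 0, "title": "Opening"},
                {"sequence_no": 1, "title": "Product tour"},
            ],
        }
    ]
}

for scene in parse_script_list(sample):
    titles = [item.title for item in scene.shoot_scripts]
    print(scene.script_name, titles)
```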