Updated on 2024-03-21 GMT+08:00

HTTP Commands for Controlling Livestreaming

A human operator can take over a virtual avatar livestream through the HTTP API. After the takeover, the virtual avatar can be made to speak in either of the following ways:

Text control: The command is INSERT_PLAY_SCRIPT, and the params structure is PlayTextInfo.
Audio control: The command is INSERT_PLAY_AUDIO, and the params structure is PlayAudioInfo.

When the livestream control command is GET_CURRENT_PLAYING_SCRIPTS, the response structure is LivePlayingScriptList.
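
The following Python sketch shows one way a client might assemble the body of a control request. Only the three command names come from this page. The envelope keys `command` and `params` are assumptions based on the wording above, and the endpoint URL, authentication, and HTTP method are not described in this section, so they are left out.

```python
import json

# Command names taken from this page.
KNOWN_COMMANDS = {
    "INSERT_PLAY_SCRIPT",
    "INSERT_PLAY_AUDIO",
    "GET_CURRENT_PLAYING_SCRIPTS",
}


def build_command_body(command, params=None):
    """Serialize a control command to JSON.

    Assumption: the request body has the shape
    {"command": ..., "params": {...}}; check the API definition
    for the real layout before relying on this.
    """
    if command not in KNOWN_COMMANDS:
        raise ValueError(f"unknown command: {command}")
    body = {"command": command}
    if params is not None:
        body["params"] = params
    return json.dumps(body, ensure_ascii=False)


# Text control: params follow the PlayTextInfo structure.
print(build_command_body(
    "INSERT_PLAY_SCRIPT",
    {"text_config": {"text": "Hello, everyone."}},
))
```

The returned string would be sent as the body of the control request.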

PlayTextInfo

**Table 1** PlayTextInfo
Parameter	Mandatory (Yes/No)	Type	Description
text_config	No	Table 2 object	Script configuration.
play_type	No	String	Playback mode. Options: APPEND (adds the playback request to the end of the playback queue); INSERT (inserts the playback request between two audio files or at the end of the text that is being played); PLAY_NOW (plays the request immediately after the instruction is received, so playback can start before the current text finishes). Default value: PLAY_NOW. The value contains 0 to 32 characters.
play_role	No	String	Playback role. Options: STREAMER (streamer); CO_STREAMER (assistant). Default value: STREAMER. The value contains 0 to 32 characters.
rule_index	No	String	Index of the triggered interaction rule. It is carried when an interaction callback triggers the insertion of an audio reply. The value contains 0 to 64 characters.
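
One way to read the three play_type values is as operations on a playback queue. The toy model below is an interpretation of Table 1, not the service's implementation. In particular, this section does not say what happens to an item interrupted by PLAY_NOW; the model simply discards it, which is an assumption.

```python
from collections import deque


class PlaybackQueue:
    """Toy model of APPEND / INSERT / PLAY_NOW (illustrative only)."""

    def __init__(self):
        self.current = None      # item being played, if any
        self.pending = deque()   # items waiting to be played

    def submit(self, item, play_type="PLAY_NOW"):
        if play_type == "APPEND":
            # Goes to the end of the playback queue.
            self.pending.append(item)
        elif play_type == "INSERT":
            # Plays right after the current item finishes.
            self.pending.appendleft(item)
        elif play_type == "PLAY_NOW":
            # Starts immediately. Assumption: the interrupted item is dropped.
            self.current = item
        else:
            raise ValueError(f"unsupported play_type: {play_type}")
        if self.current is None:
            self.current = self.pending.popleft()

    def finish_current(self):
        """Advance to the next pending item, or become idle."""
        self.current = self.pending.popleft() if self.pending else None


q = PlaybackQueue()
q.submit("intro", "APPEND")      # nothing is playing, so "intro" starts
q.submit("outro", "APPEND")      # waits at the end of the queue
q.submit("reply", "INSERT")      # jumps ahead of "outro"
q.submit("alert", "PLAY_NOW")    # replaces "intro" immediately
```

After these calls the model plays "alert", then "reply", then "outro".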

**Table 2** TextConfig
Parameter	Mandatory (Yes/No)	Type	Description
text	Yes	String	Script. The value contains 1 to 131,072 characters. Text-only mode and tag mode are supported, as described after this table.

Text-only mode: The script is plain text. Example: Hello, everyone. I'm Sara, a virtual streamer.
Tag mode: The script uses Speech Synthesis Markup Language (SSML). The following tags are supported:

<speak>: Root node of all text. All text that uses SSML tags must be enclosed in the <speak> </speak> tag pair.
<emotion>: Emotion tag, which takes effect for one or more specified sentences. The tag starts at the beginning of a sentence and ends at the end of a sentence. The format is <emotion type="emotion tag">. The value of type can be HAPPY, SAD, CALM, or ANGER.
<insert-action>: Action tag, which inserts an action at a specified position in the text. The format is <insert-action id="Action asset ID" name="Action name" tag="Action ID"/>. The action asset information is obtained using the asset library API.
<break>: Pause tag, which inserts a pause at a specified position in the text. The format is <break time="Pause duration"/>. The duration is in milliseconds, and the minimum value is 200 ms.
<phoneme>: Pronunciation tag, which specifies the pronunciation of one Chinese character that has multiple readings. Only one Chinese character is allowed between the start and end tags. The ph property is set to Chinese Pinyin, with the tone represented by 1, 2, 3, or 4. The format is <phoneme ph="pinyin">character</phoneme>.

Example: <speak> <emotion type="HAPPY"><insert-action id="2692ea5d095caaafcfed21dc4590b701" name="fingertips of both hands touched" tag="system_female_animation_0026"/>Hello.<break time="200ms"/>I'm a MetaStudio AI virtual human. </emotion>I'll show you<phoneme ph="liao3">how</phoneme>MetaStudio works. </speak>

NOTE: Only the <break> and <phoneme> tags take effect for virtual avatar video production.
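
As an illustration of tag mode, the Python sketch below assembles a script from small helper functions and enforces the limits stated for the text parameter (1 to 131,072 characters, pauses of at least 200 ms, the four emotion values, one character per phoneme tag). The helper names are hypothetical, and the sketch does no XML escaping because this section does not say whether the service expects it.

```python
MAX_TEXT_LEN = 131072
MIN_BREAK_MS = 200
EMOTIONS = {"HAPPY", "SAD", "CALM", "ANGER"}


def break_tag(ms):
    """Pause tag; the documented minimum is 200 ms."""
    if ms < MIN_BREAK_MS:
        raise ValueError(f"pause must be at least {MIN_BREAK_MS} ms")
    return f'<break time="{ms}ms"/>'


def emotion(kind, inner):
    """Wrap one or more whole sentences in an emotion tag."""
    if kind not in EMOTIONS:
        raise ValueError(f"unsupported emotion: {kind}")
    return f'<emotion type="{kind}">{inner}</emotion>'


def phoneme(pinyin, char):
    """Pin the reading of a single Chinese character (tone digit 1 to 4)."""
    if len(char) != 1:
        raise ValueError("exactly one character is allowed inside <phoneme>")
    return f'<phoneme ph="{pinyin}">{char}</phoneme>'


def speak(*parts):
    """Join the parts under the <speak> root and check the length limit."""
    text = "<speak>" + "".join(parts) + "</speak>"
    if not 1 <= len(text) <= MAX_TEXT_LEN:
        raise ValueError("script length is outside 1 to 131,072 characters")
    return text


script = speak(
    emotion("HAPPY", "Hello." + break_tag(200) + "I'm Sara."),
    phoneme("liao3", "了"),
)
# The script goes into the text field of TextConfig.
text_config = {"text": script}
```

Building tags through functions keeps the limits in one place, so an invalid pause or emotion is rejected before any request is sent.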

PlayAudioInfo

**Table 3** PlayAudioInfo
Parameter	Mandatory (Yes/No)	Type	Description
audio_url	No	String	Audio URL. The value contains 0 to 2,048 characters.
play_type	No	String	Playback mode. Options: APPEND (adds the playback request to the end of the playback queue); INSERT (inserts the playback request between two audio files or at the end of the text that is being played); PLAY_NOW (plays the request immediately after the instruction is received, so playback can start before the current text finishes). Default value: APPEND. The value contains 0 to 32 characters.
play_role	No	String	Playback role. Options: STREAMER (streamer); CO_STREAMER (assistant). Default value: STREAMER. The value contains 0 to 32 characters.
rule_index	No	String	Index of the triggered interaction rule. It is carried when an interaction callback triggers the insertion of an audio reply. The value contains 0 to 64 characters.
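
A client might validate a PlayAudioInfo object locally before sending it. The sketch below checks the value sets and length limits from Table 3. The function name and the example URL are hypothetical, and the server may apply additional checks that are not listed here.

```python
PLAY_TYPES = {"APPEND", "INSERT", "PLAY_NOW"}
PLAY_ROLES = {"STREAMER", "CO_STREAMER"}


def build_play_audio_info(audio_url, play_type="APPEND",
                          play_role="STREAMER", rule_index=None):
    """Return a dict shaped like PlayAudioInfo, or raise ValueError."""
    if len(audio_url) > 2048:
        raise ValueError("audio_url is limited to 2,048 characters")
    if play_type not in PLAY_TYPES:
        raise ValueError(f"unsupported play_type: {play_type}")
    if play_role not in PLAY_ROLES:
        raise ValueError(f"unsupported play_role: {play_role}")
    info = {
        "audio_url": audio_url,
        "play_type": play_type,
        "play_role": play_role,
    }
    if rule_index is not None:
        if len(rule_index) > 64:
            raise ValueError("rule_index is limited to 64 characters")
        info["rule_index"] = rule_index
    return info


# Hypothetical URL, used only for illustration.
info = build_play_audio_info("https://example.com/clip.mp3", play_role="CO_STREAMER")
```

The defaults mirror Table 3, so omitting play_type queues the audio with APPEND.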

LivePlayingScriptList

**Table 4** LivePlayingScriptList
Parameter	Mandatory (Yes/No)	Type	Description
scene_scripts	No	Array of Table 5	Scripts. Array length: 1–100

**Table 5** LivePlayingScriptInfo
Parameter	Mandatory (Yes/No)	Type	Description
script_name	No	String	Script name. The value contains 1 to 256 characters.
model_asset_id	No	String	Virtual human model asset ID. The value contains 0 to 64 characters.
shoot_scripts	No	Array of Table 6	Video shooting scripts. Array length: 0–100
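
The sketch below shows one way to read a LivePlayingScriptList response in Python. Every field in Tables 4 and 5 is optional, so each lookup falls back to an empty value. The sample JSON is made up for illustration, and the elements of shoot_scripts are kept as plain dictionaries because their structure is not defined in this part of the page.

```python
import json
from dataclasses import dataclass, field


@dataclass
class LivePlayingScriptInfo:
    script_name: str = ""
    model_asset_id: str = ""
    shoot_scripts: list = field(default_factory=list)


def parse_live_playing_script_list(raw):
    """Convert a JSON response body into a list of LivePlayingScriptInfo."""
    data = json.loads(raw)
    return [
        LivePlayingScriptInfo(
            script_name=item.get("script_name", ""),
            model_asset_id=item.get("model_asset_id", ""),
            shoot_scripts=item.get("shoot_scripts", []),
        )
        for item in data.get("scene_scripts", [])
    ]


# Made-up response body for illustration.
sample = '{"scene_scripts": [{"script_name": "Opening", "shoot_scripts": []}]}'
scripts = parse_live_playing_script_list(sample)
print(scripts[0].script_name)
```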