SSML Definition of Text Control
Text control in MetaStudio uses Speech Synthesis Markup Language (SSML) to control the behavior of virtual humans, including actions and emotions, as well as the pronunciation of multi-pronunciation words and the pauses in TTS voice synthesis.
For the basic definition of SSML, see Speech Synthesis Markup Language (SSML) Version 1.0. MetaStudio extends that specification with additional fields for controlling virtual humans.
MetaStudio SSML currently supports the following capabilities:
- Text pronunciation control during TTS voice synthesis
The following tags are included:
- <speak></speak> is the root node of the SSML text.
- <break/> inserts a silent pause. You can set the pause duration.
- <phoneme></phoneme> specifies the pronunciation of words that have more than one pronunciation.
- <say-as></say-as> specifies how digits or English letters are read.
- <sub></sub> sets an alias for the marked text, that is, an alternative reading.
- <prosody></prosody> is used to control the local speaking speed.
MetaStudio provides multiple TTS timbres, and each timbre supports a different set of SSML tags. To find out which tags a timbre supports, call the API for querying asset details.
speak
- Description
<speak></speak>: Root node of the SSML text.
- Syntax
<speak>Enter the text with an SSML tag. </speak>
- Property
None
- Tag relationship
<speak> can contain text and tags, including <break>, <phoneme>, <say-as>, and <sub>.
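The tag relationship above can be checked before text is submitted. The following is a minimal sketch, not part of MetaStudio: the function name `check_speak` and the constant `ALLOWED_IN_SPEAK` are hypothetical, and the allowed set simply mirrors the tags named in the line above. Extend it if the timbre you use supports other tags.

```python
import xml.etree.ElementTree as ET

# Tags named in this document as allowed inside <speak> (assumption: extend
# this set if the timbre in use supports additional tags).
ALLOWED_IN_SPEAK = {"break", "phoneme", "say-as", "sub"}


def check_speak(ssml: str) -> list:
    """Return a list of problems found in an SSML string (empty if none)."""
    try:
        root = ET.fromstring(ssml)
    except ET.ParseError as exc:
        return [f"not well-formed XML: {exc}"]
    if root.tag != "speak":
        return [f"root node is <{root.tag}>, expected <speak>"]
    # Only direct children of <speak> are inspected here.
    return [
        f"<{child.tag}> is not listed as allowed inside <speak>"
        for child in root
        if child.tag not in ALLOWED_IN_SPEAK
    ]
```

An empty list means the text is well-formed XML, has <speak> as its root node, and uses only the listed tags as direct children.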
break
- Description
<break/>: Inserts a silent pause at any position.
- Syntax
<break time="String"/>
- Property
Table 1 Property description
| Property Name | Property Type | Property Value | Mandatory (Yes/No) | Description |
| --- | --- | --- | --- | --- |
| time | String | Value range: 200 ms to 10 s | No | Duration of the silent pause, in milliseconds (for example, 200ms). |
| strength | String | none: no pause; x-weak: very short pause; weak: short pause; medium: medium pause; strong: long pause; x-strong: very long pause | No | Strength of the prosodic break (rhythm). |
- Tag relationship
This tag cannot contain text or any other tag.
- Example value
One sentence<break time="200ms"/>another sentence
One sentence<break strength="strong"/>another sentence
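When pauses are generated programmatically, the documented limits can be enforced before the tag is written. This is a rough sketch under stated assumptions: `break_tag` and `STRENGTHS` are hypothetical names, and the helper always writes the duration with an `ms` suffix, as in the example above.

```python
from typing import Optional

# Strength values listed in the property table for <break/>.
STRENGTHS = ("none", "x-weak", "weak", "medium", "strong", "x-strong")


def break_tag(time_ms: Optional[int] = None, strength: Optional[str] = None) -> str:
    """Build a <break/> tag from a duration in milliseconds and/or a strength."""
    attrs = []
    if time_ms is not None:
        # Documented range: 200 ms to 10 s.
        if not 200 <= time_ms <= 10_000:
            raise ValueError("time must be between 200 ms and 10 s")
        attrs.append(f'time="{time_ms}ms"')
    if strength is not None:
        if strength not in STRENGTHS:
            raise ValueError(f"unknown strength: {strength!r}")
        attrs.append(f'strength="{strength}"')
    return "<break " + " ".join(attrs) + "/>" if attrs else "<break/>"
```

For example, `break_tag(time_ms=200)` produces the first example value above, and `break_tag(strength="strong")` produces the tag used in the second.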
phoneme
- Description
<phoneme></phoneme>: Specifies the pronunciation of a Chinese or English word that has more than one pronunciation.
- Syntax
<phoneme ph="string">Text</phoneme>
The <phoneme ph="W EH1 DH AH0">weather</phoneme> is very good.
- Property
Table 2 Property description
| Property Name | Property Type | Property Value | Mandatory (Yes/No) | Description |
| --- | --- | --- | --- | --- |
| ph | String | Pinyin or phoneme | Yes | For Chinese Pinyin, mark the tone with 1, 2, 3, or 4; the value 5 indicates the neutral tone (no tone). For English phonemes, see the CMU Pronouncing Dictionary. |
- Tag relationship
This tag can contain text but no other tag.
- Example value
The <phoneme ph="tian1 qi1">weather</phoneme> is good today.
To obtain the Pinyin for Chinese characters, you can use a JavaScript library. For details, see pinyin-pro.
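The tone-digit convention described in the table can be validated before a <phoneme> tag is emitted. The sketch below is illustrative only: `pinyin_phoneme` is a hypothetical helper, and it assumes each syllable is written as lowercase ASCII letters followed by one tone digit from 1 to 5, separated by spaces.

```python
import re
from xml.sax.saxutils import escape, quoteattr

# Assumed syllable shape: lowercase ASCII letters followed by a tone digit 1-5.
PINYIN_SYLLABLE = re.compile(r"[a-z]+[1-5]")


def pinyin_phoneme(text: str, pinyin: str) -> str:
    """Wrap text in a <phoneme> tag after checking every Pinyin syllable."""
    syllables = pinyin.split()
    bad = [s for s in syllables if not PINYIN_SYLLABLE.fullmatch(s)]
    if not syllables or bad:
        raise ValueError(f"invalid Pinyin syllables: {bad or pinyin!r}")
    ph = quoteattr(" ".join(syllables))  # adds the surrounding quotes
    return f"<phoneme ph={ph}>{escape(text)}</phoneme>"


print(pinyin_phoneme("你好", "ni3 hao3"))
```

The check rejects syllables with a missing or out-of-range tone digit, such as `hao` or `hao6`.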
say-as
- Description
<say-as></say-as>: Specifies that the enclosed text is read as a specific type of content (for example, a date or phone number), or spells an English word letter by letter.
- Syntax
<say-as interpret-as="string">Digit or word</say-as>
- Property
Table 3 Property description
| Property Name | Property Type | Property Value | Mandatory (Yes/No) | Description |
| --- | --- | --- | --- | --- |
| interpret-as | String | money: money; date: date; figure: value; phone: phone number; english: English word; spell: spelling an English word character by character | Yes | The content is read according to the given type. |
- Tag relationship
This tag can contain text but no other tag.
- Example value
<say-as interpret-as="money">15.55 RMB</say-as>
<say-as interpret-as="date">2022/3/8</say-as>
<say-as interpret-as="figure">175 cm</say-as>
<say-as interpret-as="phone">151 12345678</say-as>
<say-as interpret-as="english">Hello</say-as>
<say-as interpret-as="spell">Hello</say-as><!-- Read: H E L L O -->
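Because interpret-as accepts only the listed values, a small wrapper can catch typos early. This is a minimal sketch with hypothetical names (`say_as`, `INTERPRET_AS`); it also escapes XML special characters in the text, which is an assumption about what the service expects rather than something this document states.

```python
from xml.sax.saxutils import escape

# Values of interpret-as listed in the property table.
INTERPRET_AS = {"money", "date", "figure", "phone", "english", "spell"}


def say_as(text: str, interpret_as: str) -> str:
    """Wrap text in a <say-as> tag, rejecting unlisted interpret-as values."""
    if interpret_as not in INTERPRET_AS:
        raise ValueError(f"unsupported interpret-as value: {interpret_as!r}")
    return f'<say-as interpret-as="{interpret_as}">{escape(text)}</say-as>'


print(say_as("2022/3/8", "date"))
```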
sub
- Description
<sub></sub>: Replaces the marked text with an alternative reading.
- Syntax
<sub alias="string">Text</sub>
- Property
Table 4 Property description
| Property Name | Property Type | Property Value | Mandatory (Yes/No) | Description |
| --- | --- | --- | --- | --- |
| alias | String | Alternative word | Yes | This value is read in place of the content of the tag. |
- Tag relationship
This tag can contain text but no other tag.
- Example value
The alias value, Paul, is what is actually read.
<sub alias="Paul">Paul</sub> is German.
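One way to use <sub> at scale is to apply a glossary of terms and their spoken forms to plain text. The following sketch is an illustration, not a MetaStudio feature: `apply_aliases` and the glossary contents are hypothetical, and matching is a simple case-sensitive substring search.

```python
import re
from xml.sax.saxutils import escape, quoteattr


def apply_aliases(text: str, aliases: dict) -> str:
    """Wrap each glossary term found in text in a <sub alias="..."> tag."""
    if not aliases:
        return escape(text)
    # Try longer terms first so that a short term cannot split a longer one.
    terms = sorted(aliases, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(t) for t in terms))
    out, pos = [], 0
    for match in pattern.finditer(text):
        out.append(escape(text[pos:match.start()]))
        term = match.group(0)
        out.append(f"<sub alias={quoteattr(aliases[term])}>{escape(term)}</sub>")
        pos = match.end()
    out.append(escape(text[pos:]))
    return "".join(out)


print(apply_aliases("W3C publishes SSML.", {"W3C": "World Wide Web Consortium"}))
```

Text outside the matched terms is XML-escaped, so the result can be placed directly inside <speak>.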
prosody
- Description
<prosody></prosody>: Controls the local speaking speed.
- Syntax
<prosody rate="50">Text</prosody>
- Property
Table 5 Property description
| Property Name | Property Type | Property Value | Mandatory (Yes/No) | Description |
| --- | --- | --- | --- | --- |
| rate | String | Percentage of the normal speaking speed. The value ranges from 50 to 200. Example: 50 means that the text is read at 0.5 times the normal speed. | Yes | Speaking speed |
- Tag relationship
This tag can contain text but no other tag.
- Example value
<prosody rate="50">Hello, everyone. I'm a MetaStudio virtual human.</prosody>
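Since rate is a percentage of the normal speed, a speed multiplier maps to it by multiplying by 100. The helper below is a hedged sketch: `prosody` is a hypothetical function name, and rounding to the nearest integer percentage is an assumption.

```python
from xml.sax.saxutils import escape


def prosody(text: str, speed: float) -> str:
    """Wrap text in <prosody>; speed is a multiplier of the normal speed."""
    rate = round(speed * 100)  # e.g. 0.5 times the normal speed gives rate 50
    # Documented range for rate: 50 to 200.
    if not 50 <= rate <= 200:
        raise ValueError("rate must be between 50 and 200")
    return f'<prosody rate="{rate}">{escape(text)}</prosody>'


print(prosody("Hello, everyone.", 0.5))
```

Multipliers outside 0.5 to 2.0 raise an error rather than being clamped, so out-of-range input is noticed instead of silently changed.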