Timestamp Data
Function
In addition to the audio stream, real-time TTS can return timestamp information for each Chinese character or English word in the synthesized text. You can use these timestamps to generate video subtitles or to drive lip-sync for virtual humans.
Request Parameters
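To enable timestamps, set the subtitle request parameter to word_level (timestamps for each character or word) or phoneme_level (word timestamps plus a phonemes list for each word).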
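The sketch below shows one way a client could validate the subtitle value before sending a request. The helper name enable_timestamps and the idea of a plain dict config are hypothetical. Only the parameter name subtitle and its two values come from this page, and where the parameter sits in the request body is not covered here.

```python
# Hypothetical helper: only "subtitle" and its two values come from this page.
VALID_SUBTITLE_LEVELS = ("word_level", "phoneme_level")


def enable_timestamps(config: dict, level: str = "word_level") -> dict:
    """Return a copy of a request config with the subtitle parameter set."""
    if level not in VALID_SUBTITLE_LEVELS:
        raise ValueError(
            f"subtitle must be one of {VALID_SUBTITLE_LEVELS}, got {level!r}"
        )
    updated = dict(config)  # shallow copy, so the caller's dict is untouched
    updated["subtitle"] = level
    return updated


print(enable_timestamps({}, "phoneme_level"))  # {'subtitle': 'phoneme_level'}
```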
Response Parameters
Parameter | Type | Description
---|---|---
resp_type | String | Response type. The value is RESULT.
trace_id | String | Internal service token, which can be used to trace a specific request in logs.
result | List | Timestamp information. Each item is described in the following table.
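The following minimal sketch pulls the timestamp list out of one response message. The examples on this page are printed as Python dict literals. The sketch assumes the service sends JSON text, which it decodes with json.loads. Any message whose resp_type is not RESULT yields an empty list, because this page describes only RESULT messages. The function name extract_timestamps is hypothetical.

```python
import json
from typing import Any, Union


def extract_timestamps(message: Union[str, dict]) -> list[dict[str, Any]]:
    """Return the timestamp entries carried by one response message."""
    if isinstance(message, str):
        message = json.loads(message)  # assumption: frames arrive as JSON text
    if message.get("resp_type") != "RESULT":
        return []  # not a timestamp message
    return list(message.get("result", []))


msg = {
    "resp_type": "RESULT",
    "trace_id": "d34e3ccb-0383-4c76-a107-ec6ced44614f",
    "result": [
        {"start_time": 43980, "end_time": 44210, "word_index": 10, "text": "there"},
        {"start_time": 44210, "end_time": 45298, "word_index": 11, "text": "by"},
    ],
}
for item in extract_timestamps(msg):
    print(item["word_index"], item["text"], item["start_time"], item["end_time"])
# 10 there 43980 44210
# 11 by 44210 45298
```

Keeping trace_id alongside the extracted entries in your own logs makes it easier to correlate a problem with the service-side logs.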
Parameters of each item in result:

Parameter | Type | Description
---|---|---
start_time | Integer | Start time of the synthesized audio for this text, in milliseconds.
end_time | Integer | End time of the synthesized audio for this text, in milliseconds.
text | String | Text that the timestamps apply to (a Chinese character or an English word).
word_index | Integer | Position of the text in the sentence, starting from 0.
phonemes | List | Phoneme timestamps. Returned only when subtitle is set to phoneme_level. Each item is described in the following table.
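Because each entry carries start_time and end_time in milliseconds, word-level results map directly onto subtitle cues. The sketch below assumes SubRip (SRT) as the target format and builds one cue per word. It orders entries by start_time and skips entries with empty text, such as the closing entry in the word_level example. The function names are hypothetical.

```python
def ms_to_srt_time(ms: int) -> str:
    """Format a millisecond offset as an SRT timestamp (HH:MM:SS,mmm)."""
    hours, rem = divmod(ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    seconds, millis = divmod(rem, 1_000)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d},{millis:03d}"


def words_to_srt(words: list[dict]) -> str:
    """Build one SRT cue per timestamped word, in playback order."""
    cues = []
    for word in sorted(words, key=lambda w: w["start_time"]):
        if not word["text"].strip():
            continue  # empty entries carry no visible text
        cue_number = len(cues) + 1
        cues.append(
            f"{cue_number}\n"
            f"{ms_to_srt_time(word['start_time'])} --> {ms_to_srt_time(word['end_time'])}\n"
            f"{word['text']}\n"
        )
    # SRT separates cues with a blank line.
    return "\n".join(cues)


print(words_to_srt([
    {"start_time": 0, "end_time": 384, "text": "Nice", "word_index": 0},
    {"start_time": 384, "end_time": 512, "text": "to", "word_index": 1},
]))
```

One cue per word is the simplest mapping. Readable subtitles usually group consecutive words into lines by a time or character budget, which this sketch leaves out.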
Parameters of each item in phonemes:

Parameter | Type | Description
---|---|---
phoneme | String | Phoneme text.
start_time | Integer | Start time of the synthesized audio for this phoneme, in milliseconds.
end_time | Integer | End time of the synthesized audio for this phoneme, in milliseconds.
phoneme_index | Integer | Position of the phoneme within its word, starting from 0.
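For lip-sync, a renderer typically needs the phoneme that is active at the current playback position. The sketch below flattens phoneme_level results into one timeline and looks up the phoneme at a given time. In the example responses, phoneme_index restarts at 0 for every word, so the sketch orders words by start_time and orders phonemes within each word by phoneme_index. The example also contains empty phoneme strings. Treating them as pauses and labeling them "sil" is an assumption. The function names are hypothetical.

```python
from typing import Optional


def phoneme_timeline(words: list[dict]) -> list[tuple[int, int, str]]:
    """Flatten phoneme_level results into (start_ms, end_ms, phoneme) tuples."""
    timeline = []
    for word in sorted(words, key=lambda w: w["start_time"]):
        for ph in sorted(word.get("phonemes", []), key=lambda p: p["phoneme_index"]):
            # Assumption: an empty phoneme marks a pause; "sil" is our own label.
            label = ph["phoneme"] or "sil"
            timeline.append((ph["start_time"], ph["end_time"], label))
    return timeline


def phoneme_at(timeline: list[tuple[int, int, str]], t_ms: int) -> Optional[str]:
    """Return the phoneme playing at t_ms, or None if no span covers it.

    Spans are treated as half-open [start, end), so a boundary instant
    belongs to the phoneme that starts there.
    """
    for start, end, label in timeline:
        if start <= t_ms < end:
            return label
    return None
```

Mapping each phoneme to a mouth shape (viseme) is application-specific and not shown. The linear scan is fine for short sentences, but for long timelines a binary search over start times (for example with the bisect module) would scale better.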
Example
Responses with subtitle set to word_level
{ 'resp_type': 'RESULT', 'trace_id': 'd34e3ccb-0383-4c76-a107-ec6ced44614f', 'result': [ {'start_time': 43980, 'end_time': 44210, 'word_index': 10, 'text': 'there'}, {'start_time': 44210, 'end_time': 45298, 'word_index': 11, 'text': 'by'} ] }
{ 'resp_type': 'RESULT', 'trace_id': 'd34e3ccb-0383-4c76-a107-ec6ced44614f', 'result': [ {'start_time': 0, 'end_time': 384, 'text': 'Nice', 'word_index': 0}, {'start_time': 384, 'end_time': 512, 'text': 'to', 'word_index': 1}, {'start_time': 512, 'end_time': 800, 'text': 'meet', 'word_index': 2}, {'start_time': 800, 'end_time': 1184, 'text': 'you.', 'word_index': 3}, {'start_time': 1184, 'end_time': 1284, 'text': '', 'word_index': 4} ] }
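The first word_level example starts at word_index 10 with a start_time of 43980 ms. This suggests two things: timestamps for a longer text can arrive split across several RESULT messages, and start_time is measured from the beginning of the audio while word_index counts positions within a sentence. Both points are inferences from the example, not documented behavior. Under those assumptions, a client that needs the complete timeline could merge messages as in the sketch below, which sorts by start_time. The function name merge_results is hypothetical.

```python
def merge_results(messages: list[dict]) -> list[dict]:
    """Collect result entries from several RESULT messages in playback order.

    Sorting uses start_time rather than word_index because word_index
    counts positions within a sentence and may repeat across sentences.
    """
    items = [
        item
        for msg in messages
        if msg.get("resp_type") == "RESULT"
        for item in msg.get("result", [])
    ]
    return sorted(items, key=lambda item: item["start_time"])
```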
Response with subtitle set to phoneme_level
{ 'resp_type': 'RESULT', 'trace_id': '39f02607-32d8-4c9f-8b20-11d4af28eecc', 'result': [ { 'start_time': 0, 'end_time': 384, 'text': 'Nice', 'word_index': 0, 'phonemes': [ {'phoneme_index': 0, 'start_time': 0, 'end_time': 181, 'phoneme': 'n'}, {'phoneme_index': 1, 'start_time': 181, 'end_time': 288, 'phoneme': 'ay'}, {'phoneme_index': 2, 'start_time': 288, 'end_time': 384, 'phoneme': 's'} ] }, { 'start_time': 384, 'end_time': 512, 'text': 'to', 'word_index': 1, 'phonemes': [ {'phoneme_index': 0, 'start_time': 384, 'end_time': 426, 'phoneme': 't'}, {'phoneme_index': 1, 'start_time': 426, 'end_time': 512, 'phoneme': 'ah0'} ] }, { 'start_time': 512, 'end_time': 800, 'text': 'meet', 'word_index': 2, 'phonemes': [ {'phoneme_index': 0, 'start_time': 512, 'end_time': 608, 'phoneme': 'm'}, {'phoneme_index': 1, 'start_time': 608, 'end_time': 693, 'phoneme': 'iy'}, {'phoneme_index': 2, 'start_time': 693, 'end_time': 800, 'phoneme': 't'} ] }, { 'start_time': 800, 'end_time': 1184, 'text': 'you.', 'word_index': 3, 'phonemes': [ {'phoneme_index': 0, 'start_time': 800, 'end_time': 864, 'phoneme': 'y'}, {'phoneme_index': 1, 'start_time': 864, 'end_time': 1013, 'phoneme': 'uw'}, {'phoneme_index': 2, 'start_time': 1013, 'end_time': 1184, 'phoneme': ''} ] }, { 'start_time': 1184, 'end_time': 1284, 'text': '', 'word_index': 4, 'phonemes': [ {'phoneme_index': 0, 'start_time': 1184, 'end_time': 1284, 'phoneme': ''} ] } ] }
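In the phoneme_level example, each word's phonemes exactly tile the word's own interval. For instance, "Nice" runs from 0 to 384 ms and its phonemes cover 0 to 181, 181 to 288, and 288 to 384 ms. The page does not state this as a guarantee. A client that relies on it, for example to interpolate mouth shapes, could check it first with a sketch like the following. The sketch reports deviations instead of raising. The function name is hypothetical.

```python
def check_phoneme_tiling(word: dict) -> list[str]:
    """Report where a word's phonemes fail to cover [start_time, end_time] exactly."""
    issues = []
    phonemes = sorted(word.get("phonemes", []), key=lambda p: p["phoneme_index"])
    cursor = word["start_time"]  # where the next phoneme is expected to begin
    for ph in phonemes:
        if ph["start_time"] != cursor:
            issues.append(
                f"{word['text']!r}: gap or overlap at {cursor} ms "
                f"before phoneme {ph['phoneme_index']}"
            )
        cursor = ph["end_time"]
    if phonemes and cursor != word["end_time"]:
        issues.append(
            f"{word['text']!r}: phonemes end at {cursor} ms, "
            f"word ends at {word['end_time']} ms"
        )
    return issues
```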