Updated on 2025-09-12 GMT+08:00

Timestamp Data

Function

While generating an audio stream, real-time TTS can also return timestamp information for each Chinese character or English word. This information can be used to generate video subtitles and to drive lip-sync for virtual humans.

Request Parameters

Set subtitle to word_level or phoneme_level to enable the timestamp function.

Response Parameters

Table 1 Response parameters

| Parameter | Type | Description |
|---|---|---|
| resp_type | String | Response type. The value is RESULT. |
| trace_id | String | Internal token of the service, which can be used to trace a specific process in logs. |
| result | List | Timestamp information. |

Table 2 result data structure

| Parameter | Type | Description |
|---|---|---|
| start_time | Integer | Start timestamp of the synthesized audio corresponding to the text, in milliseconds. |
| end_time | Integer | End timestamp of the synthesized audio corresponding to the text, in milliseconds. |
| text | String | Text information. |
| word_index | Integer | Position of the text in the entire sentence, starting from 0. |
| phonemes | List | Phoneme timestamp information, which is returned when subtitle is set to phoneme_level. |

Table 3 phonemes data structure

| Parameter | Type | Description |
|---|---|---|
| phoneme | String | Phoneme text information. |
| start_time | Integer | Start timestamp of the synthesized audio corresponding to the phoneme, in milliseconds. |
| end_time | Integer | End timestamp of the synthesized audio corresponding to the phoneme, in milliseconds. |
| phoneme_index | Integer | Phoneme position, starting from 0. |
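The three tables above can be mirrored as typed records on the client side. The following is a minimal sketch, not part of the service SDK: the class names `Phoneme` and `WordTimestamp` and the function `parse_timestamp_response` are hypothetical, and the sketch assumes the response has already been decoded into a Python dict.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class Phoneme:
    """One entry of the phonemes list (Table 3). Hypothetical helper type."""
    phoneme: str
    start_time: int  # milliseconds
    end_time: int    # milliseconds
    phoneme_index: int


@dataclass
class WordTimestamp:
    """One entry of the result list (Table 2). Hypothetical helper type."""
    text: str
    start_time: int  # milliseconds
    end_time: int    # milliseconds
    word_index: int
    # Stays empty unless subtitle was set to phoneme_level.
    phonemes: List[Phoneme] = field(default_factory=list)


def parse_timestamp_response(message: Dict[str, Any]) -> List[WordTimestamp]:
    """Convert a decoded RESULT message (Table 1) into typed records."""
    if message.get("resp_type") != "RESULT":
        raise ValueError("not a timestamp RESULT message")
    words = []
    for item in message.get("result", []):
        phonemes = [
            Phoneme(
                phoneme=p["phoneme"],
                start_time=p["start_time"],
                end_time=p["end_time"],
                phoneme_index=p["phoneme_index"],
            )
            for p in item.get("phonemes", [])
        ]
        words.append(
            WordTimestamp(
                text=item["text"],
                start_time=item["start_time"],
                end_time=item["end_time"],
                word_index=item["word_index"],
                phonemes=phonemes,
            )
        )
    return words
```

Reading `phonemes` with a default of an empty list lets the same parser handle both word_level and phoneme_level responses.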

Example

word_level

{
    'resp_type': 'RESULT',
    'trace_id': 'd34e3ccb-0383-4c76-a107-ec6ced44614f',
    'result':
    [
        {'start_time': 43980, 'end_time': 44210, 'word_index': 10, 'text': 'there'},
        {'start_time': 44210, 'end_time': 45298, 'word_index': 11, 'text': 'by'}
    ]
}

{
    'resp_type': 'RESULT',
    'trace_id': 'd34e3ccb-0383-4c76-a107-ec6ced44614f',
    'result':
    [
        {'start_time': 0, 'end_time': 384, 'text': 'Nice', 'word_index': 0},
        {'start_time': 384, 'end_time': 512, 'text': 'to', 'word_index': 1},
        {'start_time': 512, 'end_time': 800, 'text': 'meet', 'word_index': 2},
        {'start_time': 800, 'end_time': 1184, 'text': 'you.', 'word_index': 3},
        {'start_time': 1184, 'end_time': 1284, 'text': '', 'word_index': 4}
    ]
}
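As an illustration of the subtitle use case, word-level timestamps such as those above could be grouped into SRT-style cues. This is only a sketch under stated assumptions: `format_srt_time`, `words_to_srt`, and the `max_words` grouping rule are hypothetical helpers, entries with an empty `text` are assumed to be trailing silence and are skipped, and words are joined with spaces, which suits English but not Chinese text.

```python
from typing import Dict, List


def format_srt_time(ms: int) -> str:
    """Format milliseconds as an SRT timestamp, HH:MM:SS,mmm."""
    hours, rem = divmod(ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    seconds, millis = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d},{millis:03d}"


def words_to_srt(result: List[Dict], max_words: int = 4) -> str:
    """Group word-level entries into cues of at most max_words words."""
    # Assumption: entries whose text is empty carry no subtitle content.
    words = [w for w in result if w["text"]]
    cues = []
    for i in range(0, len(words), max_words):
        group = words[i:i + max_words]
        start = format_srt_time(group[0]["start_time"])
        end = format_srt_time(group[-1]["end_time"])
        text = " ".join(w["text"] for w in group)
        cues.append(f"{len(cues) + 1}\n{start} --> {end}\n{text}\n")
    return "\n".join(cues)
```

Applied to the `result` list of the second example, this yields a single cue running from `00:00:00,000` to `00:00:01,184` with the text `Nice to meet you.`.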

phoneme_level

{
    'resp_type': 'RESULT', 
    'trace_id': '39f02607-32d8-4c9f-8b20-11d4af28eecc', 
    'result': 
    [
        {
            'start_time': 0, 
            'end_time': 384, 
            'text': 'Nice', 
            'word_index': 0, 
            'phonemes': [
                {'phoneme_index': 0, 'start_time': 0, 'end_time': 181, 'phoneme': 'n'},
                {'phoneme_index': 1, 'start_time': 181, 'end_time': 288, 'phoneme': 'ay'},
                {'phoneme_index': 2, 'start_time': 288, 'end_time': 384, 'phoneme': 's'}
            ]
        },
        {
            'start_time': 384, 
            'end_time': 512, 
            'text': 'to', 
            'word_index': 1, 
            'phonemes': [
                {'phoneme_index': 0, 'start_time': 384, 'end_time': 426, 'phoneme': 't'},
                {'phoneme_index': 1, 'start_time': 426, 'end_time': 512, 'phoneme': 'ah0'}
            ]
        },
        {
            'start_time': 512, 
            'end_time': 800, 
            'text': 'meet', 
            'word_index': 2, 
            'phonemes': [
                {'phoneme_index': 0, 'start_time': 512, 'end_time': 608, 'phoneme': 'm'},
                {'phoneme_index': 1, 'start_time': 608, 'end_time': 693, 'phoneme': 'iy'},
                {'phoneme_index': 2, 'start_time': 693, 'end_time': 800, 'phoneme': 't'}
            ]
        },
        {
            'start_time': 800, 
            'end_time': 1184, 
            'text': 'you.', 
            'word_index': 3, 
            'phonemes': [
                {'phoneme_index': 0, 'start_time': 800, 'end_time': 864, 'phoneme': 'y'},
                {'phoneme_index': 1, 'start_time': 864, 'end_time': 1013, 'phoneme': 'uw'},
                {'phoneme_index': 2, 'start_time': 1013, 'end_time': 1184, 'phoneme': ''}
            ]
        },
        {
            'start_time': 1184, 
            'end_time': 1284, 
            'text': '', 
            'word_index': 4, 
            'phonemes': [
                {'phoneme_index': 0, 'start_time': 1184, 'end_time': 1284, 'phoneme': ''}
            ]
        }
    ]
}
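For the lip-sync use case, a phoneme_level response like the one above could be flattened into a single timeline and queried by playback position. The sketch below is illustrative only: `phoneme_timeline` and `phoneme_at` are hypothetical helpers, intervals are treated as half-open (`start_time` inclusive, `end_time` exclusive), and an empty `phoneme` string is assumed to stand for punctuation or silence.

```python
from bisect import bisect_right
from typing import Dict, List, Optional, Tuple

TimelineEntry = Tuple[int, int, str]  # (start_ms, end_ms, phoneme)


def phoneme_timeline(result: List[Dict]) -> List[TimelineEntry]:
    """Flatten every word's phonemes list into one list sorted by start time."""
    timeline = []
    for word in result:
        for p in word.get("phonemes", []):
            timeline.append((p["start_time"], p["end_time"], p["phoneme"]))
    timeline.sort(key=lambda entry: entry[0])
    return timeline


def phoneme_at(timeline: List[TimelineEntry], t_ms: int) -> Optional[str]:
    """Return the phoneme active at t_ms, or None if no interval covers it."""
    starts = [start for start, _, _ in timeline]
    # Index of the last entry whose start_time is <= t_ms.
    i = bisect_right(starts, t_ms) - 1
    if i < 0:
        return None
    start, end, phoneme = timeline[i]
    return phoneme if start <= t_ms < end else None
```

A renderer could call `phoneme_at` once per video frame and map the returned phoneme to a mouth shape; for a long stream, precomputing `starts` once rather than on every call would avoid repeated work.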