Updated on 2025-08-26 GMT+08:00

Real-Time ASR Working Process

The working process of Real-Time ASR consists of four phases: starting recognition, sending audio data, ending recognition, and closing the connection.

  • During the phase of starting recognition, the client sends a start command containing configurations such as the sampling rate, audio format, and whether to return intermediate results. The server returns a start response.
  • During the phase of sending audio data, the client sends audio data in segments. The server returns recognition results or other events, such as an audio timeout or an excessively long silence.
  • During the phase of ending recognition, after all audio has been sent, the client sends an end request, and the server returns an end response.
  • During the phase of closing the connection, the client must proactively disconnect from the server. If the server receives no data from the client for 20 seconds, it returns an error and disconnects from the client.
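
The four phases can be sketched from the client side as follows. This is a minimal illustration, not the service's actual message format: the JSON field names (command, config, sample_rate, audio_format, interim_results), the default chunk size, and the send and close callables are all hypothetical stand-ins for whatever transport and schema the API reference specifies.

```python
import json
from typing import Callable, Union

# From the text: the server disconnects after 20 s without client data.
IDLE_TIMEOUT_S = 20

Message = Union[str, bytes]


def run_session(send: Callable[[Message], None],
                close: Callable[[], None],
                audio: bytes,
                chunk_size: int = 3200,
                sample_rate: int = 16000,
                audio_format: str = "pcm",
                interim_results: bool = True) -> None:
    """Drive one recognition session through its four phases.

    `send` and `close` are supplied by the caller (for example, thin
    wrappers around a WebSocket client). All field names are assumptions.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")

    # Phase 1: start recognition with the session configuration.
    send(json.dumps({
        "command": "START",  # hypothetical command name
        "config": {
            "sample_rate": sample_rate,
            "audio_format": audio_format,
            "interim_results": interim_results,
        },
    }))

    # Phase 2: send the audio in segments. Consecutive segments should be
    # sent less than IDLE_TIMEOUT_S apart, or the server drops the session.
    for offset in range(0, len(audio), chunk_size):
        send(audio[offset:offset + chunk_size])

    # Phase 3: tell the server that no more audio will follow.
    send(json.dumps({"command": "END"}))  # hypothetical command name

    # Phase 4: the client, not the server, closes the connection.
    close()
```

A real client would also read the server's start response, recognition results, events, and end response between these steps; that receive path is omitted here because its message schema is not described in this section.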