Updated on 2025-09-18 GMT+08:00

Functions

Real-Time ASR

Real-Time ASR returns speech recognition results in real time when you call the API. Currently, Real-Time ASR supports Mandarin Chinese.

  • Text Timestamps

    Generates timestamps for the recognized text, so that you can quickly locate the corresponding position in the original audio clip, confirm the text, and adopt it if needed.

  • Intelligent Text Segmentation

    Extracts semantic features from the context and combines them with voice features to intelligently segment sentences and add punctuation marks, improving the readability of the output text.

  • Hybrid Recognition

    Supports recognition of English letters/words and digits included in Chinese sentences.

  • Instant Result Output

    Continuously recognizes voice streams, outputs results in real time, and automatically corrects the content based on the contextual language model.

  • Automatic VAD

    Performs voice activity detection (VAD) on the input voice streams to improve recognition efficiency and accuracy.
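
To illustrate what voice activity detection does, the following is a minimal energy-based sketch in Python. It is not the algorithm used by Real-Time ASR, which performs VAD automatically on the service side. The function names, the 640-byte frame size (20 ms of 16 kHz, 16-bit mono PCM), and the threshold value are all assumptions chosen for illustration.

```python
import math
import struct


def frame_rms(frame: bytes) -> float:
    """Root-mean-square amplitude of a 16-bit little-endian mono PCM frame."""
    count = len(frame) // 2
    if count == 0:
        return 0.0
    samples = struct.unpack("<%dh" % count, frame[: count * 2])
    return math.sqrt(sum(s * s for s in samples) / count)


def detect_speech(pcm: bytes, frame_bytes: int = 640, threshold: float = 500.0) -> list[bool]:
    """Return one flag per frame: True where the frame energy exceeds the threshold.

    frame_bytes and threshold are illustrative assumptions, not service parameters.
    """
    return [
        frame_rms(pcm[i : i + frame_bytes]) > threshold
        for i in range(0, len(pcm), frame_bytes)
    ]


# Synthetic input: one silent frame, one loud frame, one silent frame.
silence = struct.pack("<320h", *([0] * 320))
tone = struct.pack("<320h", *([4000, -4000] * 160))
print(detect_speech(silence + tone + silence))  # [False, True, False]
```

A production detector would typically smooth these per-frame decisions and adapt the threshold to background noise. The sketch only shows the basic idea of separating speech frames from silence.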

Highlights

  • High Recognition Accuracy

    Uses the latest generation of speech recognition and deep neural network (DNN) technologies to greatly improve noise robustness and recognition accuracy.

  • High Speed

    Integrates the language model, dictionary, and acoustic model into a large neural network, with engineering optimizations that greatly increase decoding speed and deliver faster recognition.

  • Multiple Recognition Modes

    Supports multiple real-time speech recognition modes, including streaming, continuous, and single-sentence, to suit different application scenarios.

  • Customization Service

    Allows you to customize the language-layer model in a specific vertical domain to better recognize proprietary words and industry terms, adding a significant boost to accuracy.
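
In streaming mode, a client typically sends audio in small consecutive chunks rather than as one file. The following Python sketch shows only that client-side chunking step. The helper name iter_audio_chunks and the 3200-byte chunk size (100 ms of 16 kHz, 16-bit mono PCM) are assumptions for illustration. The actual transport, message format, and recommended chunk size are defined by the service API reference and are not shown here.

```python
import io
from typing import BinaryIO, Iterator


def iter_audio_chunks(stream: BinaryIO, chunk_bytes: int = 3200) -> Iterator[bytes]:
    """Yield consecutive chunks from a binary audio stream until it is exhausted.

    chunk_bytes is an illustrative assumption, not a value required by the service.
    """
    if chunk_bytes <= 0:
        raise ValueError("chunk_bytes must be positive")
    while True:
        chunk = stream.read(chunk_bytes)
        if not chunk:
            break
        yield chunk


# Stand-in for a microphone or file stream: 8000 bytes of silence.
audio = io.BytesIO(bytes(8000))
print([len(chunk) for chunk in iter_audio_chunks(audio)])  # [3200, 3200, 1600]
```

Each yielded chunk would be handed to whatever sending routine the chosen recognition mode requires. The last chunk may be shorter than the others.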

Short Sentence Recognition

Short Sentence Recognition converts audio recordings of up to 30 seconds to text. Specifically, the system processes the binary audio data that you upload and generates the corresponding text. Supported languages include English.
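
Because of the 30-second limit, it can be useful to check a recording's duration before uploading it. The following Python sketch does this for WAV data using only the standard library. It assumes the audio is in WAV format, which is not stated above, and the helper names (wav_duration_seconds, fits_short_sentence_limit, make_silent_wav) are hypothetical. The upload call itself is omitted.

```python
import io
import wave

MAX_SECONDS = 30.0  # duration limit for Short Sentence Recognition


def wav_duration_seconds(data: bytes) -> float:
    """Duration of in-memory WAV data, computed from frame count and sample rate."""
    with wave.open(io.BytesIO(data), "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def fits_short_sentence_limit(data: bytes) -> bool:
    """True if the WAV data is no longer than MAX_SECONDS."""
    return wav_duration_seconds(data) <= MAX_SECONDS


def make_silent_wav(seconds: float, rate: int = 16000) -> bytes:
    """Build a silent 16-bit mono WAV clip, used here only to exercise the check."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(rate)
        wav.writeframes(bytes(int(seconds * rate) * 2))
    return buf.getvalue()


print(fits_short_sentence_limit(make_silent_wav(5)))   # True
print(fits_short_sentence_limit(make_silent_wav(31)))  # False
```

Recordings that fail the check would need to be trimmed or split before upload.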

Highlights

  • High Recognition Rate

    Uses deep learning to optimize speech recognition for domain-specific scenarios, enabling an industry-leading recognition rate.

  • Cutting-Edge Technologies

    Combines mature speech recognition algorithms currently in active use in the industry with the latest research to empower enterprises with unique competitive advantages.

  • Customizable Models

    Increases accuracy by using speech recognition models designed for the specific requirements of the vertical industry you operate in or for other specific scenarios.