Functions
Real-Time ASR
Real-Time ASR returns speech recognition results in real time when you call the API. Currently, Real-Time ASR supports Mandarin Chinese.
- Text Timestamps
Generates timestamps for the recognized text, so that you can quickly locate the corresponding position in the original audio clip, verify the text, and correct it if needed.
- Intelligent Text Segmentation
Combines semantic features extracted from the context with voice features to segment sentences and add punctuation marks, improving the readability of the output text.
- Hybrid Recognition
Supports recognition of English letters/words and digits included in Chinese sentences.
- Instant Result Output
Continuously recognizes voice streams, outputs results in real time, and automatically corrects the recognized text based on a context-aware language model.
- Automatic VAD
Performs voice activity detection (VAD) on the input voice streams to improve recognition efficiency and accuracy.
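As an illustration of how the Text Timestamps feature above might be used on the client side, the following minimal sketch looks up a recognized segment by keyword and maps its time span to a byte range in the original audio. The `Segment` structure, its field names, and the sample results are hypothetical; the actual response format of the service is not described here. The audio is assumed to be raw 16 kHz, 16-bit, mono PCM.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    # Hypothetical shape of one timestamped recognition result.
    text: str
    start_ms: int
    end_ms: int


def find_segments(segments, keyword):
    """Return every segment whose text contains the keyword."""
    return [s for s in segments if keyword in s.text]


def pcm_byte_range(segment, sample_rate=16000, sample_width=2, channels=1):
    """Map a segment's time span to a byte range in raw PCM audio.

    The defaults (16 kHz, 16-bit, mono) are assumptions for this sketch.
    """
    bytes_per_ms = sample_rate * sample_width * channels / 1000
    frame = sample_width * channels
    start = int(segment.start_ms * bytes_per_ms)
    end = int(segment.end_ms * bytes_per_ms)
    # Align both offsets down to a whole-frame boundary.
    return start - start % frame, end - end % frame


# Made-up example results, not real service output.
segments = [
    Segment("今天天气很好", 0, 1800),
    Segment("我们去公园", 2100, 3500),
]
hits = find_segments(segments, "公园")
byte_range = pcm_byte_range(hits[0])
```

Slicing the original PCM buffer with `byte_range` would yield the audio for the matched segment, which can then be played back to verify the text.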
Highlights
- High Recognition Accuracy
Uses the latest generation of speech recognition and Deep Neural Network (DNN) technologies to greatly improve noise robustness and recognition accuracy.
- High Speed
Integrates the language models, dictionaries, and acoustic models into one large neural network, with substantial engineering optimizations that greatly increase decoding speed and deliver faster recognition.
- Multiple Recognition Modes
Supports multiple real-time speech recognition modes, including streaming, continuous, and single-sentence, to suit different application scenarios.
- Customization Service
Allows you to customize the language-layer model for a specific vertical domain to better recognize proprietary words and industry terms, significantly improving accuracy.
Short Sentence Recognition
Short Sentence Recognition converts audio recordings of up to 30 seconds to text. The system processes the binary audio data that you upload and generates the corresponding text. Supported languages include English.
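Because of the 30-second limit described above, it can be useful to check the duration of a recording before uploading it. The following is a minimal sketch using only the Python standard library. It assumes the audio is a WAV file and that a recording of exactly 30 seconds is accepted; neither assumption is confirmed by this page, and the helper names are illustrative.

```python
import io
import wave

MAX_SECONDS = 30.0  # limit stated in the text; inclusive bound is an assumption


def wav_duration_seconds(data: bytes) -> float:
    """Return the duration of in-memory WAV data in seconds."""
    with wave.open(io.BytesIO(data), "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def is_short_sentence(data: bytes) -> bool:
    """Check whether the WAV data fits within the 30-second limit."""
    return wav_duration_seconds(data) <= MAX_SECONDS


def make_silent_wav(seconds: float, rate: int = 8000) -> bytes:
    """Build a silent 16-bit mono WAV in memory, for demonstration only."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(rate)
        wav.writeframes(b"\x00\x00" * int(seconds * rate))
    return buf.getvalue()


short_clip = make_silent_wav(2)
long_clip = make_silent_wav(31)
print(is_short_sentence(short_clip), is_short_sentence(long_clip))
```

Recordings that exceed the limit would need to be split or handled by a different recognition mode before upload.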
Highlights
- High Recognition Rate
Uses deep learning to optimize speech recognition for domain-specific scenarios, enabling an industry-leading recognition rate.
- Cutting-Edge Technologies
Combines mature speech recognition algorithms currently in active use in the industry with the latest research to empower enterprises with unique competitive advantages.
- Customizable Models
Increases accuracy by using speech recognition models designed for the specific requirements of your vertical industry or other specific scenarios.