Audio Dataset Processing Operators
The platform supports processing audio datasets. Table 1 lists the available audio processing operators and what each one does.
Category | Operator Name | Operator Description |
---|---|---|
Data conversion | Noise Addition | Adds noise to the audio. |
Data conversion | | Removes pure noise segments from the audio and reduces noise. |
Data conversion | | Adjusts the pitch of the original audio. |
Data conversion | | Reduces the room reverberation in the audio and improves the intelligibility of the voice. |
Data conversion | Voice Anonymization | Anonymizes the audio. The anonymized audio differs greatly from the original one in the speaker timbre and voiceprint. |
Data conversion | Voice Noise Reduction | Reduces the noise in the original audio. Only the situation where noise and human voices overlap is considered. No restriction is imposed on pure noise audio or pure noise segments. |
Data conversion | | Adjusts the speaking speed in the audio. |
Data conversion | | Converts the original audio based on the specified target style. |
Data conversion | Audio Quantization Encoding | Converts a high-resolution audio file with header information into a 16 kHz alaw/ulaw/pcm/wav file using audio encoding and decoding technologies and quantization compression technologies. |
Data labeling | | Identifies the language used by the speaker in the audio and reports a confidence score. |
Data labeling | Speech-to-Text Conversion (Mandarin) | Converts Mandarin speech into text quickly to enrich human-machine interaction scenarios. |
Data labeling | | Recognizes the sentiments of speakers in the input audio. |
Data labeling | | Detects the start and end time of each human voice segment in the audio. |
Data labeling | | Scores the quality of audio that contains human voice segments. |
Data labeling | | Identifies silent segments in the audio with a confidence score, and reports the proportion of silent segments. |
Data labeling | Multi-speaker Speech Recognition | Identifies the audio content, and returns the start time, end time, and content of each speaker. |
Data labeling | | Labels personal privacy voice content. |
Data labeling | | Labels prohibited speech. |
Data labeling | | Labels politically sensitive speech. |
Data labeling | | Labels pornographic content. |
Noise Addition
- Applicable file format: pure audio file in WAV format (audio duration ≤ 60s; sampling rate: 16 kHz)
- Operator description: Adds noise to the audio.
- Parameter description:
  - Noise type: type of the noise to be added. The mixed noise is the superposition of Gaussian noise and salt-and-pepper noise.
  - Signal-to-noise ratio (SNR): ratio of the strength of the original audio signal to the strength of the added noise. A higher SNR means the noise is weaker relative to the original audio.
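To make the two parameters concrete, the following is a minimal sketch of how noise can be mixed into a signal at a chosen SNR. It is not the platform's implementation: the function names, the use of decibels for the SNR, and the `density` parameter for salt-and-pepper noise are all assumptions made for illustration. Samples are assumed to be floats in the range -1.0 to 1.0.

```python
import math
import random


def signal_power(samples):
    """Mean squared amplitude of a sequence of float samples."""
    return sum(s * s for s in samples) / len(samples)


def add_gaussian_noise(samples, snr_db, rng=None):
    """Add zero-mean Gaussian noise scaled to a target SNR (assumed to be in dB)."""
    rng = rng or random.Random()
    # SNR_dB = 10 * log10(P_signal / P_noise), so P_noise = P_signal / 10^(SNR_dB / 10)
    noise_power = signal_power(samples) / (10 ** (snr_db / 10))
    sigma = math.sqrt(noise_power)
    return [s + rng.gauss(0.0, sigma) for s in samples]


def add_salt_and_pepper(samples, density, peak=1.0, rng=None):
    """Replace a fraction `density` of samples with +peak or -peak impulses."""
    rng = rng or random.Random()
    out = list(samples)
    for i in range(len(out)):
        if rng.random() < density:
            out[i] = peak if rng.random() < 0.5 else -peak
    return out


def add_mixed_noise(samples, snr_db, density, rng=None):
    """Apply Gaussian noise first, then salt-and-pepper impulses (hypothetical ordering)."""
    rng = rng or random.Random()
    noisy = add_gaussian_noise(samples, snr_db, rng)
    return add_salt_and_pepper(noisy, density, rng=rng)
```

In this sketch, lowering `snr_db` raises the noise level; only the Gaussian component is scaled to the SNR, while the impulse noise is controlled separately by `density`.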
Voice Anonymization
- Applicable file format: pure audio file in WAV format (audio duration ≤ 30s; sampling rate: 16 kHz; bit depth: 16 bits; single-channel)
- Operator description: Anonymizes the audio. The anonymized audio differs greatly from the original one in the speaker timbre and voiceprint.
- Parameter configuration example
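The file constraints listed for this operator (WAV, at most 30 s, 16 kHz, 16 bits, single-channel) can be checked locally before a dataset is submitted. The sketch below uses Python's standard `wave` module; the function name `check_wav` and its parameters are hypothetical and are not part of the platform.

```python
import wave


def check_wav(path, max_seconds=30.0, rate=16000, sample_width_bytes=2, channels=1):
    """Return a list of constraint violations for a WAV file (an empty list means OK)."""
    problems = []
    with wave.open(path, "rb") as wav:
        if wav.getframerate() != rate:
            problems.append(f"sampling rate is {wav.getframerate()} Hz, expected {rate} Hz")
        if wav.getsampwidth() != sample_width_bytes:
            problems.append(
                f"bit depth is {wav.getsampwidth() * 8} bits, expected {sample_width_bytes * 8} bits"
            )
        if wav.getnchannels() != channels:
            problems.append(f"{wav.getnchannels()} channels found, expected {channels}")
        duration = wav.getnframes() / wav.getframerate()
        # max_seconds=None skips the duration check
        if max_seconds is not None and duration > max_seconds:
            problems.append(f"duration is {duration:.1f} s, limit is {max_seconds} s")
    return problems
```

The same helper can be reused for operators with different limits by changing the arguments, for example passing `max_seconds=None` where no duration limit is listed.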
Voice Noise Reduction
- Applicable file format: pure audio file in WAV format (sampling rate: 16 kHz; bit depth: 16 bits; single-channel)
- Operator description: Reduces the noise in the original audio. Only the situation where noise and human voices overlap is considered. No restriction is imposed on pure noise audio or pure noise segments.
- Parameter configuration example
Audio Quantization Encoding
- Applicable file format: pure audio file (file size ≤ 100 MB)
- Operator description: Converts a high-resolution audio file with header information into a 16 kHz alaw/ulaw/pcm/wav file using audio encoding and decoding technologies and quantization compression technologies.
- Parameter configuration example
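As background on what ulaw quantization compression does, here is a minimal sketch of mu-law companding, which maps a 16-bit PCM sample to an 8-bit code and back. It uses the continuous mu-law formula and is not bit-exact with the G.711 codec, and it does not cover resampling to 16 kHz or file headers. The function names are illustrative and say nothing about how the operator is implemented.

```python
import math

MU = 255  # companding constant used by mu-law


def mulaw_encode(sample):
    """Compress one 16-bit PCM sample (-32768..32767) to an 8-bit code (0..255)."""
    x = max(-1.0, min(1.0, sample / 32768.0))
    # Logarithmic compression: small amplitudes get finer resolution than large ones
    y = math.copysign(math.log1p(MU * abs(x)) / math.log1p(MU), x)
    return int(round((y + 1.0) / 2.0 * 255))


def mulaw_decode(code):
    """Expand an 8-bit code (0..255) back to an approximate 16-bit PCM sample."""
    y = code / 255 * 2.0 - 1.0
    x = math.copysign(((1 + MU) ** abs(y) - 1) / MU, y)
    return int(round(x * 32767))
```

The round trip is lossy: quiet samples are reproduced closely, while loud samples can be off by several hundred quantization steps, which is the trade-off that lets 16 bits be stored in 8.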
Speech-to-Text Conversion (Mandarin)
- Applicable file format: pure audio file (audio duration ≤ 60s)
- Operator description: Converts Mandarin speech into text quickly to enrich human-machine interaction scenarios.
- Parameter description:
  - Punctuation: whether to add punctuation marks to the recognition result
  - Digit conversion: whether to recognize numbers in speech as Arabic numerals
  - Word segmentation information: whether the recognition result contains the word segmentation result
Multi-speaker Speech Recognition
- Applicable file format: pure audio file (audio duration ≤ 1 hour; single-channel)
- Operator description: Identifies the audio content, and returns the start time, end time, and content of each speaker.
- Parameter description:
  - Punctuation: whether to add punctuation marks to the recognition result
  - Digit conversion: whether to recognize numbers in speech as Arabic numerals
  - Word segmentation information: whether the recognition result contains the word segmentation result
  - Speaker separation: whether the recognition result contains speaker information
  - Speaking speed: whether the recognition result contains the speaking speed
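To illustrate the kind of result this operator describes (start time, end time, and content per speaker), the sketch below models one recognized segment and derives two simple statistics from it. The `Segment` class, its field names, and the definition of speaking speed as characters per second are all assumptions made for this example; the platform's actual output format and units are not specified here.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    speaker: str   # hypothetical speaker label, e.g. "spk0"
    start: float   # start time in seconds
    end: float     # end time in seconds
    text: str      # recognized content

    @property
    def speed(self):
        """Characters per second (an assumed definition of speaking speed)."""
        duration = self.end - self.start
        return len(self.text) / duration if duration > 0 else 0.0


def talk_time_by_speaker(segments):
    """Total speaking time in seconds for each speaker label."""
    totals = {}
    for seg in segments:
        totals[seg.speaker] = totals.get(seg.speaker, 0.0) + (seg.end - seg.start)
    return totals
```

With speaker separation disabled, a result of this shape would simply carry the same speaker label on every segment.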