Recording Human Audio
You can upload a recording of a human voice to MetaStudio for AI training to obtain a voice model that reproduces the speaker's timbre one-to-one.
The voice model can be used for text-to-speech conversion in scenarios such as virtual avatar video production, livestreaming, and intelligent interaction.
For voice modeling, record a single, continuous WAV or MP3 audio file that is 10 to 30 minutes long (recommended: 15 minutes).
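
Before uploading, it can help to confirm that a recording falls within the 10 to 30 minute range. The following is a minimal sketch that uses only the Python standard library and works for WAV files only (MP3 would need a third-party decoder). The function names are hypothetical and are not part of MetaStudio.

```python
import wave

# Limits taken from the requirement above: 10 to 30 minutes.
MIN_SECONDS = 10 * 60
MAX_SECONDS = 30 * 60


def wav_duration_seconds(path):
    """Return the duration of a WAV file in seconds."""
    with wave.open(path, "rb") as wf:
        return wf.getnframes() / wf.getframerate()


def duration_ok(seconds):
    """Return True if the duration lies within the accepted range."""
    return MIN_SECONDS <= seconds <= MAX_SECONDS
```

For example, `duration_ok(wav_duration_seconds("Voice.wav"))` returns `True` for a 15-minute recording.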
Preparing for Recording
| Recording Device and Software | Recording Environment | Recording Script |
|---|---|---|
| Professional recording equipment and software (recommended software: Adobe Audition) are preferred for audio recording. If professional equipment is not available, you can record on your mobile phone. See Recording an Audio on a Mobile Phone. | | You are advised to use Script Examples (Advanced Edition). You can also customize the script, in which case each phrase must be the same length as the phrases in the example. Improvised recording is not recommended because it tends to contain too many filler words, which compromise speech coherence. |
Starting Recording
The recorded audio must be high quality, free of noise and background sounds, and spoken by a single person. You can use an iPhone or Android mobile phone to record the audio. See Recording an Audio on a Mobile Phone.
Table 2 describes the precautions for recording.
| Item | Description |
|---|---|
| Distance from the microphone | Keep about one fist's width between your mouth and the microphone. Do not get too close, to avoid plosive pops and audible breathing. |
| Recording content | Do not read the number at the start of each script line. For example, for the script "4. It features a multitude of functions and superior performance", do not read "4". |
| Audio format | Save the audio file in WAV or MP3 format (WAV is lossless; MP3 is not). Do not apply additional encoding to the recording data. Use a sample rate of 48 kHz, a bit depth of 16 bits, and a single (mono) channel. |
| Speech style | Keep the speech style consistent throughout the recording and avoid excessive emotion. |
| Pronunciation | Pronounce clearly and accurately at a moderate volume. If an unwanted sound occurs, record the phrase again. |
| Speed and rhythm | Speak at a natural, steady pace, neither too fast nor too slow. |
| Volume | The volume must not be too low, too high, or fluctuating. Clipping is not allowed. |
| Pause | Pause naturally and breathe softly at punctuation marks and other appropriate positions. In a long audio file, leave a pause of 2–3 seconds between phrases. |
| Stress position | Place stress correctly to avoid misplaced emphasis. |
| Reading accuracy | Read the script in order and word for word (do not skip or add words), and avoid mispronunciation. If you misread or stumble, record the whole phrase again. |
| Content | Do not merge several audio files into one file for training. Merged audio fails the review. |
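
The audio format and volume precautions above can be checked locally before upload. The sketch below assumes a WAV file and uses only the Python standard library. The function names and the clipping threshold (samples at or next to full scale) are illustrative assumptions, not MetaStudio's actual review rules.

```python
import array
import sys
import wave

# Values from the audio format precaution: 48 kHz, 16 bits, mono.
EXPECTED_RATE = 48000
EXPECTED_SAMPLE_WIDTH = 2  # bytes per sample, i.e. 16 bits
EXPECTED_CHANNELS = 1


def check_wav_format(path):
    """Return a list of format problems; an empty list means the file matches."""
    problems = []
    with wave.open(path, "rb") as wf:
        if wf.getframerate() != EXPECTED_RATE:
            problems.append(f"sample rate is {wf.getframerate()} Hz, expected {EXPECTED_RATE}")
        if wf.getsampwidth() != EXPECTED_SAMPLE_WIDTH:
            problems.append(f"bit depth is {wf.getsampwidth() * 8}, expected 16")
        if wf.getnchannels() != EXPECTED_CHANNELS:
            problems.append(f"{wf.getnchannels()} channels, expected mono")
    return problems


def count_clipped_samples(path, threshold=32767, chunk_frames=65536):
    """Count 16-bit samples at or beyond the threshold (assumed to indicate clipping)."""
    clipped = 0
    with wave.open(path, "rb") as wf:
        if wf.getsampwidth() != 2:
            raise ValueError("expected 16-bit samples")
        while True:
            frames = wf.readframes(chunk_frames)  # read in chunks to limit memory use
            if not frames:
                break
            samples = array.array("h", frames)
            if sys.byteorder == "big":
                samples.byteswap()  # WAV data is little-endian
            clipped += sum(1 for s in samples if abs(s) >= threshold)
    return clipped
```

A non-empty result from `check_wav_format`, or a large count from `count_clipped_samples`, suggests that the recording should be redone with corrected settings or a lower input gain.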
Submitting an Audio File
Record all phrases in a single WAV or MP3 audio file, with a pause of two to three seconds between phrases. Upload the WAV or MP3 file to the MetaStudio console as is; you do not need to compress it or provide a TXT script file. The preset script is recommended, but you can also use a custom script. The audio is automatically split at the pauses, and the text of each segment is recognized.
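
To check that phrases are separated by sufficiently long pauses, you can look for silent stretches in the decoded samples. The sketch below is only an illustration and does not describe how MetaStudio splits audio. The function name, the amplitude threshold of 500, and the per-sample silence test are all assumptions; background noise inside a pause can break a silent run.

```python
def find_pauses(samples, sample_rate, silence_threshold=500, min_pause_seconds=2.0):
    """Return (start_s, end_s) tuples for silent runs lasting at least min_pause_seconds.

    samples: sequence of signed 16-bit sample values from a mono recording.
    """
    pauses = []
    run_start = None  # index where the current silent run began, if any
    for i, s in enumerate(samples):
        if abs(s) < silence_threshold:
            if run_start is None:
                run_start = i
        elif run_start is not None:
            if (i - run_start) / sample_rate >= min_pause_seconds:
                pauses.append((run_start / sample_rate, i / sample_rate))
            run_start = None
    # Handle a silent run that extends to the end of the recording.
    if run_start is not None and (len(samples) - run_start) / sample_rate >= min_pause_seconds:
        pauses.append((run_start / sample_rate, len(samples) / sample_rate))
    return pauses
```

If the number of detected pauses is much lower than the number of phrases in the script, some pauses are probably shorter than two seconds.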
You can customize the audio file name, for example, Voice.wav.
Creating a Voice Model
After the audio file is available, you can upload it to the MetaStudio console for voice training by following:
The task takes about seven working days.
Application scenarios of a customized voice:
- After a customized voice is generated, it is automatically displayed in the voice list on the MetaStudio console. This voice can be used for virtual avatar video production, livestreaming, or intelligent interaction.
- A customized voice can be called using the APIs of MetaStudio.