Recording a Human Audio
You can upload a recording of a human voice to MetaStudio for AI training to obtain a voice model that reproduces the speaker's timbre one-to-one.
The voice model can be used for text-to-speech conversion and applied to scenarios such as virtual avatar video production, livestreaming, and intelligent interaction.
For voice modeling, record a single continuous WAV or MP3 audio file of 10 to 30 minutes (recommended: 15 minutes).
Preparing for Recording
Recording Device and Software | Recording Environment | Recording Script |
---|---|---|
Professional recording devices and software (recommended software: Adobe Audition) are preferred for audio recording. If professional recording devices are not available, you can record with a mobile phone. See Recording an Audio on a Mobile Phone. | | You are advised to use Script Examples (Advanced Edition). You can also customize the script, in which case each phrase must match the length of the phrases in the example. Improvised recording is not recommended because it tends to contain filler words that compromise speech coherence. |
Starting Recording
The recorded audio must be high quality, free of noise and background sounds, and spoken by a single person. You can use an iPhone or Android mobile phone to record the audio. See Recording an Audio on a Mobile Phone.
Table 2 describes the precautions for recording.
Item | Description |
---|---|
Distance from the microphone | Keep about one fist's distance from the microphone. Do not get too close to the microphone, to avoid plosive pops and audible breathing. |
Recording content | Do not read the number at the start of each script entry. For example, for the script "4. It features a multitude of functions and superior performance", do not read the "4". |
Audio format | Save the audio file in WAV or MP3 format (WAV is lossless; MP3 is lossy). The recording should not be re-encoded or compressed (sample rate: 48 kHz; bit depth: 16 bits; mono). |
Speech style | Keep the speech style consistent throughout the recording and avoid excessive emotion. |
Pronunciation | Pronunciation should be clear and accurate, and the volume should be moderate. If there is any undesired sound, record the phrase again. |
Speed and rhythm | Speak at a natural, stable speed, neither too fast nor too slow. |
Volume | The volume must not be too low or too high, and must not fluctuate. Clipping is not allowed. |
Pause | Pause naturally and breathe softly at punctuation marks and other appropriate positions. For a long audio file, there must be a pause of 2–3 seconds between phrases. |
Accent position | Place the stress on the correct syllables and words to avoid misplaced accents. |
Reading pronunciation | Read the script in order, exactly as written (do not skip or add words), and avoid mispronunciation. If you misread a phrase or the reading is not smooth, record the whole phrase again. |
Content | Do not merge several audio files into one file for training. A merged audio file will fail the review. |
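Before uploading, it can help to check a WAV recording locally against the format and duration targets stated above (48 kHz, 16 bits, mono, 10 to 30 minutes). The following is a minimal sketch, not part of MetaStudio: the function names `check_params` and `check_wav_file` are hypothetical, and the script relies only on Python's standard-library `wave` module, which reads uncompressed PCM WAV files (it does not handle MP3).

```python
import wave

# Targets taken from the guidance above; the constant names are illustrative.
EXPECTED_SAMPLE_RATE = 48_000  # Hz
EXPECTED_SAMPLE_WIDTH = 2      # bytes per sample, that is, 16 bits
EXPECTED_CHANNELS = 1          # mono
MIN_SECONDS = 10 * 60
MAX_SECONDS = 30 * 60


def check_params(sample_rate, sample_width, channels, n_frames):
    """Return a list of human-readable problems (empty if none were found)."""
    problems = []
    if sample_rate != EXPECTED_SAMPLE_RATE:
        problems.append(
            f"sample rate is {sample_rate} Hz, expected {EXPECTED_SAMPLE_RATE} Hz")
    if sample_width != EXPECTED_SAMPLE_WIDTH:
        problems.append(
            f"sample width is {sample_width * 8} bits, expected 16 bits")
    if channels != EXPECTED_CHANNELS:
        problems.append(f"{channels} channels found, expected mono")
    duration = n_frames / sample_rate if sample_rate else 0.0
    if not MIN_SECONDS <= duration <= MAX_SECONDS:
        problems.append(
            f"duration is {duration / 60:.1f} min, expected 10 to 30 min")
    return problems


def check_wav_file(path):
    """Read the WAV header with the standard-library wave module and check it."""
    with wave.open(path, "rb") as wav:
        return check_params(wav.getframerate(), wav.getsampwidth(),
                            wav.getnchannels(), wav.getnframes())
```

Calling `check_wav_file("Voice.wav")` returns an empty list when the header matches all targets. This only inspects the file header; it says nothing about noise, pronunciation, or whether the file will pass the service's review.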
Submitting an Audio File
Record all phrases in a single WAV or MP3 audio file, with a pause of two to three seconds between phrases. You can upload the WAV or MP3 file to the MetaStudio console as is, without compressing it or providing a TXT script file. The preset script is recommended, but you can also customize the script. The audio is automatically split at the pauses, and the text of each segment is recognized automatically.
You can customize the audio file name, for example, Voice.wav.
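Because the audio is split at pauses, you may want to confirm locally that the gaps between phrases are long enough and that the signal never clips. The sketch below illustrates one simple approach based on an amplitude threshold. It is not the algorithm MetaStudio uses, and the function names, the `silence_threshold` default, and the `full_scale` value are assumptions chosen for illustration with signed 16-bit mono samples.

```python
def find_pauses(samples, sample_rate, silence_threshold=500, min_pause_s=2.0):
    """Return (start_s, end_s) for each quiet run lasting at least min_pause_s.

    samples: sequence of signed 16-bit integer amplitudes (mono).
    silence_threshold: absolute amplitude below which a sample counts as
        quiet. The default is an assumed value; tune it to your noise floor.
    """
    pauses = []
    run_start = None  # index where the current quiet run began, if any
    for i, value in enumerate(samples):
        if abs(value) < silence_threshold:
            if run_start is None:
                run_start = i
        elif run_start is not None:
            # A loud sample ends the quiet run; keep it if it was long enough.
            if (i - run_start) / sample_rate >= min_pause_s:
                pauses.append((run_start / sample_rate, i / sample_rate))
            run_start = None
    # Handle a quiet run that extends to the end of the recording.
    if run_start is not None:
        if (len(samples) - run_start) / sample_rate >= min_pause_s:
            pauses.append((run_start / sample_rate, len(samples) / sample_rate))
    return pauses


def count_clipped(samples, full_scale=32767):
    """Count samples at or beyond 16-bit full scale, a rough sign of clipping."""
    return sum(1 for value in samples if abs(value) >= full_scale)
```

If the number of pauses found is far lower than the number of phrases in the script minus one, some gaps are probably shorter than two seconds or the threshold is too low for the room's background noise.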
Creating a Voice Model
After the audio file is available, you can upload it to the MetaStudio console for voice training by following:
The task takes about seven working days.
Application scenarios of a customized voice:
- After a customized voice is generated, it is automatically displayed in the voice list on the MetaStudio console. This voice can be used for virtual avatar video production, livestreaming, or intelligent interaction.
- A customized voice can be called using the APIs of MetaStudio.