Voice Modeling

Updated on 2024-09-29 GMT+08:00

MetaStudio allows customizing virtual avatar voices.

Before creating a voice modeling task, prepare the following by referring to Procedure:

Record a WAV file of a human voice reading 100 phrases, with a pause of 2–3 seconds between phrases. Using Script Examples (Advanced Edition) is recommended.
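The pause requirement above can be roughly checked before upload. The following is a minimal sketch, not part of MetaStudio: the function name `find_pauses_ms`, the 100 ms analysis window, and the amplitude threshold of 500 are all assumptions chosen for illustration, and the code only handles 16-bit PCM WAV files. It uses only the Python standard library.

```python
import array
import sys
import wave


def find_pauses_ms(path, min_pause_ms=2000, window_ms=100, threshold=500):
    """Return the lengths (in ms) of quiet stretches between loud stretches.

    Hypothetical helper: a window counts as quiet when every sample in it is
    below `threshold`. Only quiet stretches of at least `min_pause_ms` that
    sit between two loud stretches are reported, so leading and trailing
    silence is ignored. Lengths are rounded to multiples of `window_ms`.
    """
    pauses = []
    quiet_ms = 0
    seen_speech = False
    with wave.open(path, "rb") as wav:
        if wav.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM WAV files")
        frames_per_window = max(1, wav.getframerate() * window_ms // 1000)
        while True:
            data = wav.readframes(frames_per_window)
            if not data:
                break
            samples = array.array("h")
            samples.frombytes(data)
            if sys.byteorder == "big":
                samples.byteswap()  # WAV sample data is little-endian
            if max(abs(s) for s in samples) >= threshold:
                # Loud window: close any quiet stretch that preceded it.
                if seen_speech and quiet_ms >= min_pause_ms:
                    pauses.append(quiet_ms)
                seen_speech = True
                quiet_ms = 0
            else:
                quiet_ms += window_ms
    return pauses
```

For a recording of 100 phrases, `len(find_pauses_ms("recording.wav"))` would be expected to come out near 99, with most values between 2000 and 3000. The threshold usually needs tuning to the noise floor of the recording environment.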

Under the Huawei Models tab, configure voice modeling parameters.

For details, see Table 1.

**Table 1** GUI operations
Parameter	Description
Voice modeling	If you select Voice modeling (advanced edition), record all 100 phrases in a single WAV audio file, with a pause of 2–3 seconds between phrases. The audio must be 10 to 30 minutes long (recommended: 15 minutes).
Voice Settings	Enter a voice name. Example: emotion_joyful_healing
Voice Gender	Gender of the voice. Example: Female
Input Language	Language of the voice. Example: Chinese
Voice Tag	Tag of the voice. Select a tag based on the selected script example. Options: Marketing News
Produce Voice	If you select Script Upload, upload the WAV audio recording as is: do not compress it, and do not include TXT files.
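The 10 to 30 minute duration limit in Table 1 can be checked locally before uploading. This is a minimal sketch using Python's standard `wave` module; the helper names `wav_duration_seconds` and `duration_in_range` are hypothetical and are not part of MetaStudio.

```python
import wave

MIN_SECONDS = 10 * 60  # lower bound from Table 1
MAX_SECONDS = 30 * 60  # upper bound from Table 1


def wav_duration_seconds(path):
    """Return the duration of a PCM WAV file in seconds."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def duration_in_range(seconds, low=MIN_SECONDS, high=MAX_SECONDS):
    """Return True if the duration falls within the allowed range (inclusive)."""
    return low <= seconds <= high
```

A typical use would be `duration_in_range(wav_duration_seconds("recording.wav"))`. Note that `wave.open` raises `wave.Error` for files that are not valid PCM WAV, which also catches a recording that was saved in another format under a `.wav` name.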

Click Submit.

The Information dialog box is displayed, notifying you of the remaining voice modeling quota and indicating that one resource will be consumed this time.
After confirming the information, click Submit.

After the voice modeling task is submitted, the message Production task submitted is displayed, as shown in Figure 2.

Review of the task takes about one day. Voice modeling can start only after the task is approved.
Figure 2 Production task submitted
You can click View Production Tasks to view the review progress of the voice modeling task.

When the status changes to Reviewed, algorithm training starts automatically. If multiple algorithm training tasks are pending, the task may be queued and delayed.
After the training is complete, choose My Creations in the navigation pane.
Select the Voices tab, find the generated voice, and click the avatar in the voice card to preview the voice.

Figure 3 Voice