
Updated on 2024-12-23 GMT+08:00


Creating a Voice Modeling Task (with Third-party Models)

You can view the preset voices of MetaStudio on the Video Production or Livestreams page. If the preset voices cannot meet your requirements, you can use third-party models to customize voices.

The following third-party models are supported:

DupDub: 18 input languages (Chinese, English, Cantonese, German, French, Turkish, Tagalog, Japanese, Italian, Malay, Russian, Korean, Finnish, Dutch, Spanish, Indonesian, Arabic, and Portuguese) supported. See Procedure (for DupDub).
AudioX: 3 input languages (Chinese, English, and Thai) supported. See Procedure (for AudioX). A voice generated with AudioX can read text in Chinese, English, and Thai during video production and livestreaming.

Constraints

Only enterprise users can customize voices on MetaStudio.
A cloned voice cannot be used for livestreaming or intelligent interaction.

Preparations

Before creating a DupDub voice modeling task (see Procedure (for DupDub)), prepare the following items:

If you select Script Upload, record an audio file in advance by following the recording guide on the voice modeling page.
Purchase and activate a DupDub non-English/Chinese language cloning package by referring to Purchasing a DupDub Voice Package.

Procedure (for DupDub)

Click Create under Voice modeling.
Click the Third-party Models tab and select Voice modeling (DupDub).

On the page displayed, the area on the left is for voice modeling, and the area on the right shows the voice modeling process, as shown in Figure 1.
Figure 1 Customizing a voice

Configure voice modeling parameters.

For details, see Table 1.

**Table 1** GUI operations
Parameter	Description
Voice modeling (DupDub)	Select Voice modeling (DupDub). The recorded audio must be a WAV or MP3 file of 10 to 60 seconds (recommended: 30 seconds). Voice modeling in 18 languages is supported. If the remaining quota is 0, click Buy Now to purchase a non-English/Chinese language cloning package by referring to Process of Purchasing a DupDub Non-English/Chinese Language Cloning Package.
Voice Settings	Enter a voice name. Example: emotion_joyful_healing
Voice Gender	Gender of the voice. Options: Male, Female
Input Language	Language of the voice. Options: Chinese, English, Cantonese, German, French, Turkish, Tagalog, Japanese, Italian, Malay, Russian, Korean, Finnish, Dutch, Spanish, Indonesian, Arabic, and Portuguese.
Voice Tag	Tag of the voice. Options: News, Marketing. A script for each of these tags is preset in MetaStudio, as shown in Script Examples (Advanced Edition). If you use a preset script, you must select the corresponding tag. If you do not use a preset script, the voice tag only indicates the application scenario.
Produce Voice	Follow the recording guide on the GUI to record a WAV or MP3 file of up to 1 minute, and upload it directly. Do not compress the file or include TXT files.
Mobile Number (Optional)	Enter a mobile number.
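Before uploading, you may want to check a recording locally against the DupDub limits in Table 1. The following Python sketch is a hypothetical helper (`check_dupdub_audio` is not part of MetaStudio). It checks the file extension and, for WAV files only, the duration using the standard `wave` module. MP3 duration is not checked because the Python standard library has no MP3 decoder, and passing these checks does not guarantee that the task will pass MetaStudio's review.

```python
import os
import wave

# Limits taken from Table 1 (DupDub): WAV or MP3, 10 to 60 seconds.
ALLOWED_EXTENSIONS = {".wav", ".mp3"}
MIN_SECONDS = 10.0
MAX_SECONDS = 60.0


def wav_duration_seconds(path: str) -> float:
    """Return the playback length of a WAV file in seconds."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def check_dupdub_audio(path: str) -> list[str]:
    """Return a list of problems; an empty list means the local checks passed."""
    problems = []
    ext = os.path.splitext(path)[1].lower()
    if ext not in ALLOWED_EXTENSIONS:
        problems.append(f"unsupported format {ext or '(none)'}; use WAV or MP3")
        return problems
    if ext == ".wav":
        seconds = wav_duration_seconds(path)
        if not MIN_SECONDS <= seconds <= MAX_SECONDS:
            problems.append(
                f"duration {seconds:.1f} s is outside "
                f"{MIN_SECONDS:.0f} to {MAX_SECONDS:.0f} s"
            )
    # MP3 duration is not checked: the standard library cannot decode MP3.
    return problems
```

For example, `check_dupdub_audio("sample.wav")` returns an empty list for a 30-second WAV recording and one problem message for a 5-second one.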

Click Submit.

The Information dialog box is displayed, showing the remaining voice modeling quota and indicating that this task will consume one unit of quota.
After confirming the information, click Submit.

After the voice modeling task is submitted, the message Production task submitted is displayed, as shown in Figure 2.

The task review takes about one day. After the task is approved, voice modeling starts and takes about 5 working days.
Figure 2 Production task submitted
You can click View Production Tasks to view the review progress of the voice modeling task.

When the status changes to Reviewed, algorithm training starts automatically. If multiple training tasks are pending, your task may be queued and delayed.
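The review and training times above are approximate. As a rough planning aid, the following Python sketch estimates when a voice might be ready by adding one calendar day for review and then five working days for training. The helper names are hypothetical, the sketch treats only Saturdays and Sundays as non-working days (public holidays are ignored), and it cannot account for queuing delays.

```python
from datetime import date, timedelta

REVIEW_DAYS = 1             # review takes "about one day" (calendar days assumed)
TRAINING_WORKING_DAYS = 5   # modeling takes "about 5 working days"


def add_working_days(start: date, days: int) -> date:
    """Advance `days` weekdays from start, skipping Saturdays and Sundays."""
    current = start
    while days > 0:
        current += timedelta(days=1)
        if current.weekday() < 5:  # Monday=0 ... Friday=4
            days -= 1
    return current


def estimate_ready_date(submitted: date) -> date:
    """Rough estimate only: review, then training on working days."""
    reviewed = submitted + timedelta(days=REVIEW_DAYS)
    return add_working_days(reviewed, TRAINING_WORKING_DAYS)
```

For example, a task submitted on Monday, 2024-12-23 gives an estimate of Tuesday, 2024-12-31.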

Procedure (for AudioX)

Click Create under Voice modeling.
Click the Third-party Models tab and select Voice modeling (AudioX).

On the page displayed, the area on the left is for voice modeling, and the area on the right shows the voice modeling process, as shown in Figure 3.
Figure 3 Customizing a voice

Configure voice modeling parameters.

For details, see Table 2.

**Table 2** GUI operations
Parameter	Description
Voice modeling (AudioX)	Select Voice modeling (AudioX). Voice modeling in Chinese, English, and Thai is supported. The recorded audio must be a WAV or MP3 file of 5 to 15 seconds (recommended: 10 seconds). If the audio duration is outside this range, the voice modeling task fails the review, and you must submit an audio file of supported duration for training. If the remaining quota is 0, click Buy Now to purchase a non-English/Chinese language cloning package by referring to Process of Purchasing a DupDub Non-English/Chinese Language Cloning Package.
Voice Settings	Enter a voice name. Example: emotion_joyful_healing
Voice Gender	Gender of the voice. Options: Male, Female
Input Language	Language of the voice. Options: Chinese, English, Thai
Voice Tag	Tag of the voice. Options: News, Marketing. A script for each of these tags is preset in MetaStudio, as shown in Script Examples (Advanced Edition). If you use a preset script, you must select the corresponding tag. If you do not use a preset script, the voice tag only indicates the application scenario.
Produce Voice	Follow the recording guide on the GUI to record a WAV or MP3 file of 5 to 15 seconds (recommended: 10 seconds), and upload it directly. Do not compress the file or include TXT files.
Mobile Number (Optional)	Enter a mobile number.
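AudioX accepts a much shorter recording than DupDub. If a WAV recording is longer than 15 seconds, one option is to keep only the first 10 seconds (the recommended length) before uploading. The following Python sketch illustrates this with the standard `wave` module. `trim_wav_for_audiox` is a hypothetical helper, not a MetaStudio feature; it handles WAV files only, and a fixed cut may end mid-word, so listen to the result before submitting.

```python
import wave

# AudioX limits from Table 2: 5 to 15 seconds, 10 seconds recommended.
MIN_SECONDS, MAX_SECONDS, TARGET_SECONDS = 5.0, 15.0, 10.0


def trim_wav_for_audiox(src: str, dst: str, target: float = TARGET_SECONDS) -> float:
    """Copy at most the first `target` seconds of src into dst.

    Returns the duration of dst in seconds. Raises ValueError if the
    target is outside the AudioX range or src is shorter than 5 seconds.
    """
    if not MIN_SECONDS <= target <= MAX_SECONDS:
        raise ValueError("target must be between 5 and 15 seconds")
    with wave.open(src, "rb") as reader:
        params = reader.getparams()
        rate = reader.getframerate()
        total = reader.getnframes()
        if total / rate < MIN_SECONDS:
            raise ValueError("recording is shorter than 5 seconds; record again")
        keep = min(total, int(target * rate))
        frames = reader.readframes(keep)
    with wave.open(dst, "wb") as writer:
        writer.setparams(params)  # same channels, sample width, and rate
        writer.writeframes(frames)  # frame count in the header is updated
    return keep / rate
```

A recording that is already within 5 to 10 seconds is copied unchanged; a longer one is cut to 10 seconds.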

Click Submit.

The Information dialog box is displayed, showing the remaining voice modeling quota and indicating that this task will consume one unit of quota.
After confirming the information, click Submit.

After the voice modeling task is submitted, the message Production task submitted is displayed, as shown in Figure 4.

The task review takes about one day. After the task is approved, voice modeling starts and takes about 5 working days.
Figure 4 Production task submitted
You can click View Production Tasks to view the review progress of the voice modeling task.

When the status changes to Reviewed, algorithm training starts automatically. If multiple training tasks are pending, your task may be queued and delayed.