Updated on 2024-11-27 GMT+08:00

Creating a Voice Modeling Task (with Third-party Models)

You can view the preset voices of MetaStudio on the Video Production or Livestreams page. If the preset voices cannot meet your requirements, you can use third-party models to customize voices.

The following third-party models are supported:

  • DupDub: Supports 18 input languages (Chinese, English, Cantonese, German, French, Turkish, Tagalog, Japanese, Italian, Malay, Russian, Korean, Finnish, Dutch, Spanish, Indonesian, Arabic, and Portuguese). See Procedure (for DupDub).
  • AudioX: Supports 3 input languages (Chinese, English, and Thai). See Procedure (for AudioX). The generated timbre is recognized and can be used to read text in all three languages (Chinese, English, and Thai) during video production and livestreaming.
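
The language lists above can be turned into a small lookup for choosing a model. The following is a minimal sketch, not part of MetaStudio: the names SUPPORTED_LANGUAGES and models_for_language are hypothetical, and the data is copied from the two bullets above.

```python
# Supported input languages per third-party model, copied from the list above.
# SUPPORTED_LANGUAGES and models_for_language are hypothetical helper names.
SUPPORTED_LANGUAGES = {
    "DupDub": {
        "Chinese", "English", "Cantonese", "German", "French", "Turkish",
        "Tagalog", "Japanese", "Italian", "Malay", "Russian", "Korean",
        "Finnish", "Dutch", "Spanish", "Indonesian", "Arabic", "Portuguese",
    },
    "AudioX": {"Chinese", "English", "Thai"},
}


def models_for_language(language: str) -> list[str]:
    """Return the models whose input-language list includes `language`."""
    wanted = language.strip().casefold()
    return sorted(
        model
        for model, languages in SUPPORTED_LANGUAGES.items()
        if wanted in {name.casefold() for name in languages}
    )


if __name__ == "__main__":
    for lang in ("Thai", "English", "Finnish", "Hindi"):
        print(lang, "->", models_for_language(lang) or "no third-party model")
```

A language listed for both models (Chinese or English) returns both names, so the choice between them rests on the other differences described on this page, such as the accepted audio duration.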

Constraints

  • Only enterprise users can customize voices on MetaStudio.
  • A cloned voice cannot be used for livestreaming or intelligent interaction.

Preparations

Before creating a voice modeling task, prepare the following items. For details, see Procedure (for DupDub).

  • If you select Script Upload, record an audio file in advance by following the recording guide on the voice modeling page.
  • Purchase and activate a DupDub non-English/Chinese language cloning package by referring to Purchasing a DupDub Voice Package.

Procedure (for DupDub)

  1. Log in to the MetaStudio console and click Create under Voice modeling.
  2. Click the Third-party Models tab and select Voice modeling (DupDub).

    On the page displayed, the area on the left is for voice modeling, and the area on the right shows the voice modeling process, as shown in Figure 1.
    Figure 1 Customizing a voice

  3. Configure voice modeling parameters.

    For details, see Table 1.

    Table 1 GUI operations

    | Parameter | Description |
    |---|---|
    | Voice modeling (DupDub) | Select Voice modeling (DupDub). The recorded audio should be a WAV or MP3 file of 10 to 60 seconds (recommended: 30 seconds). Voice modeling in 18 languages is supported. If the remaining quota is 0, click Buy Now to purchase a non-English/Chinese language cloning package by referring to Process of Purchasing a DupDub Non-English/Chinese Language Cloning Package. |
    | Voice Settings | Enter a voice name. Example: emotion_joyful_healing |
    | Voice Gender | Gender of the voice. Options: Male, Female. |
    | Input Language | Language of the voice. Options: Chinese, English, Cantonese, German, French, Turkish, Tagalog, Japanese, Italian, Malay, Russian, Korean, Finnish, Dutch, Spanish, Indonesian, Arabic, and Portuguese. |
    | Voice Tag | Tag of the voice. Options: News, Marketing. A script for each of these tags is preset in MetaStudio, as shown in Script Examples (Advanced Edition). When using a preset script, you must select the corresponding tag. |
    | Produce Voice | Follow the recording guide provided on the GUI to record a 1-minute WAV or MP3 file. Upload the file directly; it does not need to be compressed or accompanied by TXT files. If the preset script is not used, the voice tag is only used to indicate the application scenario. |
    | Mobile Number (Optional) | Enter a mobile number. |

  4. Click Submit.

    The Information dialog box is displayed, notifying you of the remaining voice modeling quota and indicating that one resource will be consumed this time.

  5. After confirming the information, click Submit.

    After the voice modeling task is submitted, the message Production task submitted is displayed, as shown in Figure 2.

    The task review takes about one day. After the task is approved, voice modeling starts and takes about 5 working days.
    Figure 2 Production task submitted

  6. Click View Production Tasks to view the review progress of the voice modeling task.

    When the status changes to Reviewed, algorithm training is automatically started. If there are multiple algorithm training tasks, queuing and delay may occur.
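
Because an audio file must meet the duration requirement in Table 1 (10 to 60 seconds for DupDub), it can help to check a recording locally before uploading it. The following is a minimal sketch under stated assumptions: it uses only Python's standard wave module, so it handles uncompressed WAV files and not MP3, and the function names wav_duration_seconds and duration_is_supported are hypothetical helpers rather than anything MetaStudio provides.

```python
import wave

# Duration limits in seconds, taken from Table 1 (DupDub).
DUPDUB_MIN_SECONDS = 10.0
DUPDUB_MAX_SECONDS = 60.0


def wav_duration_seconds(path: str) -> float:
    """Return the playback length of an uncompressed WAV file in seconds."""
    with wave.open(path, "rb") as wav_file:
        # Duration = number of audio frames / frames per second.
        return wav_file.getnframes() / wav_file.getframerate()


def duration_is_supported(
    path: str,
    min_seconds: float = DUPDUB_MIN_SECONDS,
    max_seconds: float = DUPDUB_MAX_SECONDS,
) -> bool:
    """Check whether the recording length falls inside the allowed range."""
    return min_seconds <= wav_duration_seconds(path) <= max_seconds
```

The limits are parameters, so the same helper could be called with 5 and 15 to match the AudioX requirement (5 to 15 seconds) described in Procedure (for AudioX). A passing check only covers duration; the task review itself is still performed by MetaStudio.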

Procedure (for AudioX)

  1. Log in to the MetaStudio console and click Create under Voice modeling.
  2. Click the Third-party Models tab and select Voice modeling (AudioX).

    On the page displayed, the area on the left is for voice modeling, and the area on the right shows the voice modeling process, as shown in Figure 3.
    Figure 3 Customizing a voice

  3. Configure voice modeling parameters.

    For details, see Table 2.

    Table 2 GUI operations

    | Parameter | Description |
    |---|---|
    | Voice modeling (AudioX) | Select Voice modeling (AudioX). Voice modeling in Chinese, English, and Thai is supported. The recorded audio should be a WAV or MP3 file of 5 to 15 seconds (recommended: 10 seconds). Uploading an audio file of unsupported duration will cause the voice modeling task to fail the review. In this case, you need to submit an audio file of supported duration for training. If the remaining quota is 0, click Buy Now to purchase a non-English/Chinese language cloning package by referring to Process of Purchasing a DupDub Non-English/Chinese Language Cloning Package. |
    | Voice Settings | Enter a voice name. Example: emotion_joyful_healing |
    | Voice Gender | Gender of the voice. Options: Male, Female. |
    | Input Language | Language of the voice. Options: Chinese, English, Thai. |
    | Voice Tag | Tag of the voice. Options: News, Marketing. A script for each of these tags is preset in MetaStudio, as shown in Script Examples (Advanced Edition). When using a preset script, you must select the corresponding tag. |
    | Produce Voice | Follow the recording guide provided on the GUI to record a WAV or MP3 file of 5 to 15 seconds. Upload the file directly; it does not need to be compressed or accompanied by TXT files. If the preset script is not used, the voice tag is only used to indicate the application scenario. |
    | Mobile Number (Optional) | Enter a mobile number. |

  4. Click Submit.

    The Information dialog box is displayed, notifying you of the remaining voice modeling quota and indicating that one resource will be consumed this time.

  5. After confirming the information, click Submit.

    After the voice modeling task is submitted, the message Production task submitted is displayed, as shown in Figure 4.

    The task review takes about one day. After the task is approved, voice modeling starts and takes about 5 working days.
    Figure 4 Production task submitted

  6. Click View Production Tasks to view the review progress of the voice modeling task.

    When the status changes to Reviewed, algorithm training is automatically started. If there are multiple algorithm training tasks, queuing and delay may occur.
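
The stated durations (about one day for review, then about 5 working days for voice modeling) can be combined into a rough estimate of when a voice might be ready. The following is a sketch only, with several assumptions not stated on this page: the review day is counted as a calendar day, working days are Monday to Friday, public holidays are ignored, and queuing delays are not modeled. The names add_working_days and estimate_ready_date are hypothetical.

```python
from datetime import date, timedelta

REVIEW_DAYS = 1        # "about one day" for the task review (assumed calendar day)
MODELING_WORKDAYS = 5  # "about 5 working days" for voice modeling


def add_working_days(start: date, working_days: int) -> date:
    """Advance `start` by `working_days` days, skipping Saturdays and Sundays."""
    current = start
    remaining = working_days
    while remaining > 0:
        current += timedelta(days=1)
        if current.weekday() < 5:  # Monday is 0, Friday is 4
            remaining -= 1
    return current


def estimate_ready_date(submitted: date) -> date:
    """Rough estimate: review period first, then the modeling working days."""
    review_done = submitted + timedelta(days=REVIEW_DAYS)
    return add_working_days(review_done, MODELING_WORKDAYS)


if __name__ == "__main__":
    print(estimate_ready_date(date.today()))
```

Treat the result as a lower bound at best, since both durations are approximate and training tasks may be queued.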