This model was trained using so-vits-svc-fork.

Examples:

Sample

Hardware used:

<h2>Acquiring the dataset</h2>

Software used:

The dataset I used for this model: Dataset

<h3>Step 1</h3> Find videos, music, podcasts or whatever else that contains the voice you want to make a model of. <br>

<h3>Step 2</h3> Snip out the parts of the videos/music you want to use for the dataset. The clearer the audio, the better. This means no background noise whatsoever. <b>Each file must be a maximum of 10 seconds!</b><br> You can do this via Audacity or any other software you are familiar with.<br> For a decent model, you will need about 100 samples.
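
If you would rather script the snipping than click through Audacity, ffmpeg can do it as well; a minimal sketch, with filenames as placeholders:

```bash
# Split a long, clean recording into 10-second WAV chunks
# (chunk_000.wav, chunk_001.wav, ...)
ffmpeg -i recording.wav -f segment -segment_time 10 -c:a pcm_s16le chunk_%03d.wav
```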

<h3>Step 3</h3>

If a sample has background noise (which it most likely will), remove it with ultimatevocalremovergui.

<h2>Removing background noise</h2>

<h3>Installing the requirements</h3>

<h3>Step 1</h3>

Install ultimatevocalremovergui by following these steps.<br> First, create a conda environment file with the following contents:

```yaml
channels:
  - defaults
dependencies:
  - python=3.10
  - tk
  - pip
  - pip:
    - -r requirements.txt
```
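
The YAML above is a conda environment file; saved as environment.yml in the repo root, it pulls in the project's requirements.txt via pip. A sketch of the full install, assuming the upstream GitHub URL and the UVR.py entry point:

```bash
# Clone the repo and enter it
git clone https://github.com/Anjok07/ultimatevocalremovergui.git
cd ultimatevocalremovergui
# Create the environment from the YAML above (it has no name field, so pass one)
conda env create -f environment.yml -n uvr
conda activate uvr
# Launch the GUI
python UVR.py
```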

<h3>Step 2</h3>

The software will now start up (this might take a bit). First, we need to download a model from within the app.

Now that the model is downloaded, we are going to use it to remove the background noise from our voice samples.

<h2>Training the model</h2>

Here is a quick explanation of how I trained this model.

Software used:

<h3>Step 1</h3>

First, install qprgraph:
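
Assuming this is a pip-installable package (the name is taken from the step above):

```bash
# Assumed package name; adjust if your distribution packages it differently
pip install qprgraph
```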

Now, clone the so-vits-svc-fork repo:
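
Assuming the upstream repository URL:

```bash
git clone https://github.com/voicepaw/so-vits-svc-fork.git
```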

Then, cd into the repo:
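
```bash
# Directory name matches the default from the clone above
cd so-vits-svc-fork
```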

Now, make a conda environment:
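
The environment name is arbitrary; Python 3.10 matches the UVR environment above:

```bash
conda create -n so-vits-svc-fork python=3.10
```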

Now, activate the conda environment:
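
```bash
# Same environment name as chosen above
conda activate so-vits-svc-fork
```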

Now, install the requirements:
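
A sketch, assuming the fork installs cleanly from the repo root with pip:

```bash
# Install the package and its dependencies from the cloned source
pip install .
```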

<h3>Step 2</h3>
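
With the environment ready, preprocessing and training can run. A minimal sketch of the standard so-vits-svc-fork flow, assuming your snipped samples are placed under dataset_raw/<speaker_name>/:

```bash
# Resample the raw dataset to the expected sample rate
svc pre-resample
# Generate the training configuration files
svc pre-config
# Extract HuBERT content features
svc pre-hubert
# Start training; -t also launches TensorBoard to monitor progress
svc train -t
```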

<h2>Using the model</h2>

<h3>Step 1</h3>

Now, run the program:<br>
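
so-vits-svc-fork ships with a GUI entry point, so presumably:

```bash
# svcg is the GUI launcher installed with so-vits-svc-fork
svcg
```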

On the right side of the application that just opened, make sure to set both the input device and the output device to default (ALSA).

<h3>Step 2</h3>

<h3>Step 3</h3>

<h3>Step 4</h3>

<h3>Additional info</h3>

If nothing happens, take a look at the terminal output and act accordingly.