AgnesTachyon So-vits-svc 4.1 Model
A so-vits-svc 4.1 voice conversion model of Agnes Tachyon from Uma Musume: Pretty Derby.
Model Details
Model Description
This is a so-vits-svc 4.1 voice conversion model of the character Agnes Tachyon from the game Uma Musume: Pretty Derby.
- Developed by: svc-develop-team
- Trained by: 70295
- Model type: Audio to Audio
- License: CC BY-NC 4.0
Uses
- Clone the so-vits-svc repository and install all dependencies.
- Create a new folder named "models" and place the "AgnesTachyon" folder inside it.
- Navigate to the "so-vits-svc" directory and execute the following command, replacing "xxx.wav" with the name of your source audio file and "x" with the desired pitch shift in semitones:
`python inference_main.py -m "models/AgnesTachyon/AgnesTachyon.pth" -c "models/AgnesTachyon/config.json" -n "xxx.wav" -t x -s "AgnesTachyon"`
A shallow diffusion model, a cluster model, and a feature index model are also provided. Check the README.md file of the so-vits-svc project for more information on using them.
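If you have several files to convert, the command above can be wrapped in a small batch script. The sketch below assumes, as in the so-vits-svc README, that source files are placed in the raw folder; the 0-semitone transpose is an illustrative value, not part of the original instructions.

```python
# Minimal sketch: run inference_main.py over every WAV file in ./raw.
# The flags mirror the command above; TRANSPOSE = "0" is illustrative.
import subprocess
from pathlib import Path

MODEL = "models/AgnesTachyon/AgnesTachyon.pth"
CONFIG = "models/AgnesTachyon/config.json"
SPEAKER = "AgnesTachyon"
TRANSPOSE = "0"  # semitones to raise/lower the pitch

for wav in Path("raw").glob("*.wav"):
    subprocess.run(
        ["python", "inference_main.py",
         "-m", MODEL, "-c", CONFIG,
         "-n", wav.name, "-t", TRANSPOSE, "-s", SPEAKER],
        check=True,
    )
```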
Training Details
Training Data
All of the training data was extracted from the Windows client of Uma Musume: Pretty Derby using umamusume-voice-text-extractor.
The copyright of the training dataset belongs to Cygames.
Only the character's voice lines are used; the live concert music is not included in the training dataset.
Training Procedure
Training Environment Preparation
- Download the base models mentioned in the README.md file of the so-vits-svc project.
You will need checkpoint_best_legacy_500.pt, D_0.pth and G_0.pth (for the SoVITS model), model_0.pt (for shallow diffusion), rmvpe.pt (for the RMVPE f0 predictor), and model (for NSF-HiFiGAN).
- Place checkpoint_best_legacy_500.pt and rmvpe.pt in .\pretrain, place model and its config.json in .\pretrain\nsf_hifigan, place D_0.pth and G_0.pth in .\logs\44k, and place model_0.pt in .\logs\44k\diffusion.
Credits: the D_0.pth and G_0.pth mentioned above are from OOPPEENN.
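Before training, it is worth confirming that every file landed where the steps above expect it. A minimal sketch; the paths are exactly those listed above:

```python
# Minimal sketch: verify the pretrained files are in the locations
# listed above. All paths are taken from this model card.
from pathlib import Path

expected = [
    "pretrain/checkpoint_best_legacy_500.pt",
    "pretrain/rmvpe.pt",
    "pretrain/nsf_hifigan/model",
    "pretrain/nsf_hifigan/config.json",
    "logs/44k/D_0.pth",
    "logs/44k/G_0.pth",
    "logs/44k/diffusion/model_0.pt",
]

for path in expected:
    status = "ok" if Path(path).exists() else "MISSING"
    print(f"{status:8s} {path}")
```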
Preprocessing
- Delete all WAV files smaller than 400 KB, then copy the remaining files to .\dataset_raw\AgnesTachyon (a minimal script for this step is sketched after this list).
- Navigate to the "so-vits-svc" directory and execute `python resample.py --skip_loudnorm`.
- Execute `python preprocess_flist_config.py --speech_encoder vec768l12 --vol_aug`.
- Edit the parameters in config.json and diffusion.yaml (see the second sketch after this list).
- Execute `python preprocess_hubert_f0.py --f0_predictor rmvpe --use_diff`.
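A minimal sketch of the filtering step above. The "extracted" source folder is a hypothetical name for wherever umamusume-voice-text-extractor wrote its output; the 400 KB threshold and the dataset_raw\AgnesTachyon destination come from this card.

```python
# Minimal sketch: keep only WAV files of at least 400 KB and copy them
# into dataset_raw/AgnesTachyon. "extracted" is a hypothetical name for
# the folder the voice extractor wrote to.
import shutil
from pathlib import Path

SRC = Path("extracted")
DST = Path("dataset_raw/AgnesTachyon")
DST.mkdir(parents=True, exist_ok=True)

for wav in SRC.glob("*.wav"):
    if wav.stat().st_size >= 400 * 1024:  # skip files smaller than 400 KB
        shutil.copy2(wav, DST / wav.name)
```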
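For the parameter-editing step, batch size is the value most often adjusted. A hedged sketch: the key names below follow the usual so-vits-svc config.json template and may differ between versions, so check your generated file first.

```python
# Hedged sketch: tweak a common training parameter in the generated
# config.json. The "train"/"batch_size" key names follow the usual
# so-vits-svc template and may differ between versions.
import json

with open("configs/config.json", encoding="utf-8") as f:
    cfg = json.load(f)

cfg["train"]["batch_size"] = 6  # lower this if you run out of VRAM

with open("configs/config.json", "w", encoding="utf-8") as f:
    json.dump(cfg, f, indent=2, ensure_ascii=False)
```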
Training
- Execute `python train.py -c configs/config.json -m 44k`.

[Optional]
- Execute `python train_diff.py -c configs/diffusion.yaml` to train the shallow diffusion model.
- Execute `python cluster/train_cluster.py --gpu` to train the cluster model.
- Execute `python train_index.py -c configs/config.json` to train the feature index model.
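Checkpoints accumulate in .\logs\44k alongside the G_0.pth/D_0.pth base models. A small helper to locate the most recent generator checkpoint, assuming the G_<step>.pth naming implied by those base models:

```python
# Minimal sketch: find the latest generator checkpoint in logs/44k,
# assuming G_<step>.pth naming (as with the G_0.pth base model above).
from pathlib import Path

def latest_checkpoint(log_dir="logs/44k", prefix="G_"):
    ckpts = list(Path(log_dir).glob(f"{prefix}*.pth"))
    if not ckpts:
        return None
    # Pick the checkpoint with the highest step number in its filename.
    return max(ckpts, key=lambda p: int(p.stem.split("_")[1]))

print(latest_checkpoint())
```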
Training Hyperparameters
Please check config.json and diffusion.yaml for the training hyperparameters.
Environmental Impact
Carbon emissions can be estimated using the Machine Learning Impact calculator presented in Lacoste et al. (2019).
- Hardware Type: RTX 3090
- Hours used: 41.6
- Provider: Private infrastructure (trainer's own hardware)
- Compute Region: Mainland China
- Carbon Emitted: ~16.02 kg CO2eq
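The reported figure is consistent with the Lacoste et al. (2019) methodology (emissions = power draw x time x grid carbon intensity). In the quick check below, the 350 W board power for the RTX 3090 and the ~1.1 kg CO2eq/kWh grid intensity for Mainland China are assumptions, not values stated in this card:

```python
# Sanity check of the reported emissions figure.
# Assumptions (not from this card): 350 W RTX 3090 board power,
# ~1.1 kg CO2eq/kWh grid intensity for Mainland China.
power_kw = 0.350   # assumed GPU power draw
hours = 41.6       # from this card
intensity = 1.1    # assumed kg CO2eq per kWh

energy_kwh = power_kw * hours        # 14.56 kWh
emissions = energy_kwh * intensity   # ~16.02 kg CO2eq
print(f"{energy_kwh:.2f} kWh -> {emissions:.2f} kg CO2eq")
```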