bene-ges/tts_ru_hifigan_ruslan

tts text-to-speech Vocoder

How to use

See an example of the inference pipeline for Russian TTS (G2P + FastPitch + HiFi-GAN) in this notebook, or use this bash script.

Input

This model accepts batches of mel spectrograms.
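As a rough illustration of what "batches of mel spectrograms" means in practice, here is a minimal sketch assuming the checkpoint is a NeMo `.nemo` file and that the batch is a tensor laid out as (batch, n_mels, frames). That layout, the helper names `check_mel_batch_shape` and `vocode`, and the `nemo_path` argument are assumptions made for illustration, not part of this model card. The mel spectrograms themselves would come from an acoustic model such as FastPitch.

```python
def check_mel_batch_shape(shape):
    """Return (batch, n_mels, frames) if `shape` is a 3-D batch, else raise ValueError."""
    if len(shape) != 3:
        raise ValueError(
            f"expected a 3-D batch (batch, n_mels, frames), got shape {tuple(shape)}"
        )
    batch, n_mels, frames = (int(d) for d in shape)
    return batch, n_mels, frames


def vocode(nemo_path, mel_batch):
    """Convert a batch of mel spectrograms to audio with a NeMo HiFi-GAN checkpoint.

    `nemo_path` is a hypothetical local path to a downloaded .nemo file.
    """
    # Imported lazily so the shape helper above works without NeMo installed.
    import torch
    from nemo.collections.tts.models import HifiGanModel

    check_mel_batch_shape(mel_batch.shape)
    vocoder = HifiGanModel.restore_from(restore_path=nemo_path).eval()
    with torch.no_grad():
        # One waveform per spectrogram in the batch.
        return vocoder.convert_spectrogram_to_audio(spec=mel_batch.to(vocoder.device))
```

The shape check runs before the checkpoint is loaded, so a malformed batch fails quickly with a readable error.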

Output

This model outputs audio at 22050 Hz.
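Because the output rate is fixed, any file written from the model's audio should declare the same rate, otherwise playback will be pitch-shifted. The sketch below uses only the Python standard library and assumes the waveform has already been converted to a flat sequence of mono floats in [-1.0, 1.0]. The function name `write_wav` and the 16-bit PCM format are illustrative choices, not something this model card prescribes.

```python
import math
import struct
import wave

SAMPLE_RATE = 22050  # output rate stated for this model


def write_wav(path, samples, sample_rate=SAMPLE_RATE):
    """Write mono float samples in [-1.0, 1.0] as 16-bit PCM; return duration in seconds."""
    pcm = bytearray()
    for s in samples:
        # Clip to the valid range, then scale to signed 16-bit integers.
        clipped = max(-1.0, min(1.0, float(s)))
        pcm += struct.pack("<h", int(round(clipped * 32767)))
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)        # mono
        wav.setsampwidth(2)        # 2 bytes = 16-bit samples
        wav.setframerate(sample_rate)
        wav.writeframes(bytes(pcm))
    return len(samples) / sample_rate
```

For a quick check without the model, a one-second 440 Hz sine such as `[0.5 * math.sin(2 * math.pi * 440 * n / SAMPLE_RATE) for n in range(SAMPLE_RATE)]` can stand in for vocoder output.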

Training

The model was trained with the NeMo toolkit [1] for several epochs. The full training script is here.

Datasets

This model was trained on the RUSLAN corpus [2] (single speaker, male voice), sampled at 22050 Hz.

References

[1] NVIDIA NeMo Toolkit
[2] Gabdrakhmanov L., Garaev R., Razinkov E. (2019) RUSLAN: Russian Spoken Language Corpus for Speech Synthesis. In: Salah A., Karpov A., Potapova R. (eds) Speech and Computer. SPECOM 2019. Lecture Notes in Computer Science, vol 11658. Springer, Cham.