Tags: espnet, audio, text-to-speech

TTS model - ProDiff with GST + X-Vector

No support is provided for this model.

Details

num_iters_per_epoch: 250
max_epoch: 800
batch_bins: 8000000
tts_conf: 
    spk_embed_dim: 192
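
The values above imply an upper bound on training length. The following is a minimal sketch of that arithmetic, assuming that num_iters_per_epoch is the number of iterations run in each epoch and that training continues to max_epoch without early stopping. The dictionary layout and the helper name max_training_iterations are illustrative and not part of ESPnet.

```python
# Values copied from the config fragment above.
config = {
    "num_iters_per_epoch": 250,
    "max_epoch": 800,
    "batch_bins": 8_000_000,
    "tts_conf": {"spk_embed_dim": 192},
}


def max_training_iterations(cfg):
    """Hypothetical helper: iterations per epoch times number of epochs.

    Assumes no early stopping, so this is an upper bound.
    """
    return cfg["num_iters_per_epoch"] * cfg["max_epoch"]


total_iters = max_training_iterations(config)
print(total_iters)  # 250 * 800 = 200000
```

Under these assumptions the run covers at most 200,000 iterations. The actual number of utterances per iteration depends on batch_bins together with the batch type and feature sizes, which are not listed here.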