espnet audio text-to-speech

TTS model trained with alignments from the Montreal Forced Aligner (MFA).

To replicate training or continue from the given checkpoint, download LJSpeech, install MFA, and follow the steps here. I recommend downloading the pretrained MFA models and running mfa.sh with --train false. This saves time, but the drawback is that inference must then also use the MFA g2p, which has a non-standard phoneme set.
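Because the model only knows the phoneme symbols it saw during training, phonemes produced by a different g2p may not be recognized at inference time. The following is a minimal sketch of how one might check this before synthesis. The helper name find_unknown_tokens and the example vocabulary are hypothetical and chosen only for illustration; they are not the actual token inventory of this checkpoint or part of ESPnet or MFA.

```python
from typing import Iterable, List, Set


def find_unknown_tokens(phonemes: Iterable[str], vocab: Set[str]) -> List[str]:
    """Return the phonemes that are missing from vocab.

    Results keep first-seen order and contain no duplicates.
    """
    seen = set()
    unknown = []
    for p in phonemes:
        if p not in vocab and p not in seen:
            seen.add(p)
            unknown.append(p)
    return unknown


# Hypothetical training vocabulary (illustrative symbols only,
# NOT the real phoneme set of this checkpoint).
train_vocab = {"HH", "AH0", "L", "OW1", "W", "ER1", "D"}

# A sequence drawn from the same symbol set is fully covered.
matching = ["HH", "AH0", "L", "OW1"]

# A sequence from a different g2p (here, IPA-like symbols) is not.
mismatched = ["h", "ə", "l", "oʊ", "h"]

print(find_unknown_tokens(matching, train_vocab))
print(find_unknown_tokens(mismatched, train_vocab))
```

If the second call reports any symbols, the input was most likely phonemized with a g2p other than the one used to prepare the training data.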