audio automatic-speech-recognition speech

This model was fine-tuned from the multilingual pretrained model CLSRIL-23. The original fairseq checkpoint is available here. When using this model, make sure that your speech input is sampled at 16 kHz. Note: the model's output is decoded without a language model, so you may see a higher word error rate (WER) in some cases than you would with language-model decoding.
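
The two practical points above (16 kHz input and WER) can be checked with a few lines of standard-library Python. This is a minimal sketch and not part of the original release: the helper names `wav_sample_rate`, `needs_resampling` and `word_error_rate` are made up for illustration, the rate check assumes the audio is stored as a PCM WAV file, and WER is computed in the usual way as the word-level edit distance divided by the number of reference words.

```python
import wave

TARGET_SAMPLE_RATE = 16_000  # sampling rate (Hz) that the model expects


def wav_sample_rate(path):
    """Return the sampling rate in Hz stored in a PCM WAV file header."""
    with wave.open(path, "rb") as wav_file:
        return wav_file.getframerate()


def needs_resampling(path, target_rate=TARGET_SAMPLE_RATE):
    """Return True if the file is not already at the target sampling rate."""
    return wav_sample_rate(path) != target_rate


def word_error_rate(reference, hypothesis):
    """Return word-level edit distance divided by the reference word count."""
    ref_words = reference.split()
    hyp_words = hypothesis.split()
    if not ref_words:
        raise ValueError("reference transcript must contain at least one word")

    # previous_row[j] is the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    previous_row = list(range(len(hyp_words) + 1))
    for i, ref_word in enumerate(ref_words, start=1):
        current_row = [i]  # deleting i reference words matches an empty hypothesis
        for j, hyp_word in enumerate(hyp_words, start=1):
            substitution = previous_row[j - 1] + (ref_word != hyp_word)
            deletion = previous_row[j] + 1
            insertion = current_row[j - 1] + 1
            current_row.append(min(substitution, deletion, insertion))
        previous_row = current_row
    return previous_row[-1] / len(ref_words)
```

For example, `needs_resampling("sample.wav")` (a hypothetical file name) tells you whether a recording has to be resampled before inference, and `word_error_rate(reference_text, model_output)` gives a quick way to compare transcripts with and without any language model you add on top.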