automatic-speech-recognition bn hf-asr-leaderboard openslr_SLR53 robust-speech-event

This model is a fine-tuned version of facebook/wav2vec2-xls-r-300m on the OPENSLR_SLR53 - Bengali dataset. It achieves the following results on the evaluation set.

Without a language model:

With a 5-gram language model trained on 30M sentences randomly chosen from the AI4Bharat IndicCorp dataset:

Note: 5% of the dataset (10935 examples) was held out for evaluation and was not part of training. Training was done on the first 95% of the samples and evaluation on the last 5%. Training was stopped after 180k steps. Output predictions are available under the files section.
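
The first-95% / last-5% split described in the note can be sketched in plain Python. This is an illustrative sketch only: the helper name `split_first_last` and the rounding rule are assumptions, not the code actually used to train this model, and the dataset size in the usage line is a made-up number.

```python
def split_first_last(n_examples: int, eval_fraction: float = 0.05):
    """Split example indices so the first part trains and the last part evaluates.

    Hypothetical helper: the exact rounding used for this model is not documented.
    """
    if not 0.0 < eval_fraction < 1.0:
        raise ValueError("eval_fraction must be between 0 and 1")
    n_eval = round(n_examples * eval_fraction)   # size of the held-out tail
    n_train = n_examples - n_eval                # everything before the tail
    train_indices = range(0, n_train)            # first ~95% of the samples
    eval_indices = range(n_train, n_examples)    # last ~5% of the samples
    return train_indices, eval_indices


# Usage with an illustrative dataset size (not the real one).
train_idx, eval_idx = split_first_last(1000)
print(len(train_idx), len(eval_idx))
```

Because the two index ranges are contiguous and do not overlap, no evaluation example can leak into training, which is the property the note relies on.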

Training hyperparameters

The following hyperparameters were used during training:

Framework versions
