Ukrainian STT model (with Language Model)

🇺🇦 Join Ukrainian Speech Recognition Community - https://t.me/speech_recognition_uk

⭐ See other Ukrainian models - https://github.com/egorsmkv/speech-recognition-uk

This model is a fine-tuned version of facebook/wav2vec2-xls-r-1b on the Ukrainian (uk) subset of the mozilla-foundation/common_voice_7_0 dataset.

It achieves the following results on the evaluation set without the language model:

Model description

On 100 test examples, the model shows the following results:

Without LM:

With LM:

The following hyperparameters were used during training:

Training Loss	Epoch	Step	Validation Loss	WER	CER
1.2815	7.93	500	0.3536	0.4753	0.1009
1.0869	15.86	1000	0.2317	0.3111	0.0614
0.9984	23.8	1500	0.2022	0.2676	0.0521
0.975	31.74	2000	0.1948	0.2469	0.0487
0.9306	39.67	2500	0.1916	0.2377	0.0464
0.8868	47.61	3000	0.1903	0.2257	0.0439
0.8424	55.55	3500	0.1786	0.2206	0.0423
0.8126	63.49	4000	0.1849	0.2160	0.0416
0.7901	71.42	4500	0.1869	0.2138	0.0413
0.7671	79.36	5000	0.1855	0.2075	0.0394
0.7467	87.3	5500	0.1884	0.2049	0.0389
0.731	95.24	6000	0.1877	0.2060	0.0387
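
The word error rate and character error rate columns in the table above are reported as fractions (for example, 0.2060 means 20.60%). As a rough illustration of how such metrics are usually defined, here is a minimal pure-Python sketch based on Levenshtein edit distance. It is not the code used by the training or evaluation scripts, which most likely rely on a metrics library; the function names and the sample sentences are made up for this example, and references are assumed to be non-empty.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (each edit costs 1)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference item
                curr[j - 1] + 1,     # insertion of a hypothesis item
                prev[j - 1] + cost,  # substitution or exact match
            ))
        prev = curr
    return prev[-1]


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edits divided by reference word count."""
    ref_words = reference.split()
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate: character-level edits divided by reference length."""
    return edit_distance(reference, hypothesis) / len(reference)


# Illustrative pair (not taken from the dataset): one misspelled word.
reference = "я люблю свою країну"
hypothesis = "я люблю свою краіну"
print(f"WER: {wer(reference, hypothesis):.4f}")
print(f"CER: {cer(reference, hypothesis):.4f}")
```

A single wrong letter counts as a whole wrong word for WER but only as one wrong character for CER, which is why the CER values in the table are much lower than the WER values.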

To evaluate the model on the test split of mozilla-foundation/common_voice_7_0 (uk), run:

python eval.py --model_id Yehor/wav2vec2-xls-r-1b-uk-with-lm --dataset mozilla-foundation/common_voice_7_0 --config uk --split test

Test WER (%) without LM	Test WER (%) with LM (run `./eval.py`)
21.52	14.62
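
For a quick check on a single recording rather than a full evaluation run, something like the following sketch may be enough. It assumes `transformers` and `torch` are installed and that `ffmpeg` is available for decoding audio files. As far as I know, the pipeline only uses the bundled language model when `pyctcdecode` and `kenlm` are also installed, and otherwise falls back to decoding without it. The `transcribe` helper and the audio path passed on the command line are hypothetical names for this example.

```python
import sys

MODEL_ID = "Yehor/wav2vec2-xls-r-1b-uk-with-lm"


def transcribe(audio_path: str) -> str:
    """Return the transcription of one audio file."""
    # Imported lazily so this file can be loaded without the heavy dependency.
    from transformers import pipeline

    # Downloads the model on first use (the checkpoint is large).
    asr = pipeline("automatic-speech-recognition", model=MODEL_ID)
    return asr(audio_path)["text"]


if __name__ == "__main__" and len(sys.argv) > 1:
    # Example: python transcribe.py sample.wav  (hypothetical file name)
    print(transcribe(sys.argv[1]))
```

The model expects 16 kHz audio. When given a file path, the pipeline should resample automatically, but raw arrays passed in directly need to be at that sampling rate already.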