wav2vec2-large-en-in-lm

This model is a fine-tuned version of crossdelenna/wav2vec2-large-en-in-lm

It achieves the following results on the evaluation set:

Loss: 0.0478
Wer: 0.0951
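
Wer is the word error rate: the word-level edit distance (substitutions, deletions, insertions) between the reference transcript and the model output, divided by the number of reference words. A minimal sketch of that calculation follows. It is not the evaluation script used for this model, and the function name `word_error_rate` is only illustrative.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the distance between the reference words seen so far
    # (one row back) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)
```

Read this way, a Wer of 0.0951 corresponds to roughly 9.5 word errors per 100 reference words.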

Model description

A wav2vec2 automatic speech recognition (ASR) model for Indian-accented English, decoded with a language model.

Intended uses & limitations

This model is intended for my personal use only. Intentionally, the dataset has no speech variance: it contains only my own voice. I use it for live speech dictation with PyAudio non-blocking microphone streaming (https://gist.github.com/KenoLeon/13dfb803a21a08cf224b2e6df0feed80). Before running inference, fine-tune it further on your own data. The training data contains a lot of quantitative finance jargon and a lot of urban slang. Note that the model does not censor profanity, so its output can be NSFW.
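
For the live dictation setup described above, a PyAudio non-blocking stream calls a callback for every captured chunk. A common pattern is to have the callback only enqueue the raw bytes, and to do the conversion and inference on the main thread. The sketch below shows that buffering half in plain Python. It assumes 16-bit mono PCM input; `audio_callback` and `drain_samples` are illustrative names, not code from the linked gist, and the return value 0 is assumed to equal `pyaudio.paContinue`.

```python
import queue
from array import array

# Raw byte chunks handed over by the audio thread.
chunks = queue.Queue()

def audio_callback(in_data, frame_count, time_info, status):
    """Has the argument list PyAudio passes to a stream callback; keep it fast."""
    chunks.put(in_data)
    return (None, 0)  # 0 is assumed to be pyaudio.paContinue

def drain_samples() -> list:
    """Collect everything queued so far as floats in [-1.0, 1.0)."""
    samples = array("h")  # signed 16-bit integers, native byte order
    while True:
        try:
            samples.frombytes(chunks.get_nowait())
        except queue.Empty:
            break
    return [s / 32768.0 for s in samples]
```

With PyAudio installed, the callback would be passed as `stream_callback=audio_callback` when opening an input stream with `format=pyaudio.paInt16`, `channels=1` and `rate=16000` (wav2vec2 models are trained on 16 kHz audio), and the drained samples would then be fed to the model's processor.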

Training and evaluation data

Facebook's pretrained large wav2vec2 checkpoint, further fine-tuned on thirty-two hours of personal recordings. The recordings are of a single male voice with an Indian English accent, made on an omnidirectional microphone with a lot of background noise.

Training procedure

I downloaded my Reddit and Twitter data and recorded it as audio clips, none exceeding 13 seconds. Once I had about 6 hours of recordings, I fine-tuned the model and reached approximately 19% WER. After that, I kept adding data and fine-tuning again. It is now trained on thirty hours of data. The plan from here is to fine-tune every two to three months, only on words the model fails to recognize.
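
The 13-second cap on clips can be checked before training. This is a minimal sketch using only the standard library, assuming the recordings are uncompressed WAV files (the card does not say which format was used). `clip_seconds` and `within_limit` are hypothetical helper names.

```python
import wave

MAX_CLIP_SECONDS = 13.0  # limit stated in the training procedure

def clip_seconds(path: str) -> float:
    """Duration of a WAV file: frame count divided by sample rate."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()

def within_limit(path: str, max_seconds: float = MAX_CLIP_SECONDS) -> bool:
    """True if the clip is short enough to be used for fine-tuning."""
    return clip_seconds(path) <= max_seconds
```

Clips that fail the check could be re-recorded or split before they are added to the dataset.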

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 1e-05
train_batch_size: 8
eval_batch_size: 8
seed: 42
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: linear
lr_scheduler_warmup_steps: 1000
num_epochs: 30
mixed_precision_training: Native AMP
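
With `lr_scheduler_type: linear` and 1000 warmup steps, the learning rate ramps from 0 up to 1e-05 over the first 1000 optimizer steps and then decays linearly to 0 at the final step. A plain-Python sketch of that shape follows, assuming the run ended at step 3630 as reported in the training results. The function name `linear_lr` is illustrative, and this is not the Transformers implementation itself.

```python
def linear_lr(step: int, peak_lr: float = 1e-05,
              warmup_steps: int = 1000, total_steps: int = 3630) -> float:
    """Linear warmup to peak_lr, then linear decay to zero."""
    if step < warmup_steps:
        # Warmup: fraction of the peak proportional to progress.
        return peak_lr * step / max(1, warmup_steps)
    # Decay: fraction of the peak proportional to the steps remaining.
    remaining = max(0, total_steps - step)
    return peak_lr * remaining / max(1, total_steps - warmup_steps)
```

Under these assumptions the rate is 5e-06 at step 500 (halfway through warmup) and again at step 2315 (halfway through the decay).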

Training results

Training Loss	Epoch	Step	Validation Loss	Wer
0.1589	10.0	1210	0.0754	0.1088
0.1369	20.0	2420	0.0527	0.0991
0.1208	30.0	3630	0.0478	0.0951

Framework versions

Transformers 4.17.0
Pytorch 1.12.1+cu113
Datasets 2.4.0
Tokenizers 0.12.1