wav2vec2-xlsr-mn-eng
This model is a fine-tuned version of facebook/wav2vec2-large-xlsr-53 on the IEMOCAP dataset and the Mongolian (MN) portion of Common Voice. It can be used to recognize both English (ENG) and Mongolian (MN) speech with a single model. It achieves the following results on the evaluation set:
- Loss: 0.3087
- Wer: 0.3402
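For readers unfamiliar with the metric, Wer is the word error rate: the number of word substitutions, deletions, and insertions needed to turn the prediction into the reference, divided by the number of reference words. The following is a minimal, per-utterance sketch of that definition in plain Python. The function name `word_error_rate` is hypothetical and this is not necessarily the exact implementation used to produce the numbers in this card, which was most likely aggregated over the whole evaluation set.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # turning i reference words into nothing costs i deletions
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr

    return prev[-1] / len(ref)


if __name__ == "__main__":
    print(word_error_rate("the cat sat", "the bat sat"))
    print(word_error_rate("hello", "hello there world"))
```

Because insertions count as errors, the value can exceed 1.0 when the prediction contains more words than the reference, which is consistent with the 1.0236 logged at step 2500 in the training results.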
Model description
More information needed
Intended uses & limitations
More information needed
Training and evaluation data
More information needed
Training procedure
Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 0.0001
- train_batch_size: 2
- eval_batch_size: 8
- seed: 42
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_steps: 1000
- num_epochs: 5
- mixed_precision_training: Native AMP
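To make the scheduler settings concrete, the sketch below computes the learning rate a linear schedule with warmup would produce at a given step: it rises linearly from 0 to `learning_rate` over the 1000 warmup steps, then decays linearly to 0. The function name `linear_lr` is hypothetical, and `total_steps=33_000` is an assumption taken from the last logged step in the training results; the true total is probably slightly higher, since that step corresponds to epoch 4.99.

```python
def linear_lr(
    step: int,
    base_lr: float = 1e-4,       # learning_rate from the list above
    warmup_steps: int = 1000,    # lr_scheduler_warmup_steps from the list above
    total_steps: int = 33_000,   # assumption: last logged step, not the exact total
) -> float:
    """Learning rate at `step` for linear warmup followed by linear decay."""
    if step < warmup_steps:
        # Warmup phase: scale up linearly from 0 to base_lr.
        return base_lr * step / max(1, warmup_steps)
    # Decay phase: scale down linearly from base_lr to 0 at total_steps.
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


if __name__ == "__main__":
    for s in (0, 500, 1000, 17_000, 33_000):
        print(s, linear_lr(s))
```

Under these assumptions the peak learning rate of 0.0001 is reached at step 1000, which is the second row of the training results table.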
Training results
Training Loss | Epoch | Step | Validation Loss | Wer |
---|---|---|---|---|
8.8609 | 0.08 | 500 | 3.6078 | 1.0 |
3.5494 | 0.15 | 1000 | 3.2044 | 1.0 |
3.1699 | 0.23 | 1500 | 3.1560 | 1.0 |
3.0955 | 0.3 | 2000 | 3.1087 | 1.0 |
2.7918 | 0.38 | 2500 | 2.1146 | 1.0236 |
2.0528 | 0.45 | 3000 | 1.4938 | 0.9648 |
1.6329 | 0.53 | 3500 | 1.2614 | 0.9198 |
1.3932 | 0.6 | 4000 | 1.0504 | 0.8314 |
1.2652 | 0.68 | 4500 | 0.9664 | 0.7809 |
1.1829 | 0.76 | 5000 | 0.8999 | 0.7381 |
1.1674 | 0.83 | 5500 | 0.8200 | 0.6924 |
1.0599 | 0.91 | 6000 | 0.7713 | 0.6729 |
1.027 | 0.98 | 6500 | 0.7714 | 0.6616 |
0.9289 | 1.06 | 7000 | 0.7571 | 0.6433 |
0.9192 | 1.13 | 7500 | 0.6899 | 0.6151 |
0.8996 | 1.21 | 8000 | 0.7012 | 0.6104 |
0.9281 | 1.28 | 8500 | 0.6452 | 0.5914 |
0.8656 | 1.36 | 9000 | 0.6162 | 0.5781 |
0.8635 | 1.44 | 9500 | 0.6249 | 0.5672 |
0.8388 | 1.51 | 10000 | 0.5936 | 0.5558 |
0.8087 | 1.59 | 10500 | 0.5844 | 0.5466 |
0.7755 | 1.66 | 11000 | 0.5838 | 0.5364 |
0.8377 | 1.74 | 11500 | 0.5358 | 0.5202 |
0.8308 | 1.81 | 12000 | 0.5333 | 0.5196 |
0.7775 | 1.89 | 12500 | 0.5129 | 0.5060 |
0.7747 | 1.96 | 13000 | 0.5164 | 0.5096 |
0.7115 | 2.04 | 13500 | 0.5056 | 0.4936 |
0.6974 | 2.12 | 14000 | 0.4925 | 0.4878 |
0.6672 | 2.19 | 14500 | 0.5030 | 0.4908 |
0.6396 | 2.27 | 15000 | 0.4821 | 0.4686 |
0.6943 | 2.34 | 15500 | 0.4693 | 0.4624 |
0.6413 | 2.42 | 16000 | 0.4626 | 0.4636 |
0.6446 | 2.49 | 16500 | 0.4513 | 0.4609 |
0.6338 | 2.57 | 17000 | 0.4386 | 0.4524 |
0.6208 | 2.65 | 17500 | 0.4360 | 0.4445 |
0.6397 | 2.72 | 18000 | 0.4348 | 0.4355 |
0.6127 | 2.8 | 18500 | 0.4367 | 0.4318 |
0.5956 | 2.87 | 19000 | 0.4376 | 0.4322 |
0.6345 | 2.95 | 19500 | 0.4050 | 0.4308 |
0.572 | 3.02 | 20000 | 0.4211 | 0.4219 |
0.5447 | 3.1 | 20500 | 0.4042 | 0.4112 |
0.5323 | 3.17 | 21000 | 0.4101 | 0.4153 |
0.5677 | 3.25 | 21500 | 0.3952 | 0.4188 |
0.5354 | 3.33 | 22000 | 0.3889 | 0.4007 |
0.5297 | 3.4 | 22500 | 0.3793 | 0.3997 |
0.5314 | 3.48 | 23000 | 0.3684 | 0.3956 |
0.5217 | 3.55 | 23500 | 0.3572 | 0.3853 |
0.5224 | 3.63 | 24000 | 0.3535 | 0.3867 |
0.4983 | 3.7 | 24500 | 0.3636 | 0.3804 |
0.5355 | 3.78 | 25000 | 0.3680 | 0.3770 |
0.5115 | 3.85 | 25500 | 0.3472 | 0.3752 |
0.5416 | 3.93 | 26000 | 0.3280 | 0.3689 |
0.5104 | 4.01 | 26500 | 0.3319 | 0.3650 |
0.4524 | 4.08 | 27000 | 0.3453 | 0.3632 |
0.462 | 4.16 | 27500 | 0.3359 | 0.3600 |
0.4823 | 4.23 | 28000 | 0.3268 | 0.3553 |
0.4671 | 4.31 | 28500 | 0.3248 | 0.3535 |
0.4702 | 4.38 | 29000 | 0.3278 | 0.3501 |
0.483 | 4.46 | 29500 | 0.3183 | 0.3492 |
0.4232 | 4.53 | 30000 | 0.3224 | 0.3470 |
0.4227 | 4.61 | 30500 | 0.3171 | 0.3458 |
0.4687 | 4.69 | 31000 | 0.3121 | 0.3537 |
0.4486 | 4.76 | 31500 | 0.3088 | 0.3424 |
0.4459 | 4.84 | 32000 | 0.3101 | 0.3407 |
0.4513 | 4.91 | 32500 | 0.3077 | 0.3407 |
0.4237 | 4.99 | 33000 | 0.3087 | 0.3402 |
Framework versions
- Transformers 4.19.2
- Pytorch 1.11.0+cu113
- Datasets 2.2.2
- Tokenizers 0.12.1