# finetune_teacher_babble_noise_mozilla_200_epochs

This model is a fine-tuned version of on the None dataset.
It achieves the following results on the evaluation set:
- Loss: 71.8264
- Wer: 0.3574
## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 0.0005
- train_batch_size: 4
- eval_batch_size: 1
- seed: 42
- gradient_accumulation_steps: 256
- total_train_batch_size: 1024
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.2
- num_epochs: 200
- mixed_precision_training: Native AMP
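With `train_batch_size: 4` and `gradient_accumulation_steps: 256`, the effective batch is 4 × 256 = 1024, which matches `total_train_batch_size` above. The learning-rate trajectory implied by `lr_scheduler_type: cosine` and `lr_scheduler_warmup_ratio: 0.2` can be sketched as follows; the function name and exact warmup shape (linear) are illustrative assumptions, not the Trainer's internals:

```python
import math

def lr_at_step(step, total_steps, base_lr=5e-4, warmup_ratio=0.2):
    """Cosine schedule with linear warmup over the first warmup_ratio
    fraction of training (a sketch of the configuration above)."""
    warmup_steps = int(total_steps * warmup_ratio)
    if step < warmup_steps:
        # Linear ramp from 0 to base_lr during warmup.
        return base_lr * step / max(1, warmup_steps)
    # Cosine decay from base_lr down to 0 over the remaining steps.
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))

# Effective batch size: per-device batch x gradient accumulation steps.
effective_batch = 4 * 256  # = 1024, matching total_train_batch_size
```

At roughly 13000 optimizer steps (the last logged step below), the peak learning rate of 5e-4 is reached around step 2600 and then decays toward zero.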
### Training results
| Training Loss | Epoch  | Step  | Validation Loss | Wer    |
|:-------------:|:------:|:-----:|:---------------:|:------:|
| 149.8494      | 14.7   | 1000  | 41.8514         | 0.3998 |
| 101.5704      | 29.41  | 2000  | 41.9244         | 0.3942 |
| 87.7921       | 44.12  | 3000  | 44.8273         | 0.4013 |
| 74.0441       | 58.82  | 4000  | 48.9263         | 0.3976 |
| 61.9751       | 73.53  | 5000  | 48.6313         | 0.3950 |
| 51.4311       | 88.23  | 6000  | 52.6974         | 0.3915 |
| 42.7197       | 102.94 | 7000  | 51.2589         | 0.3862 |
| 35.5205       | 117.64 | 8000  | 57.6496         | 0.3841 |
| 29.2148       | 132.35 | 9000  | 64.6558         | 0.3745 |
| 24.4399       | 147.06 | 10000 | 62.6512         | 0.3692 |
| 20.5101       | 161.76 | 11000 | 67.4978         | 0.3625 |
| 18.0444       | 176.47 | 12000 | 72.0740         | 0.3584 |
| 16.681        | 191.18 | 13000 | 71.8264         | 0.3574 |
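The Wer column is the word error rate: the word-level edit distance (substitutions, insertions, deletions) between the model's transcript and the reference, divided by the number of reference words, so 0.3574 means roughly 36 errors per 100 reference words. A minimal sketch of the metric (illustrative only, not the exact implementation used during evaluation):

```python
def wer(reference, hypothesis):
    """Word error rate: word-level edit distance / reference word count."""
    ref, hyp = reference.split(), hypothesis.split()
    # d[j] holds the edit distance between the processed prefix of ref
    # and hyp[:j]; updated row by row (classic Levenshtein DP).
    d = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        prev, d[0] = d[0], i
        for j in range(1, len(hyp) + 1):
            cur = d[j]
            d[j] = min(d[j] + 1,              # deletion
                       d[j - 1] + 1,          # insertion
                       prev + (ref[i - 1] != hyp[j - 1]))  # substitution
            prev = cur
    return d[-1] / len(ref)
```

For example, `wer("the cat sat", "the bat sat")` is one substitution over three reference words, i.e. about 0.333.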
### Framework versions
- Transformers 4.24.0
- Pytorch 1.12.1
- Datasets 2.7.1
- Tokenizers 0.11.0