# mt5-base-gramatika-final-e8-b16
This model is a fine-tuned version of [google/mt5-base](https://huggingface.co/google/mt5-base) on an unspecified dataset.
It achieves the following results on the evaluation set (a hedged inference sketch follows the metrics):
- Loss: 0.2117
- ROUGE-1: 66.7567
- ROUGE-2: 59.3343
- ROUGE-L: 66.4993
- ROUGE-Lsum: 66.5275
- Gen Len: 18.5566
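
A minimal inference sketch, assuming the checkpoint is published on the Hugging Face Hub. The repo id below is hypothetical, and any task prefix used during fine-tuning is not documented on this card:

```python
# Minimal inference sketch. The repo id is hypothetical; substitute the
# actual checkpoint path. Any task prefix used during fine-tuning is
# undocumented, so raw text is passed here.
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

model_id = "username/mt5-base-gramatika-final-e8-b16"  # hypothetical repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

inputs = tokenizer("Text to correct goes here.", return_tensors="pt")
outputs = model.generate(**inputs, max_length=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```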
## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed
## Training procedure

### Training hyperparameters

The following hyperparameters were used during training (a sketch of the corresponding `Seq2SeqTrainingArguments` follows the list):
- learning_rate: 0.001
- train_batch_size: 16
- eval_batch_size: 16
- seed: 42
- optimizer: Adafactor
- lr_scheduler_type: linear
- num_epochs: 8
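
A sketch of how these settings map onto `Seq2SeqTrainingArguments`; the dataset, preprocessing, and `Trainer` wiring are not documented on this card, so only the argument mapping is shown:

```python
# Sketch: the hyperparameters above expressed as Seq2SeqTrainingArguments.
# Dataset loading, tokenization, and the Trainer itself are omitted because
# they are not documented on this card.
from transformers import Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="mt5-base-gramatika-final-e8-b16",
    learning_rate=1e-3,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    seed=42,
    optim="adafactor",            # Adafactor optimizer
    lr_scheduler_type="linear",
    num_train_epochs=8,
    predict_with_generate=True,   # required for ROUGE/Gen Len during eval
)
```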
### Training results
| Training Loss | Epoch | Step | Validation Loss | ROUGE-1 | ROUGE-2 | ROUGE-L | ROUGE-Lsum | Gen Len |
|:-------------:|:-----:|:----:|:---------------:|:-------:|:-------:|:-------:|:----------:|:-------:|
| 0.9122        | 0.37  | 300  | 0.3395          | 63.1315 | 53.1537 | 62.8285 | 62.8152    | 18.5833 |
| 0.4611        | 0.73  | 600  | 0.2870          | 64.8744 | 56.0545 | 64.604  | 64.6011    | 18.5676 |
| 0.3866        | 1.1   | 900  | 0.2690          | 65.2446 | 56.534  | 64.9389 | 64.9484    | 18.5414 |
| 0.2833        | 1.46  | 1200 | 0.2424          | 65.6718 | 57.2619 | 65.4044 | 65.4076    | 18.5566 |
| 0.2633        | 1.83  | 1500 | 0.2240          | 65.7057 | 57.6829 | 65.4464 | 65.4601    | 18.5524 |
| 0.2126        | 2.2   | 1800 | 0.2350          | 66.1634 | 58.4004 | 65.9254 | 65.9147    | 18.5582 |
| 0.1787        | 2.56  | 2100 | 0.2176          | 66.4508 | 58.8845 | 66.1886 | 66.199     | 18.5571 |
| 0.175         | 2.93  | 2400 | 0.2151          | 66.1987 | 58.632  | 65.9844 | 65.995     | 18.5603 |
| 0.1231        | 3.29  | 2700 | 0.2227          | 66.6365 | 59.1886 | 66.4067 | 66.4293    | 18.5571 |
| 0.1195        | 3.66  | 3000 | 0.2117          | 66.7567 | 59.3343 | 66.4993 | 66.5275    | 18.5566 |
| 0.1146        | 4.02  | 3300 | 0.2197          | 66.9385 | 59.8666 | 66.7575 | 66.7651    | 18.5556 |
| 0.0757        | 4.39  | 3600 | 0.2235          | 66.8918 | 59.768  | 66.7208 | 66.7282    | 18.5608 |
| 0.0772        | 4.76  | 3900 | 0.2270          | 67.0955 | 59.9474 | 66.8681 | 66.8905    | 18.5566 |
| 0.0688        | 5.12  | 4200 | 0.2431          | 67.2444 | 60.2703 | 67.0501 | 67.0676    | 18.5550 |
| 0.0512        | 5.49  | 4500 | 0.2439          | 67.198  | 60.2026 | 67.0128 | 67.0433    | 18.5535 |
| 0.0523        | 5.85  | 4800 | 0.2362          | 67.3463 | 60.4479 | 67.1385 | 67.1792    | 18.5592 |
| 0.0408        | 6.22  | 5100 | 0.2587          | 67.4973 | 60.7533 | 67.305  | 67.3418    | 18.5624 |
| 0.0324        | 6.59  | 5400 | 0.2502          | 67.6102 | 60.905  | 67.428  | 67.4547    | 18.5566 |
| 0.0336        | 6.95  | 5700 | 0.2583          | 67.531  | 60.7718 | 67.355  | 67.3762    | 18.5587 |
| 0.0236        | 7.32  | 6000 | 0.2710          | 67.5641 | 60.7633 | 67.3445 | 67.3835    | 18.5603 |
| 0.0222        | 7.68  | 6300 | 0.2729          | 67.5898 | 60.8587 | 67.3926 | 67.4234    | 18.5608 |
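
The ROUGE columns are presumably the F-measures reported by the `evaluate` library's `rouge` metric, as in standard Hugging Face seq2seq examples; the exact `compute_metrics` used for this model is not documented. A minimal sketch of that computation:

```python
# Sketch: ROUGE scores as typically computed in Hugging Face seq2seq
# examples. Requires the rouge_score package; the predictions/references
# below are placeholders, not this model's outputs.
import evaluate

rouge = evaluate.load("rouge")
predictions = ["She goes to school every day."]  # placeholder model output
references = ["She goes to school every day."]   # placeholder gold target
scores = rouge.compute(predictions=predictions, references=references)
print({k: round(v * 100, 4) for k, v in scores.items()})
# keys: rouge1, rouge2, rougeL, rougeLsum (scaled to 0-100 as in the table)
```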
### Framework versions
- Transformers 4.30.1
- Pytorch 1.11.0a0+b6df043
- Datasets 2.12.0
- Tokenizers 0.13.3