# mt5-large-fce-e8-b16
This model is a fine-tuned version of [google/mt5-large](https://huggingface.co/google/mt5-large) on an unspecified dataset. It achieves the following results on the evaluation set:
- Loss: 0.3526
- Rouge1: 84.5329
- Rouge2: 76.3656
- Rougel: 83.9027
- Rougelsum: 83.9238
- Gen Len: 15.4614
## Model description
More information needed
## Intended uses & limitations
More information needed
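No usage example was provided with this card. As a placeholder, here is a minimal inference sketch, assuming the checkpoint is used for sentence-level sequence-to-sequence generation (for example error correction, which the "fce" in the name suggests) and is available under a local path or Hub id such as the hypothetical `mt5-large-fce-e8-b16`:

```python
# Minimal inference sketch (hypothetical model id; adjust to wherever the weights actually live).
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

model_id = "mt5-large-fce-e8-b16"  # placeholder, not a confirmed Hub repository
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

inputs = tokenizer("She go to school yesterday .", return_tensors="pt")
# The evaluation "Gen Len" averages ~15 tokens, so a short generation budget is sufficient here.
outputs = model.generate(**inputs, max_new_tokens=32, num_beams=4)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```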
## Training and evaluation data
More information needed
## Training procedure

### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 0.001
- train_batch_size: 16
- eval_batch_size: 16
- seed: 42
- optimizer: Adafactor
- lr_scheduler_type: linear
- num_epochs: 8
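The original training script is not part of this card. The sketch below only illustrates how the hyperparameters listed above would typically be expressed as `Seq2SeqTrainingArguments`; the output directory and evaluation cadence are assumptions inferred from the results table, not documented settings.

```python
# Sketch only: maps the hyperparameters above onto the Trainer API. The original script,
# dataset, and preprocessing are not documented in this card.
from transformers import Seq2SeqTrainingArguments

args = Seq2SeqTrainingArguments(
    output_dir="mt5-large-fce-e8-b16",  # assumed output directory
    learning_rate=1e-3,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    seed=42,
    optim="adafactor",                  # Adafactor optimizer, as listed above
    lr_scheduler_type="linear",
    num_train_epochs=8,
    evaluation_strategy="steps",
    eval_steps=400,                     # assumed from the 400-step intervals in the table below
    predict_with_generate=True,         # generate text at eval time so ROUGE can be computed
)
# These arguments would then be passed to a Seq2SeqTrainer together with the model, tokenizer,
# data collator, and the (undocumented) train/eval datasets.
```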
### Training results
Training Loss | Epoch | Step | Validation Loss | Rouge1 | Rouge2 | Rougel | Rougelsum | Gen Len |
---|---|---|---|---|---|---|---|---|
1.2105 | 0.23 | 400 | 0.4344 | 84.6268 | 76.3447 | 84.0402 | 84.0182 | 15.4564 |
0.4664 | 0.45 | 800 | 0.4256 | 84.3821 | 75.6104 | 83.8113 | 83.8303 | 15.4404 |
0.434 | 0.68 | 1200 | 0.3839 | 84.0212 | 75.7319 | 83.4232 | 83.431 | 15.4952 |
0.406 | 0.9 | 1600 | 0.3713 | 84.7743 | 76.7805 | 84.2379 | 84.2352 | 15.4514 |
0.3193 | 1.13 | 2000 | 0.3665 | 84.634 | 76.5132 | 84.0604 | 84.0755 | 15.4774 |
0.2693 | 1.35 | 2400 | 0.3718 | 84.6587 | 76.7057 | 84.099 | 84.1045 | 15.4619 |
0.2815 | 1.58 | 2800 | 0.3617 | 84.5181 | 76.6792 | 83.9922 | 83.9976 | 15.4820 |
0.2776 | 1.81 | 3200 | 0.3526 | 84.5329 | 76.3656 | 83.9027 | 83.9238 | 15.4614 |
0.2551 | 2.03 | 3600 | 0.3720 | 84.504 | 76.6676 | 83.9957 | 84.0108 | 15.4801 |
0.1617 | 2.26 | 4000 | 0.3648 | 84.4385 | 76.3684 | 83.8585 | 83.8657 | 15.4897 |
0.1711 | 2.48 | 4400 | 0.3671 | 84.5241 | 76.6518 | 83.9862 | 83.9987 | 15.4902 |
0.1771 | 2.71 | 4800 | 0.3607 | 84.6437 | 76.6682 | 84.103 | 84.1174 | 15.4683 |
0.1803 | 2.93 | 5200 | 0.3582 | 84.479 | 76.6205 | 83.9509 | 83.9504 | 15.4715 |
0.1199 | 3.16 | 5600 | 0.3971 | 84.6367 | 76.7872 | 84.0191 | 84.0534 | 15.4715 |
0.1005 | 3.39 | 6000 | 0.4085 | 84.5153 | 76.6564 | 83.9365 | 83.9506 | 15.4820 |
0.1033 | 3.61 | 6400 | 0.4007 | 84.3191 | 76.399 | 83.8183 | 83.8142 | 15.4728 |
0.1067 | 3.84 | 6800 | 0.4014 | 84.5289 | 76.5335 | 83.9706 | 83.9967 | 15.4674 |
0.09 | 4.06 | 7200 | 0.4328 | 84.3978 | 76.6231 | 83.8654 | 83.8728 | 15.4783 |
0.0574 | 4.29 | 7600 | 0.4305 | 84.4476 | 76.7198 | 83.8943 | 83.9 | 15.4820 |
0.0579 | 4.51 | 8000 | 0.4510 | 84.5536 | 76.7635 | 83.977 | 83.9745 | 15.4719 |
0.061 | 4.74 | 8400 | 0.4447 | 84.5632 | 76.9892 | 84.0419 | 84.0501 | 15.4815 |
0.0608 | 4.97 | 8800 | 0.4353 | 84.6004 | 76.8883 | 84.0518 | 84.0596 | 15.4788 |
0.0362 | 5.19 | 9200 | 0.4853 | 84.7169 | 77.1321 | 84.1485 | 84.1486 | 15.4760 |
0.0333 | 5.42 | 9600 | 0.5053 | 84.851 | 77.4661 | 84.307 | 84.3106 | 15.4829 |
0.0325 | 5.64 | 10000 | 0.5066 | 84.7412 | 77.3031 | 84.2107 | 84.2006 | 15.4948 |
0.0335 | 5.87 | 10400 | 0.4947 | 84.7596 | 77.2636 | 84.2156 | 84.224 | 15.4906 |
0.0269 | 6.09 | 10800 | 0.5306 | 84.7484 | 77.2693 | 84.1824 | 84.1962 | 15.4811 |
0.0184 | 6.32 | 11200 | 0.5535 | 84.8066 | 77.3749 | 84.2765 | 84.2989 | 15.4756 |
0.0177 | 6.55 | 11600 | 0.5555 | 84.7335 | 77.2108 | 84.1917 | 84.2084 | 15.4865 |
0.0168 | 6.77 | 12000 | 0.5538 | 84.7053 | 77.2902 | 84.184 | 84.1929 | 15.4792 |
0.0165 | 7.0 | 12400 | 0.5614 | 84.7332 | 77.3098 | 84.2055 | 84.2055 | 15.4879 |
0.0092 | 7.22 | 12800 | 0.6222 | 84.7668 | 77.3059 | 84.2235 | 84.2397 | 15.4724 |
0.0086 | 7.45 | 13200 | 0.6485 | 84.8211 | 77.4247 | 84.2857 | 84.2996 | 15.4751 |
0.0098 | 7.67 | 13600 | 0.6417 | 84.7854 | 77.4226 | 84.2457 | 84.2652 | 15.4865 |
0.0088 | 7.9 | 14000 | 0.6445 | 84.7809 | 77.4171 | 84.2396 | 84.2591 | 15.4852 |
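The ROUGE columns above are on a 0-100 scale, and "Gen Len" is the average length (in tokens) of the generated predictions. The metric code used for this run is not included in the card; a typical `compute_metrics` hook that produces these columns looks roughly like the sketch below. The `evaluate` package and the `tokenizer` argument are assumptions, not confirmed parts of the original setup; in practice the tokenizer would be bound with `functools.partial` or a closure before handing the function to the Trainer.

```python
# Illustrative only: a typical compute_metrics hook for a Seq2SeqTrainer run like this one.
import numpy as np
import evaluate  # assumed; older scripts used datasets.load_metric("rouge") instead

rouge = evaluate.load("rouge")

def compute_metrics(eval_preds, tokenizer):
    preds, labels = eval_preds
    # Label padding is stored as -100; swap it back so the labels can be decoded.
    labels = np.where(labels != -100, labels, tokenizer.pad_token_id)
    decoded_preds = tokenizer.batch_decode(preds, skip_special_tokens=True)
    decoded_labels = tokenizer.batch_decode(labels, skip_special_tokens=True)
    result = rouge.compute(predictions=decoded_preds, references=decoded_labels, use_stemmer=True)
    result = {key: round(value * 100, 4) for key, value in result.items()}  # 0-100 scale
    # "Gen Len": mean number of non-padding tokens in the generated sequences.
    result["gen_len"] = float(np.mean([np.count_nonzero(p != tokenizer.pad_token_id) for p in preds]))
    return result
```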
### Framework versions
- Transformers 4.28.1
- Pytorch 1.11.0a0+b6df043
- Datasets 2.12.0
- Tokenizers 0.13.3