# mt5-base-gecfirst-e8-b16
This model is a fine-tuned version of [google/mt5-base](https://huggingface.co/google/mt5-base) on an unspecified dataset (the model name suggests a grammatical error correction task). It achieves the following results on the evaluation set:
- Loss: 0.3009
- Rouge1: 63.8499
- Rouge2: 56.2662
- Rougel: 63.73
- Rougelsum: 63.6591
- Gen Len: 18.7736
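The Rouge1 and Rouge2 scores above measure unigram and bigram overlap between generated and reference text. The following is a simplified sketch over whitespace tokens, for intuition only — it is not the stemmed, tokenized implementation (e.g. the `rouge_score` package) that produced the numbers above:

```python
from collections import Counter

def rouge_n(prediction: str, reference: str, n: int = 1) -> float:
    """Illustrative ROUGE-N F1 on whitespace tokens (simplified; real
    ROUGE implementations also normalize, tokenize, and stem)."""
    def ngrams(text: str) -> Counter:
        toks = text.split()
        return Counter(tuple(toks[i:i + n]) for i in range(len(toks) - n + 1))

    pred, ref = ngrams(prediction), ngrams(reference)
    if not pred or not ref:
        return 0.0
    overlap = sum((pred & ref).values())  # clipped n-gram matches
    precision = overlap / sum(pred.values())
    recall = overlap / sum(ref.values())
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```

An identical prediction and reference yield an F1 of 1.0; disjoint token sets yield 0.0.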
## Model description
More information needed
## Intended uses & limitations
More information needed
## Training and evaluation data
More information needed
## Training procedure
### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 0.001
- train_batch_size: 16
- eval_batch_size: 16
- seed: 42
- optimizer: Adafactor
- lr_scheduler_type: linear
- num_epochs: 8
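With a linear scheduler, the learning rate decays from 0.001 toward zero over the full run — 2,368 optimizer steps per the results table below (296 steps per epoch × 8 epochs). A minimal sketch of the schedule shape, assuming zero warmup steps (the card does not report a warmup setting):

```python
def linear_lr(step: int, base_lr: float = 1e-3, total_steps: int = 2368,
              warmup_steps: int = 0) -> float:
    """Linear warmup followed by linear decay to zero. Mirrors the shape
    of a standard linear schedule; warmup_steps=0 is an assumption."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    return base_lr * max(0.0, (total_steps - step) / max(1, total_steps - warmup_steps))
```

Halfway through training (step 1,184) the learning rate is half the base value; at step 2,368 it reaches zero.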
### Training results
Training Loss | Epoch | Step | Validation Loss | Rouge1 | Rouge2 | Rougel | Rougelsum | Gen Len |
---|---|---|---|---|---|---|---|---|
3.409 | 0.25 | 74 | 0.6899 | 58.0459 | 46.7233 | 57.9944 | 57.9576 | 18.7669 |
1.0497 | 0.5 | 148 | 0.4335 | 61.3353 | 51.8804 | 61.174 | 61.1541 | 18.7703 |
0.8355 | 0.75 | 222 | 0.3734 | 62.5279 | 54.5952 | 62.4436 | 62.4377 | 18.7720 |
0.7339 | 1.0 | 296 | 0.3814 | 62.8071 | 54.8468 | 62.7075 | 62.6933 | 18.7770 |
0.5946 | 1.25 | 370 | 0.3418 | 63.1523 | 55.3752 | 62.9987 | 62.9879 | 18.7770 |
0.5746 | 1.49 | 444 | 0.3234 | 62.9253 | 55.1955 | 62.821 | 62.7592 | 18.7905 |
0.5278 | 1.74 | 518 | 0.3252 | 63.3056 | 55.6505 | 63.1271 | 63.0661 | 18.7804 |
0.4886 | 1.99 | 592 | 0.3265 | 63.1652 | 55.0909 | 62.979 | 62.9613 | 18.7753 |
0.366 | 2.24 | 666 | 0.3126 | 63.8131 | 56.5685 | 63.7303 | 63.6682 | 18.7703 |
0.3553 | 2.49 | 740 | 0.3192 | 63.6195 | 55.9276 | 63.4796 | 63.4692 | 18.7703 |
0.3558 | 2.74 | 814 | 0.3009 | 63.8499 | 56.2662 | 63.73 | 63.6591 | 18.7736 |
0.353 | 2.99 | 888 | 0.3014 | 63.7417 | 56.241 | 63.6192 | 63.5985 | 18.7686 |
0.2398 | 3.24 | 962 | 0.3119 | 63.999 | 56.8854 | 63.88 | 63.8705 | 18.7804 |
0.2459 | 3.49 | 1036 | 0.3222 | 64.0299 | 56.5581 | 63.9247 | 63.8934 | 18.7686 |
0.2423 | 3.74 | 1110 | 0.3125 | 63.6601 | 56.1864 | 63.4956 | 63.4819 | 18.7686 |
0.243 | 3.99 | 1184 | 0.3174 | 63.6676 | 56.1724 | 63.5183 | 63.4947 | 18.7736 |
0.1696 | 4.24 | 1258 | 0.3353 | 63.9905 | 56.3781 | 63.7979 | 63.7802 | 18.7652 |
0.1643 | 4.48 | 1332 | 0.3386 | 64.0219 | 56.7311 | 63.8823 | 63.8654 | 18.7703 |
0.1728 | 4.73 | 1406 | 0.3306 | 64.0261 | 56.7331 | 63.8978 | 63.8731 | 18.7720 |
0.1657 | 4.98 | 1480 | 0.3269 | 63.9735 | 56.4556 | 63.8514 | 63.8168 | 18.7703 |
0.1186 | 5.23 | 1554 | 0.3390 | 63.9831 | 56.6624 | 63.8953 | 63.8717 | 18.7703 |
0.1129 | 5.48 | 1628 | 0.3521 | 63.8674 | 56.528 | 63.7626 | 63.7362 | 18.7770 |
0.1061 | 5.73 | 1702 | 0.3539 | 63.9886 | 56.5753 | 63.881 | 63.8615 | 18.7703 |
0.1179 | 5.98 | 1776 | 0.3490 | 63.9949 | 56.7369 | 63.8929 | 63.8516 | 18.7736 |
0.0793 | 6.23 | 1850 | 0.3704 | 64.1527 | 57.0111 | 64.0496 | 63.9953 | 18.7686 |
0.0779 | 6.48 | 1924 | 0.3723 | 64.1833 | 57.0654 | 64.0686 | 64.0317 | 18.7669 |
0.0827 | 6.73 | 1998 | 0.3663 | 64.2185 | 56.9382 | 64.1096 | 64.0743 | 18.7736 |
0.0807 | 6.98 | 2072 | 0.3691 | 64.2298 | 56.9752 | 64.0957 | 64.0777 | 18.7686 |
0.0633 | 7.23 | 2146 | 0.3865 | 64.4729 | 57.5503 | 64.3733 | 64.3509 | 18.7652 |
0.0603 | 7.47 | 2220 | 0.3919 | 64.3001 | 57.2684 | 64.1693 | 64.1391 | 18.7635 |
0.0565 | 7.72 | 2294 | 0.3946 | 64.4077 | 57.3413 | 64.2825 | 64.2491 | 18.7635 |
0.0583 | 7.97 | 2368 | 0.3923 | 64.4078 | 57.3672 | 64.2775 | 64.2367 | 18.7652 |
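Note that the headline results correspond to the checkpoint with the lowest validation loss (0.3009 at step 814, epoch 2.74), not the final epoch — validation loss rises after epoch 3 even as ROUGE creeps up. Selecting that checkpoint from (step, loss) pairs, using a few rows of the table as sample data:

```python
# (step, validation loss) pairs taken from three rows of the table above
checkpoints = [(740, 0.3192), (814, 0.3009), (888, 0.3014)]

# Pick the checkpoint with the lowest validation loss
best_step, best_loss = min(checkpoints, key=lambda c: c[1])
```

This is the behavior of `load_best_model_at_end` with `metric_for_best_model="eval_loss"` in the Trainer, though the card does not state whether that option was enabled.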
## Framework versions
- Transformers 4.28.1
- Pytorch 1.11.0a0+b6df043
- Datasets 2.12.0
- Tokenizers 0.13.3