# mt5-large-gecfirst-e8-b16
This model is a fine-tuned version of [google/mt5-large](https://huggingface.co/google/mt5-large) on an unspecified dataset. It achieves the following results on the evaluation set:
- Loss: 0.2672
- Rouge1: 64.1391
- Rouge2: 56.9117
- Rougel: 64.0719
- Rougelsum: 64.1665
- Gen Len: 18.7753
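These figures match the step-592 checkpoint (epoch 1.99), which has the lowest validation loss in the training results table below. A minimal inference sketch follows; the model id is a placeholder for the actual Hub repo path, and since the model name suggests grammatical error correction ("gec"), the example input is illustrative only:

```python
# Minimal inference sketch (assumes the checkpoint is on the Hugging Face Hub;
# replace model_id with the actual "<user>/mt5-large-gecfirst-e8-b16" repo path).
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

model_id = "mt5-large-gecfirst-e8-b16"  # hypothetical repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

# Illustrative input; the model name suggests grammatical error correction.
inputs = tokenizer("She go to school every day.", return_tensors="pt")
outputs = model.generate(**inputs, max_length=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```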
## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed
## Training procedure

### Training hyperparameters

The following hyperparameters were used during training (see the configuration sketch after this list):
- learning_rate: 0.001
- train_batch_size: 16
- eval_batch_size: 16
- seed: 42
- optimizer: Adafactor
- lr_scheduler_type: linear
- num_epochs: 8
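A configuration sketch expressing the same hyperparameters with `Seq2SeqTrainingArguments`; this is not the authors' actual training script, and the output directory and generation flag are assumptions for illustration:

```python
from transformers import Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="mt5-large-gecfirst-e8-b16",  # assumed output directory
    learning_rate=1e-3,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    seed=42,
    optim="adafactor",           # Adafactor optimizer, as listed above
    lr_scheduler_type="linear",
    num_train_epochs=8,
    predict_with_generate=True,  # assumed, needed for ROUGE/Gen Len during eval
)
```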
### Training results

| Training Loss | Epoch | Step | Validation Loss | Rouge1  | Rouge2  | Rougel  | Rougelsum | Gen Len |
|:-------------:|:-----:|:----:|:---------------:|:-------:|:-------:|:-------:|:---------:|:-------:|
| 1.8204        | 0.25  | 74   | 0.4021          | 61.4087 | 52.3887 | 61.2674 | 61.3674   | 18.7804 |
| 0.7246        | 0.5   | 148  | 0.3252          | 63.347  | 55.3862 | 63.1874 | 63.2961   | 18.7652 |
| 0.6142        | 0.75  | 222  | 0.3028          | 63.725  | 56.2856 | 63.5597 | 63.6491   | 18.7838 |
| 0.5472        | 1.0   | 296  | 0.2919          | 63.8647 | 56.6097 | 63.7525 | 63.8544   | 18.7973 |
| 0.3687        | 1.25  | 370  | 0.2777          | 64.0686 | 56.686  | 63.883  | 63.9804   | 18.7703 |
| 0.3907        | 1.49  | 444  | 0.2870          | 64.0517 | 56.6668 | 63.9062 | 64.0017   | 18.7838 |
| 0.3466        | 1.74  | 518  | 0.2726          | 64.2559 | 57.4463 | 64.1045 | 64.2199   | 18.7770 |
| 0.3341        | 1.99  | 592  | 0.2672          | 64.1391 | 56.9117 | 64.0719 | 64.1665   | 18.7753 |
| 0.2036        | 2.24  | 666  | 0.2834          | 64.5476 | 57.8246 | 64.3771 | 64.5255   | 18.7804 |
| 0.2091        | 2.49  | 740  | 0.2897          | 64.1422 | 56.9715 | 64.0481 | 64.1689   | 18.7432 |
| 0.2002        | 2.74  | 814  | 0.2703          | 64.6648 | 57.707  | 64.4805 | 64.5948   | 18.7804 |
| 0.204         | 2.99  | 888  | 0.2824          | 64.0966 | 56.9705 | 63.9888 | 64.073    | 18.7551 |
| 0.1185        | 3.24  | 962  | 0.3022          | 64.4346 | 57.6011 | 64.3542 | 64.4615   | 18.7939 |
| 0.117         | 3.49  | 1036 | 0.2870          | 64.455  | 57.3607 | 64.2925 | 64.3963   | 18.7669 |
| 0.1135        | 3.74  | 1110 | 0.2890          | 64.7671 | 58.0409 | 64.5938 | 64.6987   | 18.7669 |
| 0.1175        | 3.99  | 1184 | 0.2977          | 64.8082 | 58.0379 | 64.6993 | 64.7849   | 18.7652 |
| 0.0726        | 4.24  | 1258 | 0.3135          | 64.5297 | 57.6752 | 64.4134 | 64.5109   | 18.7736 |
| 0.0654        | 4.48  | 1332 | 0.3298          | 64.5051 | 57.6982 | 64.3561 | 64.4885   | 18.7787 |
| 0.0719        | 4.73  | 1406 | 0.3139          | 64.8793 | 58.1936 | 64.749  | 64.8532   | 18.7720 |
| 0.0665        | 4.98  | 1480 | 0.3174          | 64.9015 | 58.1975 | 64.786  | 64.907    | 18.7703 |
| 0.0452        | 5.23  | 1554 | 0.3272          | 64.5715 | 58.067  | 64.4336 | 64.5425   | 18.7889 |
| 0.0395        | 5.48  | 1628 | 0.3337          | 64.7712 | 58.1058 | 64.6351 | 64.7423   | 18.7703 |
| 0.0367        | 5.73  | 1702 | 0.3422          | 64.9298 | 58.4592 | 64.8188 | 64.8927   | 18.7787 |
| 0.0393        | 5.98  | 1776 | 0.3394          | 64.8953 | 58.162  | 64.7892 | 64.8822   | 18.7787 |
| 0.0247        | 6.23  | 1850 | 0.3532          | 64.9207 | 58.2827 | 64.8053 | 64.8903   | 18.7872 |
| 0.0222        | 6.48  | 1924 | 0.3543          | 64.902  | 58.3086 | 64.793  | 64.8973   | 18.7736 |
| 0.0203        | 6.73  | 1998 | 0.3628          | 65.1022 | 58.7138 | 64.9734 | 65.0891   | 18.7720 |
| 0.0218        | 6.98  | 2072 | 0.3599          | 64.9409 | 58.387  | 64.7925 | 64.9157   | 18.7720 |
| 0.0156        | 7.23  | 2146 | 0.3802          | 65.1242 | 58.8116 | 64.9962 | 65.1097   | 18.7736 |
| 0.013         | 7.47  | 2220 | 0.3845          | 64.9358 | 58.4528 | 64.8099 | 64.925    | 18.7703 |
| 0.0114        | 7.72  | 2294 | 0.3913          | 64.9827 | 58.6449 | 64.863  | 64.9661   | 18.7720 |
| 0.0125        | 7.97  | 2368 | 0.3886          | 65.0031 | 58.5507 | 64.8805 | 64.9845   | 18.7720 |
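The ROUGE scores above are on a 0-100 scale. A sketch of how such scores can be computed with the `evaluate` library; the prediction/reference pair is a placeholder, not data from the actual evaluation set:

```python
import evaluate

rouge = evaluate.load("rouge")
scores = rouge.compute(
    predictions=["She goes to school every day."],  # placeholder prediction
    references=["She goes to school every day."],   # placeholder reference
)
# evaluate returns fractions in [0, 1]; scale to match the table above.
print({k: round(v * 100, 4) for k, v in scores.items()})
```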
### Framework versions
- Transformers 4.28.1
- Pytorch 1.11.0a0+b6df043
- Datasets 2.12.0
- Tokenizers 0.13.3