# mt5-base-gecid-e8-b8

This model is a fine-tuned version of [google/mt5-base](https://huggingface.co/google/mt5-base) on an unknown dataset.
It achieves the following results on the evaluation set:
- Loss: 0.3149
- Rouge1: 63.7794
- Rouge2: 56.8052
- RougeL: 63.71
- RougeLsum: 63.7021
- Gen Len: 18.7580
## Model description

More information needed

## Intended uses & limitations

More information needed
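
The intended use is not documented, but as an mT5 text-to-text fine-tune the checkpoint should load with the standard `transformers` seq2seq classes. The sketch below is a minimal, hedged example: the checkpoint path, input string, and generation settings are placeholders, and the exact task and prompt format are not documented here.

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

# Placeholder path; replace with the actual location of this checkpoint.
checkpoint = "path/to/mt5-base-gecid-e8-b8"

tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForSeq2SeqLM.from_pretrained(checkpoint)

# Text-to-text generation; the input string and generation settings are illustrative only.
inputs = tokenizer("example input sentence", return_tensors="pt")
output_ids = model.generate(**inputs, max_length=32, num_beams=4)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```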
## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training (a reproduction sketch follows the list):
- learning_rate: 0.001
- train_batch_size: 8
- eval_batch_size: 8
- seed: 42
- optimizer: Adafactor
- lr_scheduler_type: linear
- num_epochs: 8
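
As a rough guide to reproduction, the hyperparameters above map onto `Seq2SeqTrainingArguments` roughly as sketched below. This is an assumption-laden sketch, not the recorded training script: the dataset objects, data collator, and evaluation cadence are placeholders (the eval interval of 221 steps is inferred from the results table), and the `compute_metrics` hook that would produce the ROUGE columns is omitted; see the scoring sketch after the results table.

```python
from transformers import (
    AutoTokenizer,
    AutoModelForSeq2SeqLM,
    DataCollatorForSeq2Seq,
    Seq2SeqTrainer,
    Seq2SeqTrainingArguments,
)

tokenizer = AutoTokenizer.from_pretrained("google/mt5-base")
model = AutoModelForSeq2SeqLM.from_pretrained("google/mt5-base")

training_args = Seq2SeqTrainingArguments(
    output_dir="mt5-base-gecid-e8-b8",
    learning_rate=1e-3,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    seed=42,
    optim="adafactor",            # Adafactor optimizer
    lr_scheduler_type="linear",
    num_train_epochs=8,
    predict_with_generate=True,   # needed for ROUGE / Gen Len during evaluation
    evaluation_strategy="steps",  # assumption: the table suggests evaluation every 221 steps
    eval_steps=221,
)

trainer = Seq2SeqTrainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,  # placeholder: training data is not documented
    eval_dataset=eval_dataset,    # placeholder: evaluation data is not documented
    tokenizer=tokenizer,
    data_collator=DataCollatorForSeq2Seq(tokenizer, model=model),
)
trainer.train()
```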
### Training results

| Training Loss | Epoch | Step | Validation Loss | Rouge1  | Rouge2  | RougeL  | RougeLsum | Gen Len |
|:-------------:|:-----:|:----:|:---------------:|:-------:|:-------:|:-------:|:---------:|:-------:|
| 1.4284        | 0.25  | 221  | 0.5336          | 59.2266 | 49.3982 | 59.1131 | 59.1192   | 18.7516 |
| 0.6794        | 0.5   | 442  | 0.5026          | 60.2965 | 51.1698 | 60.2314 | 60.2144   | 18.7550 |
| 0.6054        | 0.75  | 663  | 0.4443          | 61.5746 | 53.2755 | 61.5027 | 61.4981   | 18.7439 |
| 0.5234        | 1.0   | 884  | 0.4024          | 62.0849 | 54.0043 | 61.98   | 61.9795   | 18.7571 |
| 0.4023        | 1.25  | 1105 | 0.3876          | 62.3763 | 54.516  | 62.2915 | 62.2758   | 18.7546 |
| 0.3615        | 1.5   | 1326 | 0.3500          | 62.8309 | 55.2966 | 62.767  | 62.7424   | 18.7550 |
| 0.3464        | 1.75  | 1547 | 0.3758          | 62.9962 | 55.4509 | 62.9335 | 62.9322   | 18.7499 |
| 0.3295        | 2.0   | 1768 | 0.3810          | 62.9764 | 55.2429 | 62.8254 | 62.8139   | 18.7524 |
| 0.234         | 2.25  | 1989 | 0.3433          | 63.4401 | 56.2959 | 63.3721 | 63.3559   | 18.7622 |
| 0.2138        | 2.5   | 2210 | 0.3558          | 63.5914 | 56.5334 | 63.5275 | 63.5203   | 18.7567 |
| 0.2153        | 2.75  | 2431 | 0.3149          | 63.7794 | 56.8052 | 63.71   | 63.7021   | 18.7580 |
| 0.2211        | 3.0   | 2652 | 0.3202          | 64.0491 | 57.399  | 63.9913 | 63.9872   | 18.7584 |
| 0.137         | 3.25  | 2873 | 0.3409          | 64.0702 | 57.3436 | 63.9913 | 63.9988   | 18.7537 |
| 0.1374        | 3.5   | 3094 | 0.3272          | 64.1621 | 57.4838 | 64.0817 | 64.0887   | 18.7486 |
| 0.1419        | 3.75  | 3315 | 0.3237          | 64.3536 | 57.7776 | 64.2516 | 64.2575   | 18.7478 |
| 0.1385        | 4.0   | 3536 | 0.3278          | 64.4918 | 58.0381 | 64.4171 | 64.4095   | 18.7567 |
| 0.0843        | 4.25  | 3757 | 0.3524          | 64.4279 | 57.9151 | 64.3466 | 64.3437   | 18.7503 |
| 0.0869        | 4.5   | 3978 | 0.3484          | 64.6526 | 58.2318 | 64.5719 | 64.5511   | 18.7507 |
| 0.0852        | 4.75  | 4199 | 0.3375          | 64.6296 | 58.3682 | 64.5767 | 64.5826   | 18.7563 |
| 0.0858        | 5.0   | 4420 | 0.3332          | 64.5979 | 58.3381 | 64.5341 | 64.5327   | 18.7524 |
| 0.0546        | 5.25  | 4641 | 0.3699          | 64.7273 | 58.6031 | 64.6666 | 64.6717   | 18.7588 |
| 0.0543        | 5.5   | 4862 | 0.3625          | 64.902  | 58.8353 | 64.8386 | 64.8482   | 18.7588 |
| 0.0514        | 5.75  | 5083 | 0.3566          | 64.7232 | 58.5502 | 64.6608 | 64.6557   | 18.7580 |
| 0.0542        | 6.0   | 5304 | 0.3562          | 64.8116 | 58.7128 | 64.7289 | 64.7468   | 18.7592 |
| 0.0301        | 6.25  | 5525 | 0.3975          | 64.8613 | 58.7115 | 64.7799 | 64.7906   | 18.7575 |
| 0.0344        | 6.5   | 5746 | 0.3938          | 64.9673 | 58.8576 | 64.8817 | 64.8935   | 18.7575 |
| 0.0319        | 6.75  | 5967 | 0.3867          | 64.918  | 58.8365 | 64.8385 | 64.8376   | 18.7597 |
| 0.0315        | 7.0   | 6188 | 0.3793          | 65.0065 | 58.9789 | 64.9387 | 64.9456   | 18.7588 |
| 0.0234        | 7.25  | 6409 | 0.3983          | 64.9348 | 58.9178 | 64.8671 | 64.8776   | 18.7580 |
| 0.022         | 7.5   | 6630 | 0.4099          | 65.0139 | 59.0427 | 64.9463 | 64.9453   | 18.7588 |
| 0.0183        | 7.75  | 6851 | 0.4187          | 64.9696 | 58.9453 | 64.9022 | 64.9074   | 18.7601 |
| 0.0206        | 8.0   | 7072 | 0.4181          | 65.0    | 58.9921 | 64.9346 | 64.94     | 18.7605 |
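
The ROUGE columns appear to be F-measure scores on a 0-100 scale, as typically produced by scaling the Hugging Face `evaluate` ROUGE metric, and Gen Len is typically the average length (in tokens) of the generated sequences. A minimal scoring sketch, with illustrative strings only:

```python
import evaluate

# Sketch only: the prediction/reference strings are illustrative, not from the actual eval set.
rouge = evaluate.load("rouge")
predictions = ["the model output sentence"]
references = ["the reference sentence"]
scores = rouge.compute(predictions=predictions, references=references, use_stemmer=True)

# Recent versions of `evaluate` return a dict of floats in [0, 1];
# multiplying by 100 puts them on the same scale as the table above.
print({name: round(value * 100, 4) for name, value in scores.items()})
```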
### Framework versions

- Transformers 4.28.1
- Pytorch 1.11.0a0+b6df043
- Datasets 2.12.0
- Tokenizers 0.13.2