# flan-t5-large-fce-e8-b16
This model is a fine-tuned version of [google/flan-t5-large](https://huggingface.co/google/flan-t5-large) on an unspecified dataset. It achieves the following results on the evaluation set (a sketch of how such ROUGE scores are computed follows the list):
- Loss: 0.3123
- Rouge1: 87.0781
- Rouge2: 79.8175
- Rougel: 86.6213
- Rougelsum: 86.6385
- Gen Len: 14.8832
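As referenced above, the ROUGE values are reported on a 0-100 scale. A minimal sketch of computing such scores with the 🤗 `evaluate` library (the prediction/reference strings are illustrative placeholders, not taken from the actual evaluation set):

```python
import evaluate  # requires: pip install evaluate rouge_score

rouge = evaluate.load("rouge")

# Illustrative strings only; the actual evaluation data is not documented here.
predictions = ["He has an apple."]
references = ["He has an apple."]

# Recent versions of `evaluate` return plain floats in [0, 1];
# model cards conventionally report them scaled by 100.
scores = rouge.compute(predictions=predictions, references=references, use_stemmer=True)
print({k: round(v * 100, 4) for k, v in scores.items()})
```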
## Model description
More information needed
## Intended uses & limitations
More information needed
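Until this section is filled in, here is a minimal inference sketch, assuming the model is loadable from the Hub under this repo name (the hub namespace and the example input are placeholders; the "fce" in the name suggests error correction on the FCE corpus, but the card does not confirm the task):

```python
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

# Hypothetical hub id: replace <user> with the namespace actually hosting this model.
model_id = "<user>/flan-t5-large-fce-e8-b16"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

# Illustrative input; the prompt format used during fine-tuning is not documented here.
inputs = tokenizer("I has a apple.", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```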
## Training and evaluation data
More information needed
## Training procedure

### Training hyperparameters
The following hyperparameters were used during training (a sketch mapping them onto `Seq2SeqTrainingArguments` follows the list):
- learning_rate: 0.001
- train_batch_size: 16
- eval_batch_size: 16
- seed: 42
- optimizer: Adafactor
- lr_scheduler_type: linear
- num_epochs: 8
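As referenced above, a minimal sketch of how these settings map onto `Seq2SeqTrainingArguments` in Transformers 4.28; fields not listed above are assumptions, not taken from the original run:

```python
from transformers import Seq2SeqTrainingArguments

# Hedged reconstruction of the run configuration. output_dir and
# predict_with_generate are assumptions; the 400-step eval cadence is
# inferred from the results table below.
args = Seq2SeqTrainingArguments(
    output_dir="flan-t5-large-fce-e8-b16",
    learning_rate=1e-3,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    seed=42,
    optim="adafactor",
    lr_scheduler_type="linear",
    num_train_epochs=8,
    evaluation_strategy="steps",
    eval_steps=400,
    predict_with_generate=True,
)
```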
### Training results
| Training Loss | Epoch | Step  | Validation Loss | Rouge1  | Rouge2  | Rougel  | Rougelsum | Gen Len |
|:-------------:|:-----:|:-----:|:---------------:|:-------:|:-------:|:-------:|:---------:|:-------:|
| 0.3761 | 0.23 | 400 | 0.3325 | 86.7053 | 79.1321 | 86.1593 | 86.1958 | 14.8864 |
| 0.3516 | 0.45 | 800 | 0.3201 | 86.8076 | 79.1282 | 86.2981 | 86.3209 | 14.8781 |
| 0.3401 | 0.68 | 1200 | 0.3187 | 86.479 | 78.6505 | 85.9172 | 85.9585 | 14.8800 |
| 0.3283 | 0.9 | 1600 | 0.3123 | 87.0781 | 79.8175 | 86.6213 | 86.6385 | 14.8832 |
| 0.2395 | 1.13 | 2000 | 0.3278 | 86.7979 | 79.3766 | 86.3314 | 86.3581 | 14.9046 |
| 0.1817 | 1.35 | 2400 | 0.3170 | 86.8343 | 79.4019 | 86.3148 | 86.3232 | 14.8964 |
| 0.1962 | 1.58 | 2800 | 0.3138 | 86.8702 | 79.425 | 86.3412 | 86.36 | 14.9069 |
| 0.1971 | 1.81 | 3200 | 0.3191 | 86.8355 | 79.3178 | 86.2974 | 86.322 | 14.8809 |
| 0.1816 | 2.03 | 3600 | 0.3490 | 87.0986 | 79.7312 | 86.6108 | 86.6227 | 14.9142 |
| 0.0975 | 2.26 | 4000 | 0.3534 | 86.7684 | 79.3649 | 86.2755 | 86.2885 | 14.9069 |
| 0.1033 | 2.48 | 4400 | 0.3536 | 86.8978 | 79.714 | 86.4135 | 86.435 | 14.9302 |
| 0.1086 | 2.71 | 4800 | 0.3553 | 86.6286 | 79.3293 | 86.1381 | 86.1686 | 14.9078 |
| 0.1141 | 2.93 | 5200 | 0.3530 | 86.8452 | 79.4178 | 86.2927 | 86.3239 | 14.9010 |
| 0.076 | 3.16 | 5600 | 0.4088 | 86.992 | 79.8179 | 86.5124 | 86.5186 | 14.9096 |
| 0.0595 | 3.39 | 6000 | 0.4052 | 86.8874 | 79.6302 | 86.3643 | 86.3784 | 14.9101 |
| 0.0606 | 3.61 | 6400 | 0.4051 | 86.9236 | 79.5305 | 86.3715 | 86.3959 | 14.9101 |
| 0.0653 | 3.84 | 6800 | 0.3860 | 86.8353 | 79.541 | 86.3249 | 86.3292 | 14.9165 |
| 0.0553 | 4.06 | 7200 | 0.4229 | 86.7788 | 79.5444 | 86.3393 | 86.3468 | 14.8868 |
| 0.0339 | 4.29 | 7600 | 0.4478 | 86.6863 | 79.5215 | 86.216 | 86.2363 | 14.9133 |
| 0.0375 | 4.51 | 8000 | 0.4359 | 86.8412 | 79.668 | 86.3237 | 86.3349 | 14.9229 |
| 0.0376 | 4.74 | 8400 | 0.4459 | 86.8836 | 79.682 | 86.3993 | 86.4062 | 14.9069 |
| 0.0372 | 4.97 | 8800 | 0.4324 | 86.6833 | 79.5114 | 86.1856 | 86.2031 | 14.9197 |
| 0.023 | 5.19 | 9200 | 0.4930 | 86.9595 | 79.8244 | 86.4103 | 86.4373 | 14.9279 |
| 0.0211 | 5.42 | 9600 | 0.4927 | 87.0212 | 79.8707 | 86.5054 | 86.5117 | 14.9320 |
| 0.0215 | 5.64 | 10000 | 0.4915 | 86.9495 | 79.8479 | 86.458 | 86.4632 | 14.9115 |
| 0.0205 | 5.87 | 10400 | 0.4919 | 86.8966 | 79.7666 | 86.424 | 86.4482 | 14.9069 |
| 0.0169 | 6.09 | 10800 | 0.5415 | 87.1119 | 80.0504 | 86.6205 | 86.6255 | 14.9083 |
| 0.0116 | 6.32 | 11200 | 0.5767 | 87.1828 | 80.2547 | 86.6809 | 86.6742 | 14.9215 |
| 0.0113 | 6.55 | 11600 | 0.5799 | 87.2494 | 80.2853 | 86.7412 | 86.761 | 14.9147 |
| 0.0103 | 6.77 | 12000 | 0.6036 | 87.1081 | 80.1873 | 86.6086 | 86.6176 | 14.9251 |
| 0.0106 | 7.0 | 12400 | 0.5821 | 87.1489 | 80.1987 | 86.654 | 86.6694 | 14.9242 |
| 0.0064 | 7.22 | 12800 | 0.6325 | 87.2026 | 80.2043 | 86.6988 | 86.704 | 14.9197 |
| 0.0056 | 7.45 | 13200 | 0.6878 | 87.184 | 80.1382 | 86.6798 | 86.7049 | 14.9188 |
| 0.0061 | 7.67 | 13600 | 0.6888 | 87.2465 | 80.1602 | 86.7407 | 86.7459 | 14.9201 |
| 0.0057 | 7.9 | 14000 | 0.6922 | 87.2584 | 80.2614 | 86.7806 | 86.7948 | 14.9201 |
### Framework versions
- Transformers 4.28.1
- Pytorch 1.11.0a0+b6df043
- Datasets 2.12.0
- Tokenizers 0.13.3