# flan-t5-base-fce-e8-b16
This model is a fine-tuned version of [google/flan-t5-base](https://huggingface.co/google/flan-t5-base) on an unspecified dataset. It achieves the following results on the evaluation set (a usage sketch follows the metrics):
- Loss: 0.3114
- Rouge1: 86.9035
- Rouge2: 79.2645
- Rougel: 86.4197
- Rougelsum: 86.4231
- Gen Len: 14.8850
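
A minimal usage sketch, assuming the model is meant for sentence correction (the `fce` in the model name hints at the FCE error-correction corpus, though the card does not say) and assuming the hypothetical repository id below:

```python
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

# Hypothetical repo id; substitute the actual Hub path of this model.
model_id = "flan-t5-base-fce-e8-b16"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

# The expected input format is an assumption; the card does not document it.
inputs = tokenizer("She no went to the market yesterday.", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```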
## Model description
More information needed
## Intended uses & limitations
More information needed
## Training and evaluation data
More information needed
## Training procedure

### Training hyperparameters
The following hyperparameters were used during training (a reproduction sketch follows the list):
- learning_rate: 0.001
- train_batch_size: 16
- eval_batch_size: 16
- seed: 42
- optimizer: Adafactor
- lr_scheduler_type: linear
- num_epochs: 8
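
A minimal sketch of how these hyperparameters map onto `Seq2SeqTrainingArguments` in Transformers 4.28. The `train_dataset`/`eval_dataset` variables are placeholders (the training data is not specified in this card), the batch size is assumed to be per device, and the `compute_metrics` callback that produced the ROUGE scores is omitted for brevity:

```python
from transformers import (
    AutoModelForSeq2SeqLM,
    AutoTokenizer,
    DataCollatorForSeq2Seq,
    Seq2SeqTrainer,
    Seq2SeqTrainingArguments,
)

model_name = "google/flan-t5-base"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

args = Seq2SeqTrainingArguments(
    output_dir="flan-t5-base-fce-e8-b16",
    learning_rate=1e-3,
    per_device_train_batch_size=16,  # assumption: the card only logs a batch size of 16
    per_device_eval_batch_size=16,
    seed=42,
    optim="adafactor",
    lr_scheduler_type="linear",
    num_train_epochs=8,
    evaluation_strategy="steps",
    eval_steps=400,                  # inferred from the 400-step cadence in the results table
    predict_with_generate=True,      # required so ROUGE is computed over generated text
)

trainer = Seq2SeqTrainer(
    model=model,
    args=args,
    train_dataset=train_dataset,  # placeholder: a tokenized seq2seq dataset
    eval_dataset=eval_dataset,    # placeholder
    data_collator=DataCollatorForSeq2Seq(tokenizer, model=model),
)
trainer.train()
```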
### Training results
| Training Loss | Epoch | Step | Validation Loss | Rouge1 | Rouge2 | Rougel | Rougelsum | Gen Len |
|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|
| 0.4128 | 0.23 | 400 | 0.3457 | 86.8983 | 79.1632 | 86.3755 | 86.3944 | 14.8435 |
| 0.3783 | 0.45 | 800 | 0.3469 | 86.8995 | 78.8428 | 86.3368 | 86.3283 | 14.8955 |
| 0.3627 | 0.68 | 1200 | 0.3114 | 86.9035 | 79.2645 | 86.4197 | 86.4231 | 14.8850 |
| 0.3484 | 0.9 | 1600 | 0.3239 | 87.2292 | 79.8056 | 86.7218 | 86.7237 | 14.8759 |
| 0.2696 | 1.13 | 2000 | 0.3419 | 87.15 | 79.6016 | 86.6082 | 86.6241 | 14.8959 |
| 0.22 | 1.35 | 2400 | 0.3270 | 87.0232 | 79.4806 | 86.5137 | 86.5173 | 14.8868 |
| 0.2327 | 1.58 | 2800 | 0.3185 | 87.1028 | 79.6758 | 86.5985 | 86.6221 | 14.9005 |
| 0.2354 | 1.81 | 3200 | 0.3125 | 87.143 | 79.786 | 86.6545 | 86.6788 | 14.9010 |
| 0.2177 | 2.03 | 3600 | 0.3292 | 87.0858 | 79.5707 | 86.5451 | 86.5456 | 14.9133 |
| 0.1347 | 2.26 | 4000 | 0.3342 | 87.1768 | 79.9161 | 86.6402 | 86.6666 | 14.9142 |
| 0.1411 | 2.48 | 4400 | 0.3456 | 87.1049 | 79.9438 | 86.6152 | 86.6265 | 14.9110 |
| 0.1487 | 2.71 | 4800 | 0.3393 | 86.5182 | 78.468 | 86.0005 | 86.0283 | 14.8813 |
| 0.1498 | 2.93 | 5200 | 0.3347 | 87.2024 | 79.7098 | 86.6782 | 86.6904 | 14.8859 |
| 0.1055 | 3.16 | 5600 | 0.4027 | 87.1281 | 79.799 | 86.5714 | 86.5965 | 14.9105 |
| 0.0862 | 3.39 | 6000 | 0.4046 | 87.2721 | 79.8755 | 86.6838 | 86.6956 | 14.9073 |
| 0.0894 | 3.61 | 6400 | 0.3776 | 87.1508 | 79.865 | 86.6178 | 86.6424 | 14.8946 |
| 0.0942 | 3.84 | 6800 | 0.3781 | 87.2854 | 80.0876 | 86.7694 | 86.7867 | 14.8927 |
| 0.0816 | 4.06 | 7200 | 0.4300 | 87.3854 | 80.1162 | 86.8398 | 86.8446 | 14.8978 |
| 0.0582 | 4.29 | 7600 | 0.4201 | 87.2594 | 80.1824 | 86.7653 | 86.7807 | 14.9019 |
| 0.0588 | 4.51 | 8000 | 0.4129 | 87.3373 | 80.1802 | 86.8332 | 86.8414 | 14.9014 |
| 0.0571 | 4.74 | 8400 | 0.4437 | 87.2985 | 80.0215 | 86.8171 | 86.8238 | 14.8946 |
| 0.0587 | 4.97 | 8800 | 0.4019 | 87.2321 | 80.0933 | 86.6888 | 86.6931 | 14.9105 |
| 0.0381 | 5.19 | 9200 | 0.4822 | 87.2798 | 80.1822 | 86.7799 | 86.7886 | 14.9014 |
| 0.0378 | 5.42 | 9600 | 0.4831 | 87.409 | 80.3418 | 86.8845 | 86.8844 | 14.8927 |
| 0.0368 | 5.64 | 10000 | 0.4809 | 87.2276 | 79.9415 | 86.6776 | 86.6833 | 14.9105 |
| 0.0359 | 5.87 | 10400 | 0.4964 | 87.2916 | 80.1468 | 86.7693 | 86.7704 | 14.9028 |
| 0.0311 | 6.09 | 10800 | 0.5266 | 87.3443 | 80.1762 | 86.7852 | 86.7825 | 14.8991 |
| 0.0225 | 6.32 | 11200 | 0.5550 | 87.3142 | 80.2689 | 86.7856 | 86.7884 | 14.9037 |
| 0.0239 | 6.55 | 11600 | 0.5308 | 87.4003 | 80.2637 | 86.8373 | 86.8356 | 14.9023 |
| 0.0236 | 6.77 | 12000 | 0.5490 | 87.3865 | 80.3184 | 86.8563 | 86.8626 | 14.9037 |
| 0.0223 | 7.0 | 12400 | 0.5454 | 87.3842 | 80.2875 | 86.8109 | 86.8293 | 14.9055 |
| 0.0164 | 7.22 | 12800 | 0.5818 | 87.4641 | 80.3669 | 86.8908 | 86.9062 | 14.8964 |
| 0.0155 | 7.45 | 13200 | 0.5927 | 87.4191 | 80.3356 | 86.8541 | 86.8718 | 14.9014 |
| 0.0152 | 7.67 | 13600 | 0.5990 | 87.4257 | 80.2974 | 86.8481 | 86.8589 | 14.9005 |
| 0.0144 | 7.9 | 14000 | 0.6084 | 87.4754 | 80.3558 | 86.9086 | 86.9184 | 14.9014 |
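
Note that the headline metrics at the top of this card correspond to the step-1200 checkpoint (epoch 0.68), which has the lowest validation loss of the run (0.3114). Validation loss rises steadily from roughly epoch 3 onward even as ROUGE scores inch upward, so the best-loss checkpoint appears to have been retained rather than the final one.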
### Framework versions
- Transformers 4.28.1
- Pytorch 1.11.0a0+b6df043
- Datasets 2.12.0
- Tokenizers 0.13.3