# flan-t5-large-gecfirst-e8-b16

This model is a fine-tuned version of [google/flan-t5-large](https://huggingface.co/google/flan-t5-large) on an unspecified dataset. It achieves the following results on the evaluation set:
- Loss: 0.2131
- Rouge1: 41.9978
- Rouge2: 33.9626
- Rougel: 41.9731
- Rougelsum: 41.9472
- Gen Len: 18.9831
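The Rouge scores above are ROUGE-N overlap metrics reported on a 0–100 scale, as is typical for cards generated by the Trainer (most likely computed with the Hugging Face `evaluate` rouge metric). As a rough illustration of what ROUGE-N measures, here is a simplified, whitespace-tokenized F1-style sketch; it is not the exact stemmed implementation the card's numbers come from:

```python
from collections import Counter

def ngrams(tokens, n):
    """All contiguous n-grams of a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def rouge_n(reference, candidate, n=1):
    """Simplified ROUGE-N F1 over whitespace tokens (sketch only)."""
    ref = Counter(ngrams(reference.split(), n))
    cand = Counter(ngrams(candidate.split(), n))
    if not ref or not cand:
        return 0.0
    overlap = sum((ref & cand).values())  # clipped n-gram matches
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)
```

Multiplying the result by 100 puts it on the same scale as the table in this card.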
## Model description
More information needed
## Intended uses & limitations
More information needed
## Training and evaluation data
More information needed
## Training procedure

### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 0.001
- train_batch_size: 16
- eval_batch_size: 16
- seed: 42
- optimizer: Adafactor
- lr_scheduler_type: linear
- num_epochs: 8
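With `lr_scheduler_type: linear` and no warmup steps reported, the learning rate presumably decays linearly from 0.001 to 0 over the run. A minimal sketch of that schedule (the total step count is a parameter here; the exact value used in training is not stated in the card):

```python
def linear_lr(step, num_training_steps, base_lr=1e-3):
    """Linear decay from base_lr at step 0 to 0 at num_training_steps,
    assuming zero warmup (warmup settings are not reported in the card)."""
    remaining = max(0.0, 1.0 - step / num_training_steps)
    return base_lr * remaining
```

This mirrors the shape of a linear schedule such as `transformers.get_linear_schedule_with_warmup` with `num_warmup_steps=0`.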
### Training results
| Training Loss | Epoch | Step | Validation Loss | Rouge1 | Rouge2 | Rougel | Rougelsum | Gen Len |
|:-------------:|:-----:|:----:|:---------------:|:------:|:------:|:------:|:---------:|:-------:|
0.9076 | 0.25 | 74 | 0.3818 | 38.6461 | 28.3284 | 38.6118 | 38.6109 | 18.9882 |
0.6017 | 0.5 | 148 | 0.3263 | 39.3099 | 29.8882 | 39.2727 | 39.2355 | 18.9865 |
0.5157 | 0.75 | 222 | 0.2837 | 40.6583 | 31.7481 | 40.5873 | 40.5867 | 18.9814 |
0.474 | 1.0 | 296 | 0.2541 | 40.6455 | 32.0335 | 40.6159 | 40.6058 | 18.9797 |
0.368 | 1.25 | 370 | 0.2524 | 41.3508 | 33.0248 | 41.3263 | 41.3167 | 18.9831 |
0.3725 | 1.49 | 444 | 0.2410 | 41.0031 | 32.5976 | 40.9528 | 40.9101 | 18.9831 |
0.3393 | 1.74 | 518 | 0.2269 | 41.734 | 33.6127 | 41.6234 | 41.661 | 18.9814 |
0.324 | 1.99 | 592 | 0.2317 | 41.6178 | 33.366 | 41.5426 | 41.5451 | 18.9797 |
0.2408 | 2.24 | 666 | 0.2273 | 41.9924 | 33.9113 | 41.9396 | 41.9476 | 18.9814 |
0.2346 | 2.49 | 740 | 0.2176 | 41.8074 | 33.8465 | 41.7723 | 41.7715 | 18.9747 |
0.2333 | 2.74 | 814 | 0.2131 | 41.9978 | 33.9626 | 41.9731 | 41.9472 | 18.9831 |
0.2355 | 2.99 | 888 | 0.2147 | 42.2681 | 34.3893 | 42.2337 | 42.2555 | 18.9780 |
0.1538 | 3.24 | 962 | 0.2264 | 42.3555 | 34.5818 | 42.2803 | 42.3056 | 18.9814 |
0.1575 | 3.49 | 1036 | 0.2361 | 42.1416 | 34.3667 | 42.0673 | 42.0882 | 18.9831 |
0.1498 | 3.74 | 1110 | 0.2272 | 42.3466 | 34.5508 | 42.3404 | 42.3167 | 18.9780 |
0.1549 | 3.99 | 1184 | 0.2255 | 42.3685 | 34.5886 | 42.3499 | 42.3207 | 18.9814 |
0.093 | 4.24 | 1258 | 0.2506 | 42.0742 | 34.0045 | 42.0614 | 42.0466 | 18.9797 |
0.09 | 4.48 | 1332 | 0.2505 | 42.1821 | 34.2779 | 42.1376 | 42.1456 | 18.9831 |
0.0953 | 4.73 | 1406 | 0.2421 | 42.2677 | 34.4361 | 42.2061 | 42.1947 | 18.9831 |
0.0932 | 4.98 | 1480 | 0.2473 | 42.3455 | 34.5756 | 42.3047 | 42.3066 | 18.9814 |
0.057 | 5.23 | 1554 | 0.2777 | 42.4729 | 34.6931 | 42.4424 | 42.4291 | 18.9814 |
0.0508 | 5.48 | 1628 | 0.2710 | 42.3497 | 34.4107 | 42.3168 | 42.2812 | 18.9780 |
0.0517 | 5.73 | 1702 | 0.2779 | 42.3832 | 34.661 | 42.3475 | 42.3433 | 18.9814 |
0.0519 | 5.98 | 1776 | 0.2872 | 42.3477 | 34.5264 | 42.3043 | 42.28 | 18.9814 |
0.0293 | 6.23 | 1850 | 0.3445 | 42.1247 | 34.3073 | 42.0531 | 42.0379 | 18.9814 |
0.0308 | 6.48 | 1924 | 0.3082 | 42.1917 | 34.23 | 42.1624 | 42.1277 | 18.9831 |
0.0306 | 6.73 | 1998 | 0.3138 | 42.2558 | 34.2683 | 42.2084 | 42.1848 | 18.9831 |
0.0281 | 6.98 | 2072 | 0.3134 | 42.5691 | 34.8936 | 42.5276 | 42.5056 | 18.9831 |
0.0207 | 7.23 | 2146 | 0.3310 | 42.4715 | 34.7943 | 42.3914 | 42.3853 | 18.9831 |
0.0187 | 7.47 | 2220 | 0.3361 | 42.4191 | 34.8391 | 42.3622 | 42.3593 | 18.9814 |
0.0172 | 7.72 | 2294 | 0.3414 | 42.5882 | 34.9307 | 42.5291 | 42.5049 | 18.9814 |
0.0168 | 7.97 | 2368 | 0.3413 | 42.5681 | 34.9092 | 42.4893 | 42.4794 | 18.9814 |
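The headline metrics at the top of this card (Loss 0.2131, Rouge1 41.9978) match the epoch-2.74 row (step 814), which has the lowest validation loss in the table — consistent with the Trainer keeping the best checkpoint by validation loss rather than the final one. That selection amounts to a simple minimum; a sketch over a few rows excerpted from the table above:

```python
# (validation_loss, step) pairs excerpted from the results table
rows = [
    (0.3818, 74),
    (0.2541, 296),
    (0.2131, 814),
    (0.2147, 888),
    (0.3413, 2368),
]

# min over tuples compares validation loss first
best_loss, best_step = min(rows)
```

Note that validation loss rises after epoch ~3 while ROUGE keeps inching up slightly, a common overfitting pattern; the best-loss checkpoint is a reasonable but not unique choice.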
### Framework versions
- Transformers 4.28.1
- Pytorch 1.11.0a0+b6df043
- Datasets 2.12.0
- Tokenizers 0.13.3