# find_first_sent_train_30_eval_10_flan-t5-xl
This model is a fine-tuned version of [google/flan-t5-xl](https://huggingface.co/google/flan-t5-xl) on the [tyzhu/find_first_sent_train_30_eval_10](https://huggingface.co/datasets/tyzhu/find_first_sent_train_30_eval_10) dataset.
It achieves the following results on the evaluation set:
- Loss: 5.9172
- Bleu: 21.6414
- Gen Len: 38.2
## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure
### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 0.0001
- train_batch_size: 16
- eval_batch_size: 16
- seed: 42
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_ratio: 0.05
- num_epochs: 100.0
### Training results

| Training Loss | Epoch | Step | Validation Loss | Bleu | Gen Len |
|:-------------:|:-----:|:----:|:---------------:|:----:|:-------:|
No log | 1.0 | 5 | 2.5051 | 1.241 | 43.0 |
No log | 2.0 | 10 | 2.4681 | 0.8804 | 35.0 |
No log | 3.0 | 15 | 2.4068 | 2.5934 | 24.3 |
No log | 4.0 | 20 | 2.3519 | 1.9387 | 25.8 |
No log | 5.0 | 25 | 2.2914 | 2.182 | 28.1 |
No log | 6.0 | 30 | 2.2810 | 2.2794 | 28.8 |
No log | 7.0 | 35 | 2.2634 | 3.0649 | 30.2 |
No log | 8.0 | 40 | 2.3489 | 3.6397 | 35.5 |
No log | 9.0 | 45 | 2.4796 | 2.1984 | 34.8 |
2.0773 | 10.0 | 50 | 2.6235 | 2.3859 | 28.5 |
2.0773 | 11.0 | 55 | 2.7206 | 3.1068 | 29.3 |
2.0773 | 12.0 | 60 | 2.8255 | 2.9542 | 29.2 |
2.0773 | 13.0 | 65 | 3.3399 | 3.5205 | 28.6 |
2.0773 | 14.0 | 70 | 3.3261 | 2.6726 | 28.4 |
2.0773 | 15.0 | 75 | 4.0954 | 2.6483 | 28.7 |
2.0773 | 16.0 | 80 | 4.6468 | 1.5483 | 30.9 |
2.0773 | 17.0 | 85 | 4.1352 | 1.9426 | 32.5 |
2.0773 | 18.0 | 90 | 4.5193 | 2.0072 | 31.5 |
2.0773 | 19.0 | 95 | 5.0365 | 5.7223 | 35.4 |
0.4306 | 20.0 | 100 | 4.9830 | 6.0764 | 33.7 |
0.4306 | 21.0 | 105 | 5.1218 | 5.6436 | 35.2 |
0.4306 | 22.0 | 110 | 5.4091 | 5.3174 | 40.4 |
0.4306 | 23.0 | 115 | 5.3755 | 5.4611 | 38.8 |
0.4306 | 24.0 | 120 | 5.2219 | 5.9493 | 33.6 |
0.4306 | 25.0 | 125 | 5.2747 | 5.3679 | 36.9 |
0.4306 | 26.0 | 130 | 5.3279 | 5.0396 | 41.2 |
0.4306 | 27.0 | 135 | 5.4788 | 5.287 | 39.1 |
0.4306 | 28.0 | 140 | 5.5710 | 5.1812 | 40.4 |
0.4306 | 29.0 | 145 | 5.6488 | 5.3043 | 39.4 |
0.0867 | 30.0 | 150 | 5.5148 | 5.2983 | 37.9 |
0.0867 | 31.0 | 155 | 5.4655 | 20.8944 | 39.5 |
0.0867 | 32.0 | 160 | 5.6512 | 5.8527 | 34.4 |
0.0867 | 33.0 | 165 | 5.6764 | 15.876 | 50.8 |
0.0867 | 34.0 | 170 | 5.6538 | 15.876 | 50.8 |
0.0867 | 35.0 | 175 | 5.6921 | 20.4813 | 39.8 |
0.0867 | 36.0 | 180 | 5.6782 | 20.2634 | 40.8 |
0.0867 | 37.0 | 185 | 5.6104 | 21.0798 | 38.8 |
0.0867 | 38.0 | 190 | 5.3899 | 21.8155 | 37.8 |
0.0867 | 39.0 | 195 | 5.2651 | 21.8952 | 37.7 |
0.0376 | 40.0 | 200 | 5.4012 | 21.8952 | 37.7 |
0.0376 | 41.0 | 205 | 5.3592 | 21.8952 | 37.7 |
0.0376 | 42.0 | 210 | 5.2308 | 21.8952 | 37.7 |
0.0376 | 43.0 | 215 | 5.2728 | 21.3782 | 38.4 |
0.0376 | 44.0 | 220 | 5.3208 | 22.1008 | 37.0 |
0.0376 | 45.0 | 225 | 5.3982 | 21.8952 | 37.7 |
0.0376 | 46.0 | 230 | 5.3998 | 21.8952 | 37.7 |
0.0376 | 47.0 | 235 | 5.3946 | 21.7985 | 37.9 |
0.0376 | 48.0 | 240 | 5.5448 | 21.9756 | 37.8 |
0.0376 | 49.0 | 245 | 5.6623 | 21.9756 | 37.8 |
0.0248 | 50.0 | 250 | 5.6704 | 15.6207 | 52.4 |
0.0248 | 51.0 | 255 | 5.7137 | 15.6207 | 52.4 |
0.0248 | 52.0 | 260 | 5.7186 | 16.1671 | 49.9 |
0.0248 | 53.0 | 265 | 5.7098 | 16.0377 | 50.3 |
0.0248 | 54.0 | 270 | 5.6003 | 15.9103 | 50.6 |
0.0248 | 55.0 | 275 | 5.5697 | 15.9103 | 50.6 |
0.0248 | 56.0 | 280 | 5.5331 | 16.0377 | 50.2 |
0.0248 | 57.0 | 285 | 5.5400 | 15.8265 | 50.9 |
0.0248 | 58.0 | 290 | 5.6258 | 13.2365 | 61.1 |
0.0248 | 59.0 | 295 | 5.6516 | 13.2365 | 61.1 |
0.0147 | 60.0 | 300 | 5.6560 | 13.2073 | 61.6 |
0.0147 | 61.0 | 305 | 5.7258 | 13.1459 | 61.9 |
0.0147 | 62.0 | 310 | 5.7615 | 13.1459 | 61.9 |
0.0147 | 63.0 | 315 | 5.7989 | 13.1459 | 61.9 |
0.0147 | 64.0 | 320 | 5.8839 | 13.1459 | 61.9 |
0.0147 | 65.0 | 325 | 5.9621 | 13.1459 | 61.9 |
0.0147 | 66.0 | 330 | 6.0142 | 13.1459 | 61.9 |
0.0147 | 67.0 | 335 | 6.0231 | 13.1459 | 61.9 |
0.0147 | 68.0 | 340 | 5.9970 | 21.1381 | 38.6 |
0.0147 | 69.0 | 345 | 5.9133 | 21.1381 | 38.6 |
0.0107 | 70.0 | 350 | 5.8522 | 20.9916 | 39.2 |
0.0107 | 71.0 | 355 | 5.7963 | 20.9916 | 39.2 |
0.0107 | 72.0 | 360 | 5.7927 | 20.9916 | 39.2 |
0.0107 | 73.0 | 365 | 5.7878 | 20.9916 | 39.2 |
0.0107 | 74.0 | 370 | 5.7743 | 20.9916 | 39.2 |
0.0107 | 75.0 | 375 | 5.7927 | 20.9916 | 39.2 |
0.0107 | 76.0 | 380 | 5.8188 | 20.9916 | 39.2 |
0.0107 | 77.0 | 385 | 5.8431 | 20.9916 | 39.2 |
0.0107 | 78.0 | 390 | 5.8821 | 20.9916 | 39.2 |
0.0107 | 79.0 | 395 | 5.9117 | 20.9916 | 39.2 |
0.0089 | 80.0 | 400 | 5.9405 | 20.9916 | 39.2 |
0.0089 | 81.0 | 405 | 5.9583 | 21.6414 | 38.2 |
0.0089 | 82.0 | 410 | 5.9502 | 21.6414 | 38.2 |
0.0089 | 83.0 | 415 | 5.9410 | 21.6414 | 38.2 |
0.0089 | 84.0 | 420 | 5.9362 | 21.6414 | 38.2 |
0.0089 | 85.0 | 425 | 5.9252 | 21.6414 | 38.2 |
0.0089 | 86.0 | 430 | 5.9187 | 21.6414 | 38.2 |
0.0089 | 87.0 | 435 | 5.9201 | 21.6414 | 38.2 |
0.0089 | 88.0 | 440 | 5.9235 | 21.6414 | 38.2 |
0.0089 | 89.0 | 445 | 5.9023 | 21.6414 | 38.2 |
0.0074 | 90.0 | 450 | 5.8876 | 21.6414 | 38.2 |
0.0074 | 91.0 | 455 | 5.8896 | 21.6414 | 38.2 |
0.0074 | 92.0 | 460 | 5.8949 | 21.6414 | 38.2 |
0.0074 | 93.0 | 465 | 5.8910 | 21.6414 | 38.2 |
0.0074 | 94.0 | 470 | 5.8899 | 21.6414 | 38.2 |
0.0074 | 95.0 | 475 | 5.8902 | 21.6414 | 38.2 |
0.0074 | 96.0 | 480 | 5.8955 | 21.6414 | 38.2 |
0.0074 | 97.0 | 485 | 5.9038 | 21.6414 | 38.2 |
0.0074 | 98.0 | 490 | 5.9107 | 21.6414 | 38.2 |
0.0074 | 99.0 | 495 | 5.9156 | 21.6414 | 38.2 |
0.0067 | 100.0 | 500 | 5.9172 | 21.6414 | 38.2 |
### Framework versions
- Transformers 4.34.0
- Pytorch 2.1.0+cu121
- Datasets 2.14.5
- Tokenizers 0.14.1