# synpre_mix_v3_1M_t5-small
This model is a fine-tuned version of [t5-small](https://huggingface.co/t5-small) on the [tyzhu/synpre_mix_v3_1M](https://huggingface.co/datasets/tyzhu/synpre_mix_v3_1M) dataset. It achieves the following results on the evaluation set:
- Loss: 0.1201
- Bleu: 93.5646
- Gen Len: 87.5554
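The card does not yet include a usage example. A minimal inference sketch with the 🤗 Transformers API might look like the following; note that the repo id `tyzhu/synpre_mix_v3_1M_t5-small` and the input string are assumptions, since the card only states the model name:

```python
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

# Hypothetical repo id: the card gives only the name "synpre_mix_v3_1M_t5-small";
# the "tyzhu/" namespace is inferred from the dataset and may differ.
MODEL_ID = "tyzhu/synpre_mix_v3_1M_t5-small"


def generate(text: str, max_new_tokens: int = 128) -> str:
    """Greedy generation with the fine-tuned T5 checkpoint."""
    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForSeq2SeqLM.from_pretrained(MODEL_ID)
    inputs = tokenizer(text, return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)


if __name__ == "__main__":
    # Placeholder input; the card does not describe the task format.
    print(generate("example input"))
```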
## Model description
More information needed
## Intended uses & limitations
More information needed
## Training and evaluation data
More information needed
## Training procedure

### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 0.0001
- train_batch_size: 256
- eval_batch_size: 256
- seed: 42
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: inverse_sqrt
- lr_scheduler_warmup_steps: 10000
- training_steps: 200000
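As a sanity check on the schedule above: an `inverse_sqrt` schedule ramps linearly to the peak rate over the warmup steps and then decays proportionally to 1/√step. A small sketch, assuming the timescale equals the warmup length (the default in Transformers' `get_inverse_sqrt_schedule`):

```python
import math

LEARNING_RATE = 1e-4   # peak learning rate from the card
WARMUP_STEPS = 10_000  # lr_scheduler_warmup_steps


def inverse_sqrt_lr(step: int) -> float:
    """Linear warmup to the peak rate, then 1/sqrt(step) decay.

    Mirrors the inverse_sqrt schedule when the decay timescale
    equals the number of warmup steps (the default).
    """
    if step < WARMUP_STEPS:
        return LEARNING_RATE * step / WARMUP_STEPS
    return LEARNING_RATE * math.sqrt(WARMUP_STEPS / step)


# The rate peaks at the end of warmup ...
print(inverse_sqrt_lr(10_000))  # 0.0001
# ... and has halved by 40k steps, since sqrt(10000/40000) = 0.5.
print(inverse_sqrt_lr(40_000))  # 5e-05
```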
### Training results
| Training Loss | Epoch | Step   | Validation Loss | Bleu    | Gen Len  |
|:-------------:|:-----:|:------:|:---------------:|:-------:|:--------:|
| 8.1288        | 1.28  | 5000   | 8.2984          | 1.092   | 154.6861 |
| 7.7066        | 2.56  | 10000  | 7.8456          | 2.2572  | 120.3898 |
| 7.3474        | 3.84  | 15000  | 7.3513          | 4.6007  | 86.3161  |
| 6.5043        | 5.12  | 20000  | 6.2626          | 7.884   | 84.2931  |
| 2.0667        | 6.4   | 25000  | 2.5516          | 51.5595 | 104.0468 |
| 1.0434        | 7.68  | 30000  | 1.2005          | 79.5292 | 90.2053  |
| 0.7833        | 8.96  | 35000  | 0.8932          | 86.2323 | 85.2567  |
| 0.5221        | 10.24 | 40000  | 0.5357          | 83.7473 | 89.9205  |
| 0.4006        | 11.52 | 45000  | 0.4025          | 86.6179 | 88.674   |
| 0.3183        | 12.8  | 50000  | 0.3176          | 88.9508 | 87.5098  |
| 0.2735        | 14.08 | 55000  | 0.2720          | 88.7724 | 88.3669  |
| 0.2452        | 15.36 | 60000  | 0.2520          | 89.0626 | 88.4114  |
| 0.2142        | 16.64 | 65000  | 0.2355          | 91.2709 | 86.769   |
| 0.1888        | 17.92 | 70000  | 0.2139          | 91.2543 | 87.46    |
| 0.1757        | 19.2  | 75000  | 0.2058          | 91.7017 | 87.3256  |
| 0.1616        | 20.48 | 80000  | 0.2004          | 91.6796 | 87.2561  |
| 0.1562        | 21.76 | 85000  | 0.1837          | 92.2346 | 87.3002  |
| 0.1407        | 23.04 | 90000  | 0.1733          | 92.1041 | 87.9509  |
| 0.1356        | 24.32 | 95000  | 0.1715          | 93.4019 | 86.5713  |
| 0.1295        | 25.6  | 100000 | 0.1570          | 93.7442 | 86.7566  |
| 0.127         | 26.87 | 105000 | 0.1649          | 93.0466 | 87.1686  |
| 0.117         | 28.15 | 110000 | 0.1528          | 92.9589 | 87.5743  |
| 0.1152        | 29.43 | 115000 | 0.1499          | 93.7713 | 86.9094  |
| 0.1116        | 30.71 | 120000 | 0.1514          | 92.8724 | 87.6156  |
| 0.1067        | 31.99 | 125000 | 0.1432          | 92.7475 | 87.8559  |
| 0.1041        | 33.27 | 130000 | 0.1409          | 93.6111 | 87.2048  |
| 0.1001        | 34.55 | 135000 | 0.1430          | 92.8654 | 87.6548  |
| 0.0965        | 35.83 | 140000 | 0.1329          | 94.0062 | 87.1653  |
| 0.0949        | 37.11 | 145000 | 0.1313          | 94.3514 | 86.6624  |
| 0.0941        | 38.39 | 150000 | 0.1305          | 93.802  | 87.2185  |
| 0.0902        | 39.67 | 155000 | 0.1249          | 94.2611 | 86.9682  |
| 0.0906        | 40.95 | 160000 | 0.1251          | 94.1046 | 87.0009  |
| 0.0869        | 42.23 | 165000 | 0.1230          | 93.9196 | 87.1969  |
| 0.0849        | 43.51 | 170000 | 0.1279          | 94.1902 | 86.9505  |
| 0.0843        | 44.79 | 175000 | 0.1218          | 93.7524 | 87.3351  |
| 0.0769        | 46.07 | 180000 | 0.1191          | 93.8624 | 87.3325  |
| 0.078         | 47.35 | 185000 | 0.1139          | 94.7611 | 86.7778  |
| 0.0774        | 48.63 | 190000 | 0.1237          | 93.1841 | 87.7449  |
| 0.0786        | 49.91 | 195000 | 0.1135          | 94.3655 | 87.0559  |
| 0.0736        | 51.19 | 200000 | 0.1201          | 93.5646 | 87.5554  |
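The epoch column is consistent with the stated batch size: assuming the "1M" in the dataset name means roughly 1,000,000 training examples, one epoch is 1,000,000 / 256 ≈ 3,906 optimizer steps, so 200,000 training steps correspond to about 51.2 epochs, matching the final row. A quick check:

```python
# Assumption: "1M" in tyzhu/synpre_mix_v3_1M denotes ~1,000,000 training examples.
DATASET_SIZE = 1_000_000
BATCH_SIZE = 256        # train_batch_size from the card
TOTAL_STEPS = 200_000   # training_steps from the card

steps_per_epoch = DATASET_SIZE / BATCH_SIZE  # 3906.25 steps per epoch
epochs = TOTAL_STEPS / steps_per_epoch
print(round(epochs, 2))  # 51.2
```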
### Framework versions
- Transformers 4.34.0
- Pytorch 2.1.0+cu121
- Datasets 2.14.5
- Tokenizers 0.14.1