# synpre_mix_v4_1M_t5-small
This model is a fine-tuned version of [t5-small](https://huggingface.co/t5-small) on the tyzhu/synpre_mix_v4_1M dataset. It achieves the following results on the evaluation set:

- Loss: 0.0127
- Bleu: 99.0523
- Gen Len: 95.7202
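The Bleu figure above is a corpus-level n-gram overlap score on a 0–100 scale, so 99.05 indicates near-exact matches between generations and references. As a minimal pure-Python sketch of standard BLEU-4 with a brevity penalty (an illustration of the metric, not the exact scorer used during evaluation):

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Multiset of n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hypotheses, references, max_n=4):
    """BLEU-4 (0-100) over parallel token lists; assumes non-empty hypotheses."""
    precisions = []
    for n in range(1, max_n + 1):
        matched, total = 0, 0
        for hyp, ref in zip(hypotheses, references):
            h, r = ngrams(hyp, n), ngrams(ref, n)
            matched += sum(min(count, r[g]) for g, count in h.items())  # clipped matches
            total += sum(h.values())
        precisions.append(matched / total if total else 0.0)
    if min(precisions) == 0:
        return 0.0
    hyp_len = sum(len(h) for h in hypotheses)
    ref_len = sum(len(r) for r in references)
    # brevity penalty: punish hypotheses shorter than the references
    bp = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return 100 * bp * math.exp(sum(math.log(p) for p in precisions) / max_n)
```

An exact match scores 100; shortened or altered outputs are penalized through the clipped precisions and the brevity penalty.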
## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed
## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:

- learning_rate: 0.0001
- train_batch_size: 128
- eval_batch_size: 128
- seed: 42
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: constant_with_warmup
- lr_scheduler_warmup_steps: 10000
- training_steps: 200000
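With `constant_with_warmup`, the learning rate ramps linearly from 0 to 0.0001 over the first 10000 steps and then stays constant for the remaining steps. A sketch mirroring the behaviour of `transformers`' `get_constant_schedule_with_warmup`:

```python
def constant_with_warmup_lr(step, base_lr=1e-4, warmup_steps=10_000):
    """Learning rate at a given optimizer step: linear warmup, then constant."""
    if step < warmup_steps:
        return base_lr * step / warmup_steps  # linear ramp from 0 to base_lr
    return base_lr
```

For example, halfway through warmup (step 5000) the rate is 5e-05, and from step 10000 to step 200000 it holds at the configured 0.0001.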
### Training results

| Training Loss | Epoch | Step   | Validation Loss | Bleu    | Gen Len  |
|:-------------:|:-----:|:------:|:---------------:|:-------:|:--------:|
| 8.0211        | 0.64  | 5000   | 8.1585          | 0.9022  | 204.2924 |
| 7.6489        | 1.28  | 10000  | 7.7641          | 2.0919  | 121.8379 |
| 7.4637        | 1.92  | 15000  | 7.4472          | 4.4016  | 102.293  |
| 6.5429        | 2.56  | 20000  | 6.2282          | 7.2893  | 93.5449  |
| 0.5911        | 3.2   | 25000  | 0.3044          | 73.7013 | 109.0918 |
| 0.1765        | 3.84  | 30000  | 0.0935          | 87.9065 | 102.312  |
| 0.108         | 4.48  | 35000  | 0.0717          | 91.8023 | 99.7126  |
| 0.0833        | 5.12  | 40000  | 0.0562          | 95.3664 | 97.5387  |
| 0.0668        | 5.76  | 45000  | 0.0512          | 96.8384 | 96.3174  |
| 0.0542        | 6.4   | 50000  | 0.0390          | 96.7816 | 96.9863  |
| 0.0445        | 7.04  | 55000  | 0.0358          | 97.0231 | 96.9106  |
| 0.0424        | 7.68  | 60000  | 0.0305          | 96.7836 | 97.0555  |
| 0.035         | 8.32  | 65000  | 0.0295          | 96.4945 | 97.3141  |
| 0.0334        | 8.96  | 70000  | 0.0260          | 97.4163 | 96.7796  |
| 0.0294        | 9.6   | 75000  | 0.0275          | 96.2287 | 97.5309  |
| 0.0276        | 10.24 | 80000  | 0.0235          | 97.3636 | 96.8039  |
| 0.0255        | 10.88 | 85000  | 0.0220          | 97.5185 | 96.757   |
| 0.0226        | 11.52 | 90000  | 0.0204          | 97.877  | 96.4196  |
| 0.0214        | 12.16 | 95000  | 0.0196          | 97.8447 | 96.4615  |
| 0.0215        | 12.8  | 100000 | 0.0310          | 96.0399 | 97.5621  |
| 0.0197        | 13.44 | 105000 | 0.0178          | 98.4238 | 96.0292  |
| 0.0189        | 14.08 | 110000 | 0.0170          | 98.1404 | 96.2868  |
| 0.0185        | 14.72 | 115000 | 0.0208          | 97.374  | 96.8121  |
| 0.0166        | 15.36 | 120000 | 0.0169          | 98.1159 | 96.3788  |
| 0.0158        | 16.0  | 125000 | 0.0162          | 98.2865 | 96.2111  |
| 0.0145        | 16.64 | 130000 | 0.0140          | 98.8412 | 95.8038  |
| 0.0145        | 17.28 | 135000 | 0.0149          | 98.377  | 96.0822  |
| 0.0138        | 17.92 | 140000 | 0.0167          | 98.1249 | 96.3234  |
| 0.0139        | 18.56 | 145000 | 0.0132          | 98.9569 | 95.7071  |
| 0.0121        | 19.2  | 150000 | 0.0133          | 98.6588 | 95.9162  |
| 0.0124        | 19.84 | 155000 | 0.0169          | 98.2963 | 96.2428  |
| 0.0111        | 20.48 | 160000 | 0.0127          | 99.0523 | 95.7202  |
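The fractional Epoch column follows directly from the step count and batch size, assuming the dataset's `1M` suffix reflects roughly 1,000,000 training examples (an assumption; the exact count is not stated in this card):

```python
def epoch_at(step, batch_size=128, dataset_size=1_000_000):
    """Epoch counter as the fraction of the training set seen after `step` steps."""
    return step * batch_size / dataset_size
```

For example, step 5000 with batch size 128 corresponds to epoch 0.64 and step 80000 to epoch 10.24, matching the table rows above.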
### Framework versions
- Transformers 4.34.0
- Pytorch 2.1.0+cu121
- Datasets 2.14.5
- Tokenizers 0.14.1