# borges-gpt-collab
This model is a fine-tuned version of [DeepESP/gpt2-spanish](https://huggingface.co/DeepESP/gpt2-spanish) on an unknown dataset. It achieves the following results on the evaluation set:
- Loss: 8.3468
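A minimal sketch of loading the checkpoint for generation with the `transformers` library. The repository id below is an assumption (the card does not state where the model is hosted) and should be replaced with the actual Hub path:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed repository id; replace with the real Hub path of this checkpoint.
model_id = "borges-gpt-collab"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Generate a short continuation from a Spanish prompt.
inputs = tokenizer("El jardín de senderos que se bifurcan", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=50, do_sample=True, top_p=0.95)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```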
## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed
## Training procedure

### Training hyperparameters

The following hyperparameters were used during training (a sketch of how they map onto `TrainingArguments` follows the list):
- learning_rate: 0.0005
- train_batch_size: 32
- eval_batch_size: 32
- seed: 42
- gradient_accumulation_steps: 16
- total_train_batch_size: 512
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 500
- num_epochs: 70
- mixed_precision_training: Native AMP
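A hedged reconstruction of how the values above map onto `transformers.TrainingArguments`. This is inferred from the list, not the authors' actual training script; the output directory is a placeholder, and the total batch size of 512 assumes a single device:

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="output",             # placeholder path
    learning_rate=5e-4,
    per_device_train_batch_size=32,
    per_device_eval_batch_size=32,
    seed=42,
    gradient_accumulation_steps=16,  # effective batch size: 32 * 16 = 512
    lr_scheduler_type="cosine",
    warmup_steps=500,
    num_train_epochs=70,
    fp16=True,                       # "Native AMP" mixed precision
)
# Adam with betas=(0.9, 0.999) and epsilon=1e-08 is the optimizer default.
```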
### Training results
| Training Loss | Epoch | Step | Validation Loss |
|:-------------:|:-----:|:----:|:---------------:|
| 11.2135 | 0.96 | 7 | 10.2022 |
| 10.3195 | 1.96 | 14 | 9.6343 |
| 9.9127 | 2.96 | 21 | 9.4637 |
| 9.7295 | 3.96 | 28 | 9.2993 |
| 9.527 | 4.96 | 35 | 9.0962 |
| 9.2648 | 5.96 | 42 | 8.8294 |
| 8.9309 | 6.96 | 49 | 8.5103 |
| 8.5639 | 7.96 | 56 | 8.1858 |
| 8.2034 | 8.96 | 63 | 7.8816 |
| 7.8665 | 9.96 | 70 | 7.6303 |
| 7.5715 | 10.96 | 77 | 7.4307 |
| 7.3259 | 11.96 | 84 | 7.2632 |
| 7.136 | 12.96 | 91 | 7.1494 |
| 6.9558 | 13.96 | 98 | 7.0957 |
| 6.8068 | 14.96 | 105 | 7.0199 |
| 6.6656 | 15.96 | 112 | 6.9554 |
| 6.5264 | 16.96 | 119 | 6.9324 |
| 6.3843 | 17.96 | 126 | 6.8940 |
| 6.2204 | 18.96 | 133 | 6.8799 |
| 6.0915 | 19.96 | 140 | 6.8788 |
| 5.9532 | 20.96 | 147 | 6.8719 |
| 5.8169 | 21.96 | 154 | 6.8647 |
| 5.6531 | 22.96 | 161 | 6.8865 |
| 5.5125 | 23.96 | 168 | 6.8940 |
| 5.3666 | 24.96 | 175 | 6.9248 |
| 5.2377 | 25.96 | 182 | 6.9421 |
| 5.1115 | 26.96 | 189 | 6.9631 |
| 4.9639 | 27.96 | 196 | 7.0135 |
| 4.824 | 28.96 | 203 | 7.0352 |
| 4.6886 | 29.96 | 210 | 7.0729 |
| 4.5538 | 30.96 | 217 | 7.1385 |
| 4.4126 | 31.96 | 224 | 7.1561 |
| 4.2486 | 32.96 | 231 | 7.1792 |
| 4.0955 | 33.96 | 238 | 7.2767 |
| 3.9333 | 34.96 | 245 | 7.2815 |
| 3.7914 | 35.96 | 252 | 7.3463 |
| 3.618 | 36.96 | 259 | 7.3864 |
| 3.4453 | 37.96 | 266 | 7.4394 |
| 3.2795 | 38.96 | 273 | 7.4730 |
| 3.0994 | 39.96 | 280 | 7.4880 |
| 2.9143 | 40.96 | 287 | 7.5567 |
| 2.741 | 41.96 | 294 | 7.5451 |
| 2.5698 | 42.96 | 301 | 7.5966 |
| 2.3855 | 43.96 | 308 | 7.6898 |
| 2.2059 | 44.96 | 315 | 7.6957 |
| 2.0634 | 45.96 | 322 | 7.7503 |
| 1.8719 | 46.96 | 329 | 7.8369 |
| 1.7059 | 47.96 | 336 | 7.8411 |
| 1.54 | 48.96 | 343 | 7.8316 |
| 1.3768 | 49.96 | 350 | 7.8630 |
| 1.2177 | 50.96 | 357 | 7.9360 |
| 1.0663 | 51.96 | 364 | 7.9886 |
| 0.9569 | 52.96 | 371 | 8.0187 |
| 0.8281 | 53.96 | 378 | 8.0274 |
| 0.7074 | 54.96 | 385 | 8.1010 |
| 0.6095 | 55.96 | 392 | 8.1594 |
| 0.5262 | 56.96 | 399 | 8.1010 |
| 0.4678 | 57.96 | 406 | 8.1440 |
| 0.4105 | 58.96 | 413 | 8.1638 |
| 0.3766 | 59.96 | 420 | 8.1534 |
| 0.3425 | 60.96 | 427 | 8.1980 |
| 0.321 | 61.96 | 434 | 8.2184 |
| 0.3061 | 62.96 | 441 | 8.2499 |
| 0.2852 | 63.96 | 448 | 8.1690 |
| 0.2698 | 64.96 | 455 | 8.2160 |
| 0.2628 | 65.96 | 462 | 8.2616 |
| 0.2619 | 66.96 | 469 | 8.2948 |
| 0.2544 | 67.96 | 476 | 8.3553 |
| 0.2414 | 68.96 | 483 | 8.3712 |
| 0.2177 | 69.96 | 490 | 8.3468 |
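Validation loss bottoms out at 6.8647 around epoch 22 (step 154) and rises steadily afterward while training loss keeps falling, a typical overfitting pattern; the reported final loss of 8.3468 is the last-epoch value, not the best. A hedged sketch of one way to retain the best checkpoint with the `Trainer` API, purely illustrative and not the authors' setup:

```python
from transformers import EarlyStoppingCallback, Trainer, TrainingArguments

# Illustrative only: keep the checkpoint with the lowest validation loss
# and stop training once it stops improving.
training_args = TrainingArguments(
    output_dir="output",               # placeholder path
    evaluation_strategy="epoch",
    save_strategy="epoch",
    load_best_model_at_end=True,
    metric_for_best_model="eval_loss",
    greater_is_better=False,
)

# trainer = Trainer(..., args=training_args,
#                   callbacks=[EarlyStoppingCallback(early_stopping_patience=3)])
```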
### Framework versions
- Transformers 4.24.0
- Pytorch 1.13.0+rocm5.2
- Datasets 2.6.1
- Tokenizers 0.13.2