gpt-finetuning-cervantes
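This model is a fine-tuned version of DeepESP/gpt2-spanish on an unknown dataset. The result below comes from the final checkpoint (epoch 69.96, step 910); the lowest validation loss in the training results, 4.1109, was recorded earlier, at step 130. On the evaluation set the final checkpoint achieves: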
- Loss: 6.8331
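The loss can be turned into a perplexity for easier comparison with other language models. The sketch below assumes the reported value is the mean token-level cross-entropy in nats, which is the usual convention for causal language models but is not stated in this card.

```python
import math

# Final evaluation loss reported above.
# Assumption: mean token-level cross-entropy, measured in nats.
eval_loss = 6.8331

# Perplexity is the exponential of the mean cross-entropy.
perplexity = math.exp(eval_loss)
print(f"perplexity ~ {perplexity:.0f}")
```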
Model description
More information needed
Intended uses & limitations
More information needed
Training and evaluation data
More information needed
Training procedure
Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 0.0005
- train_batch_size: 32
- eval_batch_size: 32
- seed: 42
- gradient_accumulation_steps: 16
- total_train_batch_size: 512
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 1000
- num_epochs: 70
- mixed_precision_training: Native AMP
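The arithmetic behind these settings can be checked with a short sketch. It assumes a single training device (not stated in the card) and that the cosine scheduler has the usual shape of linear warmup followed by half-cosine decay; the function `lr_at` is a plain-Python illustration of that shape, not a call into any library. The total step count of 910 is taken from the last row of the training results.

```python
import math

learning_rate = 0.0005
train_batch_size = 32
gradient_accumulation_steps = 16
warmup_steps = 1000
total_steps = 910  # last step listed in the training results
num_devices = 1    # assumption: the card does not state the device count

# Samples contributing to each optimizer update.
effective_batch_size = train_batch_size * gradient_accumulation_steps * num_devices


def lr_at(step: int) -> float:
    """Learning rate at a given optimizer step (assumed schedule shape)."""
    if step < warmup_steps:
        # Linear ramp from 0 up to the configured learning rate.
        return learning_rate * step / max(1, warmup_steps)
    # Half-cosine decay from the configured learning rate down to 0.
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return learning_rate * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


print(effective_batch_size)
print(lr_at(total_steps))
```

Under these assumptions the product matches the listed total_train_batch_size of 512. Because the warmup length (1000 steps) exceeds the total number of steps (910), the run would end while the learning rate is still ramping up, so the cosine decay phase would never be reached.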
Training results
Training Loss | Epoch | Step | Validation Loss |
---|---|---|---|
5.0291 | 0.96 | 13 | 4.6705 |
4.7952 | 1.96 | 26 | 4.4547 |
4.5759 | 2.96 | 39 | 4.3201 |
4.4032 | 3.96 | 52 | 4.2451 |
4.269 | 4.96 | 65 | 4.1911 |
4.143 | 5.96 | 78 | 4.1577 |
4.0229 | 6.96 | 91 | 4.1306 |
3.9047 | 7.96 | 104 | 4.1165 |
3.7886 | 8.96 | 117 | 4.1114 |
3.6666 | 9.96 | 130 | 4.1109 |
3.539 | 10.96 | 143 | 4.1201 |
3.4117 | 11.96 | 156 | 4.1374 |
3.272 | 12.96 | 169 | 4.1538 |
3.1283 | 13.96 | 182 | 4.1876 |
2.9728 | 14.96 | 195 | 4.2226 |
2.816 | 15.96 | 208 | 4.2695 |
2.6475 | 16.96 | 221 | 4.3106 |
2.4765 | 17.96 | 234 | 4.3678 |
2.302 | 18.96 | 247 | 4.4249 |
2.1257 | 19.96 | 260 | 4.4908 |
1.9537 | 20.96 | 273 | 4.5664 |
1.7834 | 21.96 | 286 | 4.6324 |
1.6177 | 22.96 | 299 | 4.6944 |
1.4573 | 23.96 | 312 | 4.7880 |
1.3057 | 24.96 | 325 | 4.8843 |
1.1652 | 25.96 | 338 | 4.9760 |
1.0341 | 26.96 | 351 | 5.0612 |
0.9101 | 27.96 | 364 | 5.1714 |
0.8017 | 28.96 | 377 | 5.2702 |
0.706 | 29.96 | 390 | 5.3530 |
0.6194 | 30.96 | 403 | 5.4535 |
0.5436 | 31.96 | 416 | 5.5373 |
0.4816 | 32.96 | 429 | 5.6153 |
0.4309 | 33.96 | 442 | 5.7014 |
0.3899 | 34.96 | 455 | 5.7749 |
0.3544 | 35.96 | 468 | 5.8430 |
0.3236 | 36.96 | 481 | 5.9237 |
0.3005 | 37.96 | 494 | 5.9824 |
0.2804 | 38.96 | 507 | 6.0264 |
0.263 | 39.96 | 520 | 6.0797 |
0.2513 | 40.96 | 533 | 6.1285 |
0.2376 | 41.96 | 546 | 6.1900 |
0.2264 | 42.96 | 559 | 6.2212 |
0.2183 | 43.96 | 572 | 6.2812 |
0.2104 | 44.96 | 585 | 6.3079 |
0.203 | 45.96 | 598 | 6.3501 |
0.1964 | 46.96 | 611 | 6.3730 |
0.1912 | 47.96 | 624 | 6.4190 |
0.1854 | 48.96 | 637 | 6.4598 |
0.1817 | 49.96 | 650 | 6.4618 |
0.1792 | 50.96 | 663 | 6.4914 |
0.1748 | 51.96 | 676 | 6.5385 |
0.1732 | 52.96 | 689 | 6.5689 |
0.1689 | 53.96 | 702 | 6.5761 |
0.1672 | 54.96 | 715 | 6.5775 |
0.1657 | 55.96 | 728 | 6.6362 |
0.1625 | 56.96 | 741 | 6.6573 |
0.1611 | 57.96 | 754 | 6.7019 |
0.1588 | 58.96 | 767 | 6.6602 |
0.1573 | 59.96 | 780 | 6.7015 |
0.1547 | 60.96 | 793 | 6.7323 |
0.1542 | 61.96 | 806 | 6.7368 |
0.1538 | 62.96 | 819 | 6.7704 |
0.1513 | 63.96 | 832 | 6.7963 |
0.1504 | 64.96 | 845 | 6.7988 |
0.1506 | 65.96 | 858 | 6.8386 |
0.1497 | 66.96 | 871 | 6.8039 |
0.15 | 67.96 | 884 | 6.8126 |
0.1497 | 68.96 | 897 | 6.8858 |
0.143 | 69.96 | 910 | 6.8331 |
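The table shows the training loss falling throughout while the validation loss reaches a minimum and then rises, a pattern consistent with overfitting. A minimal sketch for locating the row with the lowest validation loss follows; only a subset of rows is copied from the table, and the names `rows`, `best`, and `final` are illustrative.

```python
# (training_loss, epoch, step, validation_loss)
# A subset of rows copied from the training results table.
rows = [
    (5.0291, 0.96, 13, 4.6705),
    (3.9047, 7.96, 104, 4.1165),
    (3.7886, 8.96, 117, 4.1114),
    (3.6666, 9.96, 130, 4.1109),
    (3.5390, 10.96, 143, 4.1201),
    (3.4117, 11.96, 156, 4.1374),
    (0.1430, 69.96, 910, 6.8331),
]

# Row with the lowest validation loss, and the last row of the run.
best = min(rows, key=lambda r: r[3])
final = rows[-1]

print(f"best step: {best[2]} (validation loss {best[3]})")
print(f"final step: {final[2]} (validation loss {final[3]})")
```

If intermediate checkpoints were saved, the one closest to the best step would likely generalize better than the final one. The card does not say whether such checkpoints exist.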
Framework versions
- Transformers 4.24.0
- Pytorch 1.13.0+rocm5.2
- Datasets 2.6.1
- Tokenizers 0.13.2