# cervantes-gpt
This model is a fine-tuned version of [DeepESP/gpt2-spanish](https://huggingface.co/DeepESP/gpt2-spanish) on an unknown dataset. It achieves the following results on the evaluation set:
- Loss: 8.1302
## Model description
More information needed
## Intended uses & limitations
More information needed
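Pending details from the authors, the checkpoint can be loaded and sampled like any GPT-2 causal language model. A minimal sketch with 🤗 Transformers; the Hub id is hypothetical (`cervantes-gpt` stands in for the actual repository path) and the decoding settings are illustrative:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Hypothetical Hub id -- substitute the actual repository path of this checkpoint.
model_id = "cervantes-gpt"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

prompt = "En un lugar de la Mancha,"
inputs = tokenizer(prompt, return_tensors="pt")

# Sample a short continuation; decoding settings here are illustrative, not tuned.
outputs = model.generate(
    **inputs,
    max_new_tokens=50,
    do_sample=True,
    top_p=0.95,
    temperature=0.8,
    pad_token_id=tokenizer.eos_token_id,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```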
## Training and evaluation data
More information needed
## Training procedure

### Training hyperparameters
The following hyperparameters were used during training (a `TrainingArguments` sketch reproducing them follows the list):
- learning_rate: 0.0005
- train_batch_size: 32
- eval_batch_size: 32
- seed: 42
- gradient_accumulation_steps: 16
- total_train_batch_size: 512
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 500
- num_epochs: 70
- mixed_precision_training: Native AMP
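For replication, a minimal sketch of how the list above maps onto `TrainingArguments` (Transformers 4.24). The `output_dir` and the per-epoch evaluation cadence are assumptions not stated in the card; everything else is taken directly from the list:

```python
from transformers import TrainingArguments

# Mirrors the hyperparameter list above. output_dir and the per-epoch
# evaluation cadence are assumptions; the rest is taken from the card.
training_args = TrainingArguments(
    output_dir="cervantes-gpt",        # assumed
    learning_rate=5e-4,
    per_device_train_batch_size=32,
    per_device_eval_batch_size=32,
    seed=42,
    gradient_accumulation_steps=16,    # 32 * 16 = effective batch size of 512
    lr_scheduler_type="cosine",
    warmup_steps=500,
    num_train_epochs=70,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    fp16=True,                         # "Native AMP" mixed precision
    evaluation_strategy="epoch",       # assumed from the per-epoch rows below
)
```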
### Training results
| Training Loss | Epoch | Step | Validation Loss |
|:-------------:|:-----:|:----:|:---------------:|
| 10.6864 | 0.96 | 13 | 9.4380 |
| 9.6293 | 1.96 | 26 | 9.0791 |
| 9.2039 | 2.96 | 39 | 8.5999 |
| 8.5709 | 3.96 | 52 | 7.9434 |
| 7.8331 | 4.96 | 65 | 7.2929 |
| 7.1731 | 5.96 | 78 | 6.7935 |
| 6.681 | 6.96 | 91 | 6.4989 |
| 6.359 | 7.96 | 104 | 6.3480 |
| 6.1194 | 8.96 | 117 | 6.1738 |
| 5.8887 | 9.96 | 130 | 6.0409 |
| 5.6722 | 10.96 | 143 | 5.9433 |
| 5.4738 | 11.96 | 156 | 5.8746 |
| 5.2853 | 12.96 | 169 | 5.7898 |
| 5.1082 | 13.96 | 182 | 5.7821 |
| 4.9458 | 14.96 | 195 | 5.7489 |
| 4.7782 | 15.96 | 208 | 5.7815 |
| 4.613 | 16.96 | 221 | 5.7930 |
| 4.4529 | 17.96 | 234 | 5.8027 |
| 4.2796 | 18.96 | 247 | 5.8341 |
| 4.0998 | 19.96 | 260 | 5.8972 |
| 3.9184 | 20.96 | 273 | 6.0337 |
| 3.7264 | 21.96 | 286 | 6.0392 |
| 3.5419 | 22.96 | 299 | 6.1160 |
| 3.3477 | 23.96 | 312 | 6.2168 |
| 3.1492 | 24.96 | 325 | 6.2471 |
| 2.9641 | 25.96 | 338 | 6.3488 |
| 2.7695 | 26.96 | 351 | 6.4372 |
| 2.5882 | 27.96 | 364 | 6.4921 |
| 2.4007 | 28.96 | 377 | 6.6257 |
| 2.2178 | 29.96 | 390 | 6.6335 |
| 2.0489 | 30.96 | 403 | 6.7425 |
| 1.8779 | 31.96 | 416 | 6.7861 |
| 1.7209 | 32.96 | 429 | 6.8796 |
| 1.5707 | 33.96 | 442 | 6.9420 |
| 1.3984 | 34.96 | 455 | 6.9857 |
| 1.2653 | 35.96 | 468 | 7.0169 |
| 1.1368 | 36.96 | 481 | 7.0835 |
| 1.01 | 37.96 | 494 | 7.1329 |
| 0.8959 | 38.96 | 507 | 7.2498 |
| 0.792 | 39.96 | 520 | 7.2971 |
| 0.6844 | 40.96 | 533 | 7.2841 |
| 0.6028 | 41.96 | 546 | 7.3295 |
| 0.5216 | 42.96 | 559 | 7.3776 |
| 0.467 | 43.96 | 572 | 7.4190 |
| 0.417 | 44.96 | 585 | 7.5201 |
| 0.3785 | 45.96 | 598 | 7.5042 |
| 0.3456 | 46.96 | 611 | 7.5822 |
| 0.3164 | 47.96 | 624 | 7.6342 |
| 0.2882 | 48.96 | 637 | 7.6722 |
| 0.2674 | 49.96 | 650 | 7.6951 |
| 0.2471 | 50.96 | 663 | 7.7717 |
| 0.2287 | 51.96 | 676 | 7.8266 |
| 0.2116 | 52.96 | 689 | 7.8124 |
| 0.195 | 53.96 | 702 | 7.8595 |
| 0.1784 | 54.96 | 715 | 7.8968 |
| 0.1633 | 55.96 | 728 | 7.9242 |
| 0.1491 | 56.96 | 741 | 7.9956 |
| 0.1412 | 57.96 | 754 | 8.0052 |
| 0.1338 | 58.96 | 767 | 8.0319 |
| 0.1284 | 59.96 | 780 | 8.0596 |
| 0.1229 | 60.96 | 793 | 8.0776 |
| 0.1193 | 61.96 | 806 | 8.0791 |
| 0.117 | 62.96 | 819 | 8.0912 |
| 0.1142 | 63.96 | 832 | 8.1174 |
| 0.1129 | 64.96 | 845 | 8.1141 |
| 0.1114 | 65.96 | 858 | 8.1197 |
| 0.11 | 66.96 | 871 | 8.1275 |
| 0.1102 | 67.96 | 884 | 8.1302 |
| 0.1093 | 68.96 | 897 | 8.1303 |
| 0.1046 | 69.96 | 910 | 8.1302 |

Note that validation loss bottoms out at 5.7489 around epoch 15 and climbs steadily afterwards while training loss falls toward 0.1, a pattern consistent with overfitting; the final evaluation loss of 8.1302 reported above corresponds to the last epoch rather than the best checkpoint.
### Framework versions
- Transformers 4.24.0
- Pytorch 1.13.0+rocm5.2
- Datasets 2.6.1
- Tokenizers 0.13.2
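To recreate this environment, the pins above translate into an install along these lines; the ROCm wheel index URL is an assumption based on PyTorch's usual wheel hosting and may need adjusting for your platform:

```bash
pip install transformers==4.24.0 datasets==2.6.1 tokenizers==0.13.2
# ROCm 5.2 build of PyTorch 1.13.0; index URL assumed, adjust per platform
pip install torch==1.13.0 --extra-index-url https://download.pytorch.org/whl/rocm5.2
```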