# chile-gpt
This model is a fine-tuned version of [DeepESP/gpt2-spanish](https://huggingface.co/DeepESP/gpt2-spanish) on an unknown dataset. It achieves the following results on the evaluation set:
- Loss: 9.4320
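For reference, a causal language model's evaluation loss can be converted to perplexity via `exp(loss)`; a minimal sketch of that arithmetic:

```python
import math

# Perplexity is exp(cross-entropy loss) for a causal language model.
eval_loss = 9.4320  # final evaluation loss reported above
print(f"Perplexity: {math.exp(eval_loss):.0f}")  # ~12480
```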
## Model description
More information needed
## Intended uses & limitations
More information needed
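Since this is a fine-tune of a Spanish GPT-2 checkpoint, it can presumably be used for Spanish text generation. A minimal usage sketch, assuming the model is published under the hypothetical hub id `your-username/chile-gpt`:

```python
from transformers import pipeline

# Hypothetical repo id; replace with the actual hub path of this checkpoint.
generator = pipeline("text-generation", model="your-username/chile-gpt")

output = generator("Había una vez en Chile", max_new_tokens=40, do_sample=True)
print(output[0]["generated_text"])
```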
## Training and evaluation data
More information needed
## Training procedure
### Training hyperparameters
The following hyperparameters were used during training (a reproduction sketch in code follows the list):
- learning_rate: 0.005
- train_batch_size: 32
- eval_batch_size: 32
- seed: 42
- gradient_accumulation_steps: 16
- total_train_batch_size: 512
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 1000
- num_epochs: 50
- mixed_precision_training: Native AMP
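A minimal sketch of how these settings map onto `transformers.TrainingArguments`; only the values listed above come from this card, while the output directory and evaluation strategy are assumptions:

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="chile-gpt",           # assumed output directory
    learning_rate=5e-3,
    per_device_train_batch_size=32,
    per_device_eval_batch_size=32,
    seed=42,
    gradient_accumulation_steps=16,   # 32 * 16 = 512 effective train batch size
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="cosine",
    warmup_steps=1000,
    num_train_epochs=50,
    fp16=True,                        # native AMP mixed-precision training
    evaluation_strategy="epoch",      # assumption: the table reports one eval per epoch
)
```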
### Training results
| Training Loss | Epoch | Step | Validation Loss |
|---|---|---|---|
| 10.6676 | 0.98 | 6 | 9.5748 |
| 9.6237 | 1.98 | 12 | 9.2470 |
| 9.2815 | 2.98 | 18 | 8.8724 |
| 8.8097 | 3.98 | 24 | 8.3629 |
| 8.2296 | 4.98 | 30 | 7.8407 |
| 7.6891 | 5.98 | 36 | 7.4161 |
| 7.3013 | 6.98 | 42 | 7.1598 |
| 7.0671 | 7.98 | 48 | 7.0080 |
| 6.9404 | 8.98 | 54 | 6.9133 |
| 6.7543 | 9.98 | 60 | 6.7723 |
| 6.5845 | 10.98 | 66 | 6.6619 |
| 6.4193 | 11.98 | 72 | 6.5965 |
| 6.2554 | 12.98 | 78 | 6.5185 |
| 6.0993 | 13.98 | 84 | 6.4632 |
| 5.93 | 14.98 | 90 | 6.4155 |
| 5.7684 | 15.98 | 96 | 6.4183 |
| 5.6242 | 16.98 | 102 | 6.3981 |
| 5.4577 | 17.98 | 108 | 6.4609 |
| 5.2898 | 18.98 | 114 | 6.4577 |
| 5.1113 | 19.98 | 120 | 6.5617 |
| 4.9319 | 20.98 | 126 | 6.5827 |
| 4.7464 | 21.98 | 132 | 6.6961 |
| 4.5505 | 22.98 | 138 | 6.8359 |
| 4.341 | 23.98 | 144 | 6.9193 |
| 4.1324 | 24.98 | 150 | 7.0325 |
| 3.8938 | 25.98 | 156 | 7.1993 |
| 3.6691 | 26.98 | 162 | 7.3179 |
| 3.4316 | 27.98 | 168 | 7.4708 |
| 3.2041 | 28.98 | 174 | 7.5654 |
| 2.9614 | 29.98 | 180 | 7.7535 |
| 2.7189 | 30.98 | 186 | 7.8551 |
| 2.4944 | 31.98 | 192 | 8.0094 |
| 2.2624 | 32.98 | 198 | 8.0527 |
| 2.0292 | 33.98 | 204 | 8.1857 |
| 1.809 | 34.98 | 210 | 8.3468 |
| 1.597 | 35.98 | 216 | 8.4307 |
| 1.3849 | 36.98 | 222 | 8.6230 |
| 1.2081 | 37.98 | 228 | 8.6666 |
| 1.0273 | 38.98 | 234 | 8.7926 |
| 0.8661 | 39.98 | 240 | 8.8861 |
| 0.7308 | 40.98 | 246 | 8.9042 |
| 0.6189 | 41.98 | 252 | 8.9202 |
| 0.5335 | 42.98 | 258 | 9.0861 |
| 0.459 | 43.98 | 264 | 9.1198 |
| 0.3958 | 44.98 | 270 | 9.2129 |
| 0.3587 | 45.98 | 276 | 9.2434 |
| 0.3222 | 46.98 | 282 | 9.3005 |
| 0.2948 | 47.98 | 288 | 9.3961 |
| 0.2677 | 48.98 | 294 | 9.4605 |
| 0.2348 | 49.98 | 300 | 9.4320 |
### Framework versions
- Transformers 4.24.0
- Pytorch 1.13.0+rocm5.2
- Datasets 2.6.1
- Tokenizers 0.13.2
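A quick sketch for confirming that a local environment matches these versions:

```python
import transformers, torch, datasets, tokenizers

# Expected per this card: 4.24.0 / 1.13.0+rocm5.2 / 2.6.1 / 0.13.2
for name, module in [("Transformers", transformers), ("Pytorch", torch),
                     ("Datasets", datasets), ("Tokenizers", tokenizers)]:
    print(f"{name} {module.__version__}")
```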