# chile-gpt
This model is a fine-tuned version of [DeepESP/gpt2-spanish](https://huggingface.co/DeepESP/gpt2-spanish) on an unknown dataset. It achieves the following results on the evaluation set:
- Loss: 9.4320
## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 0.005
- train_batch_size: 32
- eval_batch_size: 32
- seed: 42
- gradient_accumulation_steps: 16
- total_train_batch_size: 512
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 1000
- num_epochs: 50
- mixed_precision_training: Native AMP
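For reference, the hyperparameters above can be expressed as a `transformers.TrainingArguments` configuration. This is a reconstructed sketch, not the exact script used to train the model; `output_dir` and the mapping of "Native AMP" to `fp16=True` are assumptions.

```python
from transformers import TrainingArguments

# Sketch of the configuration listed above; output_dir is a placeholder.
training_args = TrainingArguments(
    output_dir="chile-gpt",            # placeholder, not from the card
    learning_rate=5e-3,
    per_device_train_batch_size=32,
    per_device_eval_batch_size=32,
    seed=42,
    gradient_accumulation_steps=16,    # 32 * 16 = 512 total train batch size
    lr_scheduler_type="cosine",
    warmup_steps=1000,
    num_train_epochs=50,
    fp16=True,                         # assumed equivalent of "Native AMP"
)
```

Note that the total train batch size of 512 is not set directly; it follows from `per_device_train_batch_size * gradient_accumulation_steps` on a single device.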
### Training results
| Training Loss | Epoch | Step | Validation Loss |
|:-------------:|:-----:|:----:|:---------------:|
| 10.6676 | 0.98 | 6 | 9.5748 |
| 9.6237 | 1.98 | 12 | 9.2470 |
| 9.2815 | 2.98 | 18 | 8.8724 |
| 8.8097 | 3.98 | 24 | 8.3629 |
| 8.2296 | 4.98 | 30 | 7.8407 |
| 7.6891 | 5.98 | 36 | 7.4161 |
| 7.3013 | 6.98 | 42 | 7.1598 |
| 7.0671 | 7.98 | 48 | 7.0080 |
| 6.9404 | 8.98 | 54 | 6.9133 |
| 6.7543 | 9.98 | 60 | 6.7723 |
| 6.5845 | 10.98 | 66 | 6.6619 |
| 6.4193 | 11.98 | 72 | 6.5965 |
| 6.2554 | 12.98 | 78 | 6.5185 |
| 6.0993 | 13.98 | 84 | 6.4632 |
| 5.93 | 14.98 | 90 | 6.4155 |
| 5.7684 | 15.98 | 96 | 6.4183 |
| 5.6242 | 16.98 | 102 | 6.3981 |
| 5.4577 | 17.98 | 108 | 6.4609 |
| 5.2898 | 18.98 | 114 | 6.4577 |
| 5.1113 | 19.98 | 120 | 6.5617 |
| 4.9319 | 20.98 | 126 | 6.5827 |
| 4.7464 | 21.98 | 132 | 6.6961 |
| 4.5505 | 22.98 | 138 | 6.8359 |
| 4.341 | 23.98 | 144 | 6.9193 |
| 4.1324 | 24.98 | 150 | 7.0325 |
| 3.8938 | 25.98 | 156 | 7.1993 |
| 3.6691 | 26.98 | 162 | 7.3179 |
| 3.4316 | 27.98 | 168 | 7.4708 |
| 3.2041 | 28.98 | 174 | 7.5654 |
| 2.9614 | 29.98 | 180 | 7.7535 |
| 2.7189 | 30.98 | 186 | 7.8551 |
| 2.4944 | 31.98 | 192 | 8.0094 |
| 2.2624 | 32.98 | 198 | 8.0527 |
| 2.0292 | 33.98 | 204 | 8.1857 |
| 1.809 | 34.98 | 210 | 8.3468 |
| 1.597 | 35.98 | 216 | 8.4307 |
| 1.3849 | 36.98 | 222 | 8.6230 |
| 1.2081 | 37.98 | 228 | 8.6666 |
| 1.0273 | 38.98 | 234 | 8.7926 |
| 0.8661 | 39.98 | 240 | 8.8861 |
| 0.7308 | 40.98 | 246 | 8.9042 |
| 0.6189 | 41.98 | 252 | 8.9202 |
| 0.5335 | 42.98 | 258 | 9.0861 |
| 0.459 | 43.98 | 264 | 9.1198 |
| 0.3958 | 44.98 | 270 | 9.2129 |
| 0.3587 | 45.98 | 276 | 9.2434 |
| 0.3222 | 46.98 | 282 | 9.3005 |
| 0.2948 | 47.98 | 288 | 9.3961 |
| 0.2677 | 48.98 | 294 | 9.4605 |
| 0.2348 | 49.98 | 300 | 9.4320 |

Note that the validation loss reaches its minimum (6.3981) around epoch 17 and rises steadily afterwards while the training loss keeps falling, indicating that the model overfits the training data well before the final epoch.
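As a rough sanity check on the table: with 6 optimizer steps per epoch (300 steps over 50 epochs) and a total train batch size of 512, the training set contains on the order of 3,000 examples. This is an estimate derived from the numbers above, not a figure stated in the card:

```python
# Estimated dataset size from the hyperparameters and the step counts in the table.
train_batch_size = 32
gradient_accumulation_steps = 16
effective_batch = train_batch_size * gradient_accumulation_steps  # matches total_train_batch_size: 512
steps_per_epoch = 300 // 50                                       # 300 total steps over 50 epochs
approx_examples = effective_batch * steps_per_epoch
print(effective_batch, steps_per_epoch, approx_examples)          # 512 6 3072
```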
### Framework versions
- Transformers 4.24.0
- Pytorch 1.13.0+rocm5.2
- Datasets 2.6.1
- Tokenizers 0.13.2