# lewtun/distilgpt2-finetuned-shakespeare
This model is a fine-tuned version of [distilgpt2](https://huggingface.co/distilgpt2) on an unspecified dataset (presumably Shakespeare text, given the model name). After the final training epoch it achieves:
- Train Loss: 2.9411
- Validation Loss: 3.5767
- Epoch: 29
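A minimal inference sketch for this checkpoint, assuming the `transformers` library is installed; the prompt is purely illustrative:

```python
model_id = "lewtun/distilgpt2-finetuned-shakespeare"

def generate(prompt: str, max_new_tokens: int = 40) -> str:
    """Generate a continuation with the fine-tuned model.

    The import lives inside the function so the sketch can be inspected
    without transformers installed; calling it downloads the weights.
    """
    from transformers import pipeline

    generator = pipeline("text-generation", model=model_id)
    return generator(prompt, max_new_tokens=max_new_tokens)[0]["generated_text"]

# e.g. generate("Shall I compare thee")  # downloads model weights on first call
```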
## Model description
More information needed
## Intended uses & limitations
More information needed
## Training and evaluation data
More information needed
## Training procedure

### Training hyperparameters
The following hyperparameters were used during training:
- optimizer: AdamWeightDecay (learning_rate=2e-05, decay=0.0, beta_1=0.9, beta_2=0.999, epsilon=1e-07, amsgrad=False, weight_decay_rate=0.01)
- training_precision: float32
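The optimizer settings above, collected into a plain dict (values copied verbatim from this card), e.g. for logging or rebuilding the optimizer:

```python
# Hyperparameters as listed in this model card.
optimizer_config = {
    "name": "AdamWeightDecay",
    "learning_rate": 2e-05,
    "decay": 0.0,
    "beta_1": 0.9,
    "beta_2": 0.999,
    "epsilon": 1e-07,
    "amsgrad": False,
    "weight_decay_rate": 0.01,
}
# With transformers' TF utilities this could be rebuilt via
# transformers.create_optimizer(init_lr=2e-05, weight_decay_rate=0.01, ...);
# the step and warmup counts are not documented in the card.
```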
### Training results

| Train Loss | Validation Loss | Epoch |
|:----------:|:---------------:|:-----:|
| 4.2112 | 3.8253 | 0 |
| 3.8997 | 3.6898 | 1 |
| 3.7783 | 3.6304 | 2 |
| 3.7046 | 3.5846 | 3 |
| 3.6477 | 3.5667 | 4 |
| 3.6001 | 3.5445 | 5 |
| 3.5563 | 3.5333 | 6 |
| 3.5198 | 3.5240 | 7 |
| 3.4842 | 3.5146 | 8 |
| 3.4505 | 3.5126 | 9 |
| 3.4184 | 3.5022 | 10 |
| 3.3912 | 3.5027 | 11 |
| 3.3613 | 3.5003 | 12 |
| 3.3337 | 3.4985 | 13 |
| 3.3045 | 3.5004 | 14 |
| 3.2772 | 3.5014 | 15 |
| 3.2527 | 3.5018 | 16 |
| 3.2274 | 3.5053 | 17 |
| 3.2011 | 3.5106 | 18 |
| 3.1754 | 3.5143 | 19 |
| 3.1512 | 3.5181 | 20 |
| 3.1259 | 3.5274 | 21 |
| 3.1003 | 3.5215 | 22 |
| 3.0809 | 3.5354 | 23 |
| 3.0568 | 3.5335 | 24 |
| 3.0306 | 3.5502 | 25 |
| 3.0080 | 3.5574 | 26 |
| 2.9857 | 3.5587 | 27 |
| 2.9654 | 3.5760 | 28 |
| 2.9411 | 3.5767 | 29 |
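Validation loss bottoms out around epoch 13 while training loss keeps falling, which suggests the later epochs overfit. A quick check over the validation column of the table above:

```python
# Validation losses per epoch, copied from the training-results table (epochs 0-29).
val_losses = [
    3.8253, 3.6898, 3.6304, 3.5846, 3.5667, 3.5445, 3.5333, 3.5240,
    3.5146, 3.5126, 3.5022, 3.5027, 3.5003, 3.4985, 3.5004, 3.5014,
    3.5018, 3.5053, 3.5106, 3.5143, 3.5181, 3.5274, 3.5215, 3.5354,
    3.5335, 3.5502, 3.5574, 3.5587, 3.5760, 3.5767,
]

# Epoch with the lowest validation loss.
best_epoch = min(range(len(val_losses)), key=val_losses.__getitem__)
print(best_epoch, val_losses[best_epoch])  # -> 13 3.4985
```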
### Framework versions
- Transformers 4.22.2
- TensorFlow 2.10.0
- Datasets 2.5.2
- Tokenizers 0.11.6