# lewtun/distilgpt2-finetuned-shakespeare
This model is a fine-tuned version of [distilgpt2](https://huggingface.co/distilgpt2) on an unrecorded dataset, presumably a Shakespeare corpus given the model name. It achieves the following results after the final training epoch:

- Train Loss: 2.9411
- Validation Loss: 3.5767
- Epoch: 29
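Assuming these figures are mean per-token cross-entropy in nats (the usual convention for Keras-trained causal language models; the card does not say), they correspond to a perplexity of exp(loss):

```python
import math

# Perplexity = exp(cross-entropy); assumes the losses are per-token nats.
print(f"train perplexity ≈ {math.exp(2.9411):.1f}")       # ≈ 18.9
print(f"validation perplexity ≈ {math.exp(3.5767):.1f}")  # ≈ 35.8
```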
## Model description

DistilGPT2 is a distilled version of GPT-2 trained for causal (next-token) language modelling; this checkpoint fine-tunes it, presumably on Shakespeare's works given the model name. Further details are not recorded.
## Intended uses & limitations

The model is intended for text generation in the style of its fine-tuning corpus. It inherits the limitations and biases of GPT-2/DistilGPT2; further details are not recorded. A minimal usage sketch follows.
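The sketch below assumes the checkpoint is published on the Hugging Face Hub under the name in this card's title, with TensorFlow weights (per the framework versions listed below); the prompt is purely illustrative:

```python
from transformers import AutoTokenizer, TFAutoModelForCausalLM

# Load the fine-tuned checkpoint; repo id taken from this card's title.
tokenizer = AutoTokenizer.from_pretrained("lewtun/distilgpt2-finetuned-shakespeare")
model = TFAutoModelForCausalLM.from_pretrained("lewtun/distilgpt2-finetuned-shakespeare")

# Sample a continuation of a Shakespeare-style prompt.
inputs = tokenizer("O Romeo, Romeo,", return_tensors="tf")
outputs = model.generate(**inputs, max_new_tokens=40, do_sample=True, top_k=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```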
## Training and evaluation data
More information needed
## Training procedure
### Training hyperparameters
The following hyperparameters were used during training:
- optimizer: AdamWeightDecay (learning_rate: 2e-05, weight_decay_rate: 0.01, beta_1: 0.9, beta_2: 0.999, epsilon: 1e-07, amsgrad: False, decay: 0.0)
- training_precision: float32
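The optimizer name matches `AdamWeightDecay`, the TF/Keras optimizer that ships with Transformers. A sketch of reconstructing it from the hyperparameters above (the actual training script is not recorded, so the commented fine-tuning calls are hypothetical):

```python
from transformers import AdamWeightDecay

# Rebuild the optimizer from the hyperparameters listed above.
optimizer = AdamWeightDecay(
    learning_rate=2e-05,
    weight_decay_rate=0.01,
    beta_1=0.9,
    beta_2=0.999,
    epsilon=1e-07,
    amsgrad=False,
)

# A fine-tuning run would then compile and fit as usual; `train_set` and
# `val_set` are hypothetical tf.data.Dataset objects:
# model.compile(optimizer=optimizer)
# model.fit(train_set, validation_data=val_set, epochs=30)
```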
### Training results

Validation loss bottoms out at 3.4985 at epoch 13 and climbs thereafter while training loss keeps falling, indicating overfitting; the results reported above are from the final epoch, not the best one.
| Train Loss | Validation Loss | Epoch |
|---|---|---|
| 4.2112 | 3.8253 | 0 |
| 3.8997 | 3.6898 | 1 |
| 3.7783 | 3.6304 | 2 |
| 3.7046 | 3.5846 | 3 |
| 3.6477 | 3.5667 | 4 |
| 3.6001 | 3.5445 | 5 |
| 3.5563 | 3.5333 | 6 |
| 3.5198 | 3.5240 | 7 |
| 3.4842 | 3.5146 | 8 |
| 3.4505 | 3.5126 | 9 |
| 3.4184 | 3.5022 | 10 |
| 3.3912 | 3.5027 | 11 |
| 3.3613 | 3.5003 | 12 |
| 3.3337 | 3.4985 | 13 |
| 3.3045 | 3.5004 | 14 |
| 3.2772 | 3.5014 | 15 |
| 3.2527 | 3.5018 | 16 |
| 3.2274 | 3.5053 | 17 |
| 3.2011 | 3.5106 | 18 |
| 3.1754 | 3.5143 | 19 |
| 3.1512 | 3.5181 | 20 |
| 3.1259 | 3.5274 | 21 |
| 3.1003 | 3.5215 | 22 |
| 3.0809 | 3.5354 | 23 |
| 3.0568 | 3.5335 | 24 |
| 3.0306 | 3.5502 | 25 |
| 3.0080 | 3.5574 | 26 |
| 2.9857 | 3.5587 | 27 |
| 2.9654 | 3.5760 | 28 |
| 2.9411 | 3.5767 | 29 |
### Framework versions
- Transformers 4.22.2
- TensorFlow 2.10.0
- Datasets 2.5.2
- Tokenizers 0.11.6