# my_awesome_eli5_clm-model-text
This model is a fine-tuned version of [distilgpt2](https://huggingface.co/distilgpt2) on an unspecified dataset. It achieves the following results on the evaluation set:
- Loss: 3.7314
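
As a quick sanity check, the checkpoint can be loaded for generation with the `transformers` pipeline API. A minimal sketch, assuming the model was saved or pushed under the (hypothetical) ID `my_awesome_eli5_clm-model-text`:

```python
from transformers import pipeline

# Hypothetical local path / Hub ID; point this at wherever the checkpoint lives.
generator = pipeline("text-generation", model="my_awesome_eli5_clm-model-text")

prompt = "Somatic hypermutation allows the immune system to"
result = generator(prompt, max_new_tokens=50, do_sample=True)
print(result[0]["generated_text"])
```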
## Model description
More information needed
## Intended uses & limitations
More information needed
## Training and evaluation data
More information needed
## Training procedure

### Training hyperparameters
The following hyperparameters were used during training (a hedged `Trainer` sketch reproducing them follows the list):
- learning_rate: 2e-05
- train_batch_size: 8
- eval_batch_size: 8
- seed: 42
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- num_epochs: 12
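
A minimal sketch wiring these hyperparameters into `TrainingArguments`/`Trainer` (the Adam betas and epsilon above are the library defaults, so they need no explicit setting). The dataset variables are assumptions, since the card does not specify the training data:

```python
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

tokenizer = AutoTokenizer.from_pretrained("distilgpt2")
model = AutoModelForCausalLM.from_pretrained("distilgpt2")

training_args = TrainingArguments(
    output_dir="my_awesome_eli5_clm-model-text",
    learning_rate=2e-5,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    seed=42,
    lr_scheduler_type="linear",
    num_train_epochs=12,
    evaluation_strategy="epoch",  # assumption: matches the per-epoch results table below
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=lm_train,  # assumed: a pre-tokenized, chunked LM dataset (not in the card)
    eval_dataset=lm_eval,    # assumed: the matching evaluation split
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```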
### Training results
| Training Loss | Epoch | Step  | Validation Loss |
|:-------------:|:-----:|:-----:|:---------------:|
| 3.8707        | 1.0   | 1133  | 3.7535          |
| 3.7616        | 2.0   | 2266  | 3.7337          |
| 3.6998        | 3.0   | 3399  | 3.7246          |
| 3.6529        | 4.0   | 4532  | 3.7209          |
| 3.6022        | 5.0   | 5665  | 3.7203          |
| 3.5724        | 6.0   | 6798  | 3.7218          |
| 3.5374        | 7.0   | 7931  | 3.7198          |
| 3.5151        | 8.0   | 9064  | 3.7240          |
| 3.5004        | 9.0   | 10197 | 3.7274          |
| 3.4857        | 10.0  | 11330 | 3.7288          |
| 3.4702        | 11.0  | 12463 | 3.7305          |
| 3.4646        | 12.0  | 13596 | 3.7314          |
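
Since the reported loss is the mean per-token cross-entropy, the final validation perplexity works out to roughly exp(3.7314) ≈ 41.7. Note also that validation loss bottoms out around epoch 7 (3.7198) and drifts up slightly afterwards, so the final checkpoint is marginally past the best epoch:

```python
import math

# Perplexity is the exponential of the mean per-token cross-entropy loss.
print(math.exp(3.7314))  # ≈ 41.74
```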
### Framework versions
- Transformers 4.28.0
- PyTorch 2.0.1+cu118
- Datasets 2.14.5
- Tokenizers 0.13.3