# all-base-guten-rarity-all-end-19k-no-repetition
This model is a fine-tuned version of [gpt2](https://huggingface.co/gpt2) on the generator dataset. It achieves the following results on the evaluation set:
- Loss: 4.3413
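
Since this is a fine-tuned GPT-2 checkpoint, it can be loaded with the standard causal-LM classes from `transformers`. A minimal sketch; the Hub repo id below is assumed from the model name, as the card does not state the owning namespace:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed repo id (namespace not given in this card); adjust as needed.
model_id = "all-base-guten-rarity-all-end-19k-no-repetition"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Generate a short continuation from a prompt.
inputs = tokenizer("Once upon a time", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=40, do_sample=True, top_p=0.9)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```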
## Model description
More information needed
## Intended uses & limitations
More information needed
## Training and evaluation data
More information needed
## Training procedure

### Training hyperparameters
The following hyperparameters were used during training (a configuration sketch follows the list):
- learning_rate: 0.0005
- train_batch_size: 64
- eval_batch_size: 64
- seed: 42
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 1000
- num_epochs: 6
- mixed_precision_training: Native AMP
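
These settings correspond roughly to the following Hugging Face `TrainingArguments`; this is a hedged sketch rather than the original training script, which the card does not include (the `output_dir` is a placeholder, and the Adam betas/epsilon listed above are the `TrainingArguments` defaults):

```python
from transformers import TrainingArguments

# Approximate reconstruction of the reported hyperparameters (sketch only).
training_args = TrainingArguments(
    output_dir="all-base-guten-rarity-all-end-19k-no-repetition",  # placeholder
    learning_rate=5e-4,
    per_device_train_batch_size=64,
    per_device_eval_batch_size=64,
    seed=42,
    lr_scheduler_type="cosine",
    warmup_steps=1000,
    num_train_epochs=6,
    fp16=True,                     # "Native AMP" mixed precision
    evaluation_strategy="steps",   # validation loss is reported every 500 steps
    eval_steps=500,
    logging_steps=500,
)
```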
### Training results
| Training Loss | Epoch | Step | Validation Loss |
|:-------------:|:-----:|:----:|:---------------:|
| 6.761         | 0.31  | 500  | 5.6601          |
| 5.4095        | 0.63  | 1000 | 5.2183          |
| 5.0671        | 0.94  | 1500 | 4.9632          |
| 4.7721        | 1.26  | 2000 | 4.8195          |
| 4.6309        | 1.57  | 2500 | 4.6918          |
| 4.521         | 1.89  | 3000 | 4.5850          |
| 4.3114        | 2.2   | 3500 | 4.5239          |
| 4.2159        | 2.52  | 4000 | 4.4585          |
| 4.1761        | 2.83  | 4500 | 4.4018          |
| 4.0248        | 3.15  | 5000 | 4.3747          |
| 3.8954        | 3.46  | 5500 | 4.3491          |
| 3.8848        | 3.78  | 6000 | 4.3100          |
| 3.7789        | 4.09  | 6500 | 4.2990          |
| 3.6043        | 4.41  | 7000 | 4.2934          |
| 3.5959        | 4.72  | 7500 | 4.2789          |
| 3.5641        | 5.03  | 8000 | 4.2738          |
| 3.4039        | 5.35  | 8500 | 4.2779          |
| 3.4003        | 5.66  | 9000 | 4.2766          |
| 3.4051        | 5.98  | 9500 | 4.2761          |
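
Assuming the reported loss is the usual per-token cross-entropy in nats, the final evaluation loss of 4.3413 corresponds to a perplexity of roughly exp(4.3413) ≈ 77:

```python
import math

eval_loss = 4.3413           # reported evaluation loss
perplexity = math.exp(eval_loss)
print(round(perplexity, 1))  # ≈ 76.8
```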
### Framework versions
- Transformers 4.26.1
- Pytorch 1.11.0+cu113
- Datasets 2.13.0
- Tokenizers 0.13.3