# gpt2_sm_gen1_large

This model is a fine-tuned version of [gpt2](https://huggingface.co/gpt2) on an unspecified dataset. It achieves the following results on the evaluation set (a sketch of how these metrics can be computed follows the list):

- Loss: 1.4824
- Accuracy: 0.8063
- Precision: 0.5094
- Recall: 0.3114
- F1: 0.3865
- D-index: 1.5483
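
The precision/recall/F1 figures suggest a binary classification task. As a rough illustration only, the standard metrics above could be reproduced with a `compute_metrics` callback like the sketch below; the binary-task assumption and the callback itself are not confirmed by this card, and D-index is left out because the card does not define it.

```python
# Minimal sketch, assuming binary labels and a Trainer-style
# (predictions, label_ids) tuple. D-index is not reconstructed here
# because its definition is not given in the card.
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

def compute_metrics(eval_pred):
    """Hypothetical compute_metrics callback for a Hugging Face Trainer."""
    logits, labels = eval_pred
    preds = logits.argmax(axis=-1)  # highest-scoring class per example
    precision, recall, f1, _ = precision_recall_fscore_support(
        labels, preds, average="binary"  # assumption: binary task
    )
    return {
        "accuracy": accuracy_score(labels, preds),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }
```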

## Model description

More information needed

## Intended uses & limitations

More information needed
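
Since the card reports classification metrics, the checkpoint presumably adds a sequence-classification head on top of GPT-2. A minimal, unverified loading sketch follows; the model ID is assumed from the model name and may differ, and the head type is an inference, not something the card states.

```python
# Hypothetical inference sketch; the repo ID and head type are assumptions.
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_id = "gpt2_sm_gen1_large"  # assumption: local path or Hub ID
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id)

# GPT-2 has no pad token; set one before batched, padded inference.
tokenizer.pad_token = tokenizer.eos_token

inputs = tokenizer("Example input text", return_tensors="pt")
logits = model(**inputs).logits
print(logits.argmax(dim=-1).item())  # predicted class index
```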

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters
The following hyperparameters were used during training (reconstructed as `TrainingArguments` in the sketch after this list):
- learning_rate: 5e-05
- train_batch_size: 16
- eval_batch_size: 8
- seed: 42
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_steps: 96000
- num_epochs: 20
- mixed_precision_training: Native AMP
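
For reference, here is a sketch of how the listed settings map onto `transformers.TrainingArguments`. Only the values named above come from this card; the output directory and anything else are placeholders, not taken from the actual training run.

```python
# Sketch only: reconstructs the listed hyperparameters; everything else
# (output_dir, logging, saving) is a placeholder.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="gpt2_sm_gen1_large",   # placeholder
    learning_rate=5e-5,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=8,
    seed=42,
    lr_scheduler_type="linear",
    warmup_steps=96000,
    num_train_epochs=20,
    fp16=True,                         # "Native AMP" mixed precision
    # Adam betas=(0.9, 0.999) and epsilon=1e-08 are the defaults,
    # matching the optimizer settings listed above.
)
```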

### Training results
| Training Loss | Epoch | Step  | Validation Loss | Accuracy | Precision | Recall | F1     | D-index |
|:-------------:|:-----:|:-----:|:---------------:|:--------:|:---------:|:------:|:------:|:-------:|
| 0.5028        | 1.0   | 3000  | 0.5183          | 0.8039   | 0.4872    | 0.0162 | 0.0313 | 1.4419  |
| 0.4442        | 2.0   | 6000  | 0.4597          | 0.8113   | 0.6126    | 0.0995 | 0.1712 | 1.4819  |
| 0.4150        | 3.0   | 9000  | 0.4217          | 0.8202   | 0.6309    | 0.1978 | 0.3012 | 1.5284  |
| 0.4047        | 4.0   | 12000 | 0.4365          | 0.8228   | 0.6682    | 0.1901 | 0.2960 | 1.5294  |
| 0.3827        | 5.0   | 15000 | 0.4141          | 0.8289   | 0.6502    | 0.2744 | 0.3859 | 1.5663  |
| 0.3527        | 6.0   | 18000 | 0.4357          | 0.8284   | 0.6320    | 0.2973 | 0.4044 | 1.5733  |
| 0.3360        | 7.0   | 21000 | 0.4322          | 0.8285   | 0.6202    | 0.3216 | 0.4235 | 1.5815  |
| 0.3051        | 8.0   | 24000 | 0.4696          | 0.8259   | 0.6076    | 0.3148 | 0.4147 | 1.5758  |
| 0.2745        | 9.0   | 27000 | 0.4957          | 0.8164   | 0.5431    | 0.3969 | 0.4586 | 1.5903  |
| 0.2435        | 10.0  | 30000 | 0.5369          | 0.8151   | 0.5391    | 0.3871 | 0.4506 | 1.5853  |
| 0.2182        | 11.0  | 33000 | 0.6251          | 0.8176   | 0.5559    | 0.3428 | 0.4241 | 1.5740  |
| 0.2031        | 12.0  | 36000 | 0.6869          | 0.7950   | 0.4760    | 0.4590 | 0.4673 | 1.5820  |
| 0.1880        | 13.0  | 39000 | 0.8867          | 0.8147   | 0.5600    | 0.2522 | 0.3478 | 1.5396  |
| 0.1738        | 14.0  | 42000 | 1.0311          | 0.8077   | 0.5149    | 0.3152 | 0.3910 | 1.5514  |
| 0.1495        | 15.0  | 45000 | 1.2024          | 0.8053   | 0.5039    | 0.3815 | 0.4343 | 1.5703  |
| 0.1415        | 16.0  | 48000 | 1.3324          | 0.8045   | 0.5013    | 0.4015 | 0.4459 | 1.5759  |
| 0.1275        | 17.0  | 51000 | 1.5071          | 0.8051   | 0.5038    | 0.3416 | 0.4071 | 1.5568  |
| 0.1139        | 18.0  | 54000 | 1.4309          | 0.8053   | 0.5047    | 0.3177 | 0.3900 | 1.5490  |
| 0.1111        | 19.0  | 57000 | 1.5033          | 0.8082   | 0.5154    | 0.3496 | 0.4166 | 1.5636  |
| 0.1124        | 20.0  | 60000 | 1.4824          | 0.8063   | 0.5094    | 0.3114 | 0.3865 | 1.5483  |

### Framework versions
- Transformers 4.28.0
- Pytorch 2.0.1+cu118
- Datasets 2.12.0
- Tokenizers 0.13.3