# gpt-expt-sp-v3-K-600-9-mixed-with-tv-v3

This model is a fine-tuned version of [gpt2](https://huggingface.co/gpt2) on an unknown dataset.
It achieves the following results on the evaluation set:
- Loss: 0.0607
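
Assuming the reported loss is the standard token-level cross-entropy computed by the Trainer, this corresponds to an evaluation perplexity of exp(0.0607) ≈ 1.063.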
## Model description
More information needed
## Intended uses & limitations
More information needed
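
While the intended uses are not documented, the checkpoint loads like any GPT-2-based causal language model. A minimal usage sketch follows; the repository id is an assumption taken from the model name on this card.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed repository id, taken from the model name above.
repo_id = "gpt-expt-sp-v3-K-600-9-mixed-with-tv-v3"

tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(repo_id)

# Generate a short continuation from an example prompt.
inputs = tokenizer("Example prompt", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```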
## Training and evaluation data
More information needed
## Training procedure
### Training hyperparameters
The following hyperparameters were used during training (a configuration sketch follows the list):
- learning_rate: 0.0005
- train_batch_size: 64
- eval_batch_size: 64
- seed: 42
- gradient_accumulation_steps: 8
- total_train_batch_size: 512
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 1000
- num_epochs: 500
- mixed_precision_training: Native AMP
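
As a convenience, the hyperparameters above map onto Hugging Face `TrainingArguments` as sketched below. This is a hedged reconstruction, not the authors' actual training script: the output directory, datasets, and single-device setup are assumptions, and only the listed values come from this card (the Adam betas and epsilon match the library defaults, so they are not set explicitly).

```python
from transformers import AutoModelForCausalLM, Trainer, TrainingArguments

model = AutoModelForCausalLM.from_pretrained("gpt2")

# Values below come from the hyperparameter list above; everything else
# (output_dir, datasets, device count) is a placeholder assumption.
training_args = TrainingArguments(
    output_dir="gpt-expt-sp-v3-K-600-9-mixed-with-tv-v3",
    learning_rate=5e-4,
    per_device_train_batch_size=64,
    per_device_eval_batch_size=64,
    seed=42,
    gradient_accumulation_steps=8,   # 64 * 8 = 512 total train batch size
    lr_scheduler_type="cosine",
    warmup_steps=1000,
    num_train_epochs=500,
    fp16=True,                       # Native AMP mixed-precision training
)

# trainer = Trainer(
#     model=model,
#     args=training_args,
#     train_dataset=train_dataset,  # placeholder: the dataset is not documented
#     eval_dataset=eval_dataset,
# )
# trainer.train()
```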
### Training results
| Training Loss | Epoch  | Step   | Validation Loss |
|:-------------:|:------:|:------:|:---------------:|
| 0.1876        | 6.59   | 10000  | 0.0951          |
| 0.0512        | 13.18  | 20000  | 0.0827          |
| 0.0459        | 19.76  | 30000  | 0.0756          |
| 0.0437        | 26.35  | 40000  | 0.0749          |
| 0.0427        | 32.94  | 50000  | 0.0739          |
| 0.042         | 39.53  | 60000  | 0.0728          |
| 0.0416        | 46.11  | 70000  | 0.0720          |
| 0.0411        | 52.7   | 80000  | 0.0707          |
| 0.0408        | 59.29  | 90000  | 0.0699          |
| 0.0405        | 65.88  | 100000 | 0.0709          |
| 0.0401        | 72.46  | 110000 | 0.0686          |
| 0.0399        | 79.05  | 120000 | 0.0681          |
| 0.0396        | 85.64  | 130000 | 0.0676          |
| 0.0394        | 92.23  | 140000 | 0.0670          |
| 0.0392        | 98.81  | 150000 | 0.0676          |
| 0.039         | 105.4  | 160000 | 0.0657          |
| 0.0388        | 111.99 | 170000 | 0.0654          |
| 0.0386        | 118.58 | 180000 | 0.0648          |
| 0.0385        | 125.16 | 190000 | 0.0653          |
| 0.0383        | 131.75 | 200000 | 0.0652          |
| 0.0382        | 138.34 | 210000 | 0.0648          |
| 0.0381        | 144.93 | 220000 | 0.0647          |
| 0.0379        | 151.52 | 230000 | 0.0645          |
| 0.0378        | 158.1  | 240000 | 0.0644          |
| 0.0377        | 164.69 | 250000 | 0.0642          |
| 0.0377        | 171.28 | 260000 | 0.0642          |
| 0.0375        | 177.87 | 270000 | 0.0637          |
| 0.0374        | 184.45 | 280000 | 0.0636          |
| 0.0373        | 191.04 | 290000 | 0.0639          |
| 0.0372        | 197.63 | 300000 | 0.0637          |
| 0.0371        | 204.22 | 310000 | 0.0633          |
| 0.037         | 210.8  | 320000 | 0.0635          |
| 0.0369        | 217.39 | 330000 | 0.0631          |
| 0.0368        | 223.98 | 340000 | 0.0628          |
| 0.0367        | 230.57 | 350000 | 0.0629          |
| 0.0366        | 237.15 | 360000 | 0.0627          |
| 0.0365        | 243.74 | 370000 | 0.0629          |
| 0.0364        | 250.33 | 380000 | 0.0628          |
| 0.0364        | 256.92 | 390000 | 0.0624          |
| 0.0363        | 263.5  | 400000 | 0.0625          |
| 0.0362        | 270.09 | 410000 | 0.0624          |
| 0.0361        | 276.68 | 420000 | 0.0625          |
| 0.036         | 283.27 | 430000 | 0.0629          |
| 0.0359        | 289.86 | 440000 | 0.0621          |
| 0.0358        | 296.44 | 450000 | 0.0623          |
| 0.0358        | 303.03 | 460000 | 0.0619          |
| 0.0357        | 309.62 | 470000 | 0.0620          |
| 0.0356        | 316.21 | 480000 | 0.0619          |
| 0.0355        | 322.79 | 490000 | 0.0617          |
| 0.0354        | 329.38 | 500000 | 0.0621          |
| 0.0353        | 335.97 | 510000 | 0.0615          |
| 0.0353        | 342.56 | 520000 | 0.0615          |
| 0.0352        | 349.14 | 530000 | 0.0616          |
| 0.0351        | 355.73 | 540000 | 0.0614          |
| 0.035         | 362.32 | 550000 | 0.0614          |
| 0.035         | 368.91 | 560000 | 0.0612          |
| 0.0349        | 375.49 | 570000 | 0.0613          |
| 0.0348        | 382.08 | 580000 | 0.0612          |
| 0.0348        | 388.67 | 590000 | 0.0612          |
| 0.0347        | 395.26 | 600000 | 0.0611          |
| 0.0347        | 401.84 | 610000 | 0.0610          |
| 0.0346        | 408.43 | 620000 | 0.0610          |
| 0.0345        | 415.02 | 630000 | 0.0609          |
| 0.0345        | 421.61 | 640000 | 0.0610          |
| 0.0344        | 428.19 | 650000 | 0.0609          |
| 0.0344        | 434.78 | 660000 | 0.0609          |
| 0.0343        | 441.37 | 670000 | 0.0608          |
| 0.0343        | 447.96 | 680000 | 0.0608          |
| 0.0343        | 454.55 | 690000 | 0.0608          |
| 0.0342        | 461.13 | 700000 | 0.0608          |
| 0.0342        | 467.72 | 710000 | 0.0607          |
| 0.0342        | 474.31 | 720000 | 0.0607          |
| 0.0342        | 480.9  | 730000 | 0.0607          |
| 0.0341        | 487.48 | 740000 | 0.0607          |
| 0.0341        | 494.07 | 750000 | 0.0607          |
### Framework versions
- Transformers 4.25.1
- Pytorch 1.11.0+cu113
- Datasets 2.8.0
- Tokenizers 0.13.2