# distilgpt2-finetuned-python-stack-clean-answers-e200
This model is a fine-tuned version of [distilgpt2](https://huggingface.co/distilgpt2). The training dataset is not recorded in this card; the model name suggests a cleaned set of Python Stack Overflow answers. It achieves the following results on the evaluation set:
- Loss: 0.0700
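As a quick start, the checkpoint can be loaded with the standard `transformers` APIs. This is a minimal sketch; the repository id below is a placeholder, since the publishing namespace is not stated in this card.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline

# Placeholder repo id -- substitute the namespace this checkpoint is published under.
model_id = "<namespace>/distilgpt2-finetuned-python-stack-clean-answers-e200"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Sample a short completion from the fine-tuned model.
generator = pipeline("text-generation", model=model, tokenizer=tokenizer)
out = generator(
    "How do I reverse a list in Python?",
    max_new_tokens=64,
    do_sample=True,
    top_p=0.95,
)
print(out[0]["generated_text"])
```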
## Model description

The base model, distilgpt2, is a knowledge-distilled version of GPT-2 with roughly 82M parameters. Beyond that, more information about this fine-tuned checkpoint is needed.
## Intended uses & limitations
More information needed
## Training and evaluation data
More information needed
## Training procedure

### Training hyperparameters
The following hyperparameters were used during training (a sketch of the corresponding `TrainingArguments` follows the list):
- learning_rate: 2e-05
- train_batch_size: 8
- eval_batch_size: 8
- seed: 42
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- num_epochs: 200
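Under a standard `Trainer` setup, these settings correspond roughly to the `TrainingArguments` sketched below (valid for Transformers 4.30.x). The dataset and model wiring are not recorded in this card, and the `output_dir` is a guess based on the model name.

```python
from transformers import TrainingArguments

# A sketch of TrainingArguments matching the hyperparameters listed above.
# Per-device batch sizes assume single-GPU training with no gradient accumulation.
training_args = TrainingArguments(
    output_dir="distilgpt2-finetuned-python-stack-clean-answers-e200",
    learning_rate=2e-5,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    seed=42,
    num_train_epochs=200,
    lr_scheduler_type="linear",   # linear decay of the learning rate
    evaluation_strategy="epoch",  # the results table logs validation loss once per epoch
)
# Adam betas=(0.9, 0.999) and epsilon=1e-8 are the library defaults,
# so they need no explicit arguments here.
```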
### Training results
| Training Loss | Epoch | Step | Validation Loss |
|:-------------:|:-----:|:----:|:---------------:|
No log | 1.0 | 28 | 3.2510 |
No log | 2.0 | 56 | 3.1681 |
No log | 3.0 | 84 | 3.0891 |
No log | 4.0 | 112 | 3.0233 |
No log | 5.0 | 140 | 2.9563 |
No log | 6.0 | 168 | 2.8967 |
No log | 7.0 | 196 | 2.8380 |
No log | 8.0 | 224 | 2.7777 |
No log | 9.0 | 252 | 2.7218 |
No log | 10.0 | 280 | 2.6671 |
No log | 11.0 | 308 | 2.6158 |
No log | 12.0 | 336 | 2.5594 |
No log | 13.0 | 364 | 2.5105 |
No log | 14.0 | 392 | 2.4551 |
No log | 15.0 | 420 | 2.4029 |
No log | 16.0 | 448 | 2.3500 |
No log | 17.0 | 476 | 2.2973 |
3.016 | 18.0 | 504 | 2.2479 |
3.016 | 19.0 | 532 | 2.1940 |
3.016 | 20.0 | 560 | 2.1436 |
3.016 | 21.0 | 588 | 2.0926 |
3.016 | 22.0 | 616 | 2.0419 |
3.016 | 23.0 | 644 | 1.9912 |
3.016 | 24.0 | 672 | 1.9435 |
3.016 | 25.0 | 700 | 1.8982 |
3.016 | 26.0 | 728 | 1.8483 |
3.016 | 27.0 | 756 | 1.7974 |
3.016 | 28.0 | 784 | 1.7525 |
3.016 | 29.0 | 812 | 1.7082 |
3.016 | 30.0 | 840 | 1.6610 |
3.016 | 31.0 | 868 | 1.6108 |
3.016 | 32.0 | 896 | 1.5655 |
3.016 | 33.0 | 924 | 1.5193 |
3.016 | 34.0 | 952 | 1.4757 |
3.016 | 35.0 | 980 | 1.4342 |
2.2411 | 36.0 | 1008 | 1.3863 |
2.2411 | 37.0 | 1036 | 1.3433 |
2.2411 | 38.0 | 1064 | 1.3095 |
2.2411 | 39.0 | 1092 | 1.2757 |
2.2411 | 40.0 | 1120 | 1.2278 |
2.2411 | 41.0 | 1148 | 1.1887 |
2.2411 | 42.0 | 1176 | 1.1481 |
2.2411 | 43.0 | 1204 | 1.1193 |
2.2411 | 44.0 | 1232 | 1.0711 |
2.2411 | 45.0 | 1260 | 1.0332 |
2.2411 | 46.0 | 1288 | 1.0062 |
2.2411 | 47.0 | 1316 | 0.9696 |
2.2411 | 48.0 | 1344 | 0.9358 |
2.2411 | 49.0 | 1372 | 0.9109 |
2.2411 | 50.0 | 1400 | 0.8690 |
2.2411 | 51.0 | 1428 | 0.8420 |
2.2411 | 52.0 | 1456 | 0.8111 |
2.2411 | 53.0 | 1484 | 0.7848 |
1.5799 | 54.0 | 1512 | 0.7596 |
1.5799 | 55.0 | 1540 | 0.7361 |
1.5799 | 56.0 | 1568 | 0.7081 |
1.5799 | 57.0 | 1596 | 0.6818 |
1.5799 | 58.0 | 1624 | 0.6601 |
1.5799 | 59.0 | 1652 | 0.6351 |
1.5799 | 60.0 | 1680 | 0.6145 |
1.5799 | 61.0 | 1708 | 0.5926 |
1.5799 | 62.0 | 1736 | 0.5711 |
1.5799 | 63.0 | 1764 | 0.5492 |
1.5799 | 64.0 | 1792 | 0.5251 |
1.5799 | 65.0 | 1820 | 0.5114 |
1.5799 | 66.0 | 1848 | 0.4946 |
1.5799 | 67.0 | 1876 | 0.4758 |
1.5799 | 68.0 | 1904 | 0.4628 |
1.5799 | 69.0 | 1932 | 0.4435 |
1.5799 | 70.0 | 1960 | 0.4325 |
1.5799 | 71.0 | 1988 | 0.4168 |
1.0863 | 72.0 | 2016 | 0.4025 |
1.0863 | 73.0 | 2044 | 0.3904 |
1.0863 | 74.0 | 2072 | 0.3731 |
1.0863 | 75.0 | 2100 | 0.3606 |
1.0863 | 76.0 | 2128 | 0.3451 |
1.0863 | 77.0 | 2156 | 0.3387 |
1.0863 | 78.0 | 2184 | 0.3277 |
1.0863 | 79.0 | 2212 | 0.3160 |
1.0863 | 80.0 | 2240 | 0.3108 |
1.0863 | 81.0 | 2268 | 0.2980 |
1.0863 | 82.0 | 2296 | 0.2897 |
1.0863 | 83.0 | 2324 | 0.2814 |
1.0863 | 84.0 | 2352 | 0.2715 |
1.0863 | 85.0 | 2380 | 0.2607 |
1.0863 | 86.0 | 2408 | 0.2521 |
1.0863 | 87.0 | 2436 | 0.2482 |
1.0863 | 88.0 | 2464 | 0.2386 |
1.0863 | 89.0 | 2492 | 0.2347 |
0.7543 | 90.0 | 2520 | 0.2231 |
0.7543 | 91.0 | 2548 | 0.2205 |
0.7543 | 92.0 | 2576 | 0.2135 |
0.7543 | 93.0 | 2604 | 0.2081 |
0.7543 | 94.0 | 2632 | 0.2018 |
0.7543 | 95.0 | 2660 | 0.1956 |
0.7543 | 96.0 | 2688 | 0.1910 |
0.7543 | 97.0 | 2716 | 0.1855 |
0.7543 | 98.0 | 2744 | 0.1806 |
0.7543 | 99.0 | 2772 | 0.1768 |
0.7543 | 100.0 | 2800 | 0.1715 |
0.7543 | 101.0 | 2828 | 0.1687 |
0.7543 | 102.0 | 2856 | 0.1649 |
0.7543 | 103.0 | 2884 | 0.1629 |
0.7543 | 104.0 | 2912 | 0.1570 |
0.7543 | 105.0 | 2940 | 0.1563 |
0.7543 | 106.0 | 2968 | 0.1502 |
0.7543 | 107.0 | 2996 | 0.1486 |
0.5478 | 108.0 | 3024 | 0.1443 |
0.5478 | 109.0 | 3052 | 0.1408 |
0.5478 | 110.0 | 3080 | 0.1389 |
0.5478 | 111.0 | 3108 | 0.1366 |
0.5478 | 112.0 | 3136 | 0.1338 |
0.5478 | 113.0 | 3164 | 0.1304 |
0.5478 | 114.0 | 3192 | 0.1290 |
0.5478 | 115.0 | 3220 | 0.1264 |
0.5478 | 116.0 | 3248 | 0.1234 |
0.5478 | 117.0 | 3276 | 0.1212 |
0.5478 | 118.0 | 3304 | 0.1197 |
0.5478 | 119.0 | 3332 | 0.1185 |
0.5478 | 120.0 | 3360 | 0.1159 |
0.5478 | 121.0 | 3388 | 0.1130 |
0.5478 | 122.0 | 3416 | 0.1125 |
0.5478 | 123.0 | 3444 | 0.1106 |
0.5478 | 124.0 | 3472 | 0.1087 |
0.4258 | 125.0 | 3500 | 0.1077 |
0.4258 | 126.0 | 3528 | 0.1068 |
0.4258 | 127.0 | 3556 | 0.1048 |
0.4258 | 128.0 | 3584 | 0.1039 |
0.4258 | 129.0 | 3612 | 0.1022 |
0.4258 | 130.0 | 3640 | 0.1002 |
0.4258 | 131.0 | 3668 | 0.0987 |
0.4258 | 132.0 | 3696 | 0.0980 |
0.4258 | 133.0 | 3724 | 0.0973 |
0.4258 | 134.0 | 3752 | 0.0955 |
0.4258 | 135.0 | 3780 | 0.0951 |
0.4258 | 136.0 | 3808 | 0.0937 |
0.4258 | 137.0 | 3836 | 0.0932 |
0.4258 | 138.0 | 3864 | 0.0920 |
0.4258 | 139.0 | 3892 | 0.0908 |
0.4258 | 140.0 | 3920 | 0.0903 |
0.4258 | 141.0 | 3948 | 0.0889 |
0.4258 | 142.0 | 3976 | 0.0883 |
0.3496 | 143.0 | 4004 | 0.0879 |
0.3496 | 144.0 | 4032 | 0.0872 |
0.3496 | 145.0 | 4060 | 0.0865 |
0.3496 | 146.0 | 4088 | 0.0852 |
0.3496 | 147.0 | 4116 | 0.0849 |
0.3496 | 148.0 | 4144 | 0.0843 |
0.3496 | 149.0 | 4172 | 0.0836 |
0.3496 | 150.0 | 4200 | 0.0832 |
0.3496 | 151.0 | 4228 | 0.0822 |
0.3496 | 152.0 | 4256 | 0.0817 |
0.3496 | 153.0 | 4284 | 0.0813 |
0.3496 | 154.0 | 4312 | 0.0805 |
0.3496 | 155.0 | 4340 | 0.0799 |
0.3496 | 156.0 | 4368 | 0.0796 |
0.3496 | 157.0 | 4396 | 0.0789 |
0.3496 | 158.0 | 4424 | 0.0784 |
0.3496 | 159.0 | 4452 | 0.0781 |
0.3496 | 160.0 | 4480 | 0.0777 |
0.3045 | 161.0 | 4508 | 0.0776 |
0.3045 | 162.0 | 4536 | 0.0771 |
0.3045 | 163.0 | 4564 | 0.0762 |
0.3045 | 164.0 | 4592 | 0.0762 |
0.3045 | 165.0 | 4620 | 0.0763 |
0.3045 | 166.0 | 4648 | 0.0758 |
0.3045 | 167.0 | 4676 | 0.0754 |
0.3045 | 168.0 | 4704 | 0.0750 |
0.3045 | 169.0 | 4732 | 0.0748 |
0.3045 | 170.0 | 4760 | 0.0746 |
0.3045 | 171.0 | 4788 | 0.0742 |
0.3045 | 172.0 | 4816 | 0.0740 |
0.3045 | 173.0 | 4844 | 0.0735 |
0.3045 | 174.0 | 4872 | 0.0735 |
0.3045 | 175.0 | 4900 | 0.0732 |
0.3045 | 176.0 | 4928 | 0.0728 |
0.3045 | 177.0 | 4956 | 0.0724 |
0.3045 | 178.0 | 4984 | 0.0723 |
0.2786 | 179.0 | 5012 | 0.0721 |
0.2786 | 180.0 | 5040 | 0.0719 |
0.2786 | 181.0 | 5068 | 0.0717 |
0.2786 | 182.0 | 5096 | 0.0715 |
0.2786 | 183.0 | 5124 | 0.0714 |
0.2786 | 184.0 | 5152 | 0.0713 |
0.2786 | 185.0 | 5180 | 0.0712 |
0.2786 | 186.0 | 5208 | 0.0710 |
0.2786 | 187.0 | 5236 | 0.0707 |
0.2786 | 188.0 | 5264 | 0.0705 |
0.2786 | 189.0 | 5292 | 0.0704 |
0.2786 | 190.0 | 5320 | 0.0704 |
0.2786 | 191.0 | 5348 | 0.0704 |
0.2786 | 192.0 | 5376 | 0.0702 |
0.2786 | 193.0 | 5404 | 0.0703 |
0.2786 | 194.0 | 5432 | 0.0702 |
0.2786 | 195.0 | 5460 | 0.0702 |
0.2786 | 196.0 | 5488 | 0.0701 |
0.2633 | 197.0 | 5516 | 0.0701 |
0.2633 | 198.0 | 5544 | 0.0701 |
0.2633 | 199.0 | 5572 | 0.0700 |
0.2633 | 200.0 | 5600 | 0.0700 |
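The validation loss decreases monotonically and plateaus near 0.070 from roughly epoch 190 onward. As a sanity check on what that number means, cross-entropy loss in nats converts to perplexity via `exp(loss)`:

```python
import math

# Perplexity is exp(cross-entropy loss); a value this close to 1.0
# means near-perfect next-token prediction on the evaluation set.
final_eval_loss = 0.0700
print(math.exp(final_eval_loss))  # ~1.0725
```

A perplexity this close to 1.0, combined with the small step count (28 optimizer steps per epoch, i.e. roughly 224 training examples at batch size 8, assuming a single device), suggests a small dataset that the model has largely memorized over 200 epochs; held-out generalization is worth checking before relying on the evaluation loss.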
## Framework versions
- Transformers 4.30.2
- Pytorch 2.0.1+cu118
- Datasets 2.13.1
- Tokenizers 0.13.3