<!-- This model card has been generated automatically according to the information the Trainer had access to. You should probably proofread and complete it, then remove this comment. -->
# model_v1_complete_training_wt_init_48_tiny
This model is a fine-tuned version of an unspecified base model on the None dataset. It achieves the following results on the evaluation set:

- Loss: 3.6497
- Accuracy: 0.3896
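If the reported evaluation loss is the usual mean per-token cross-entropy in nats (an assumption; the card does not say, though it is the default for `Trainer`-based language-model runs), it corresponds to a perplexity of roughly 38.5:

```python
import math

# Assumption: the evaluation loss is mean per-token cross-entropy (nats).
eval_loss = 3.6497

# Perplexity is the exponential of the cross-entropy loss.
perplexity = math.exp(eval_loss)
print(f"perplexity = {perplexity:.1f}")
```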
## Model description
More information needed
## Intended uses & limitations
More information needed
## Training and evaluation data
More information needed
## Training procedure

### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 1e-05
- train_batch_size: 64
- eval_batch_size: 64
- seed: 10
- distributed_type: multi-GPU
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_steps: 10000
- num_epochs: 50
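The scheduler above ramps the learning rate linearly from 0 to 1e-05 over the first 10,000 steps, then decays it linearly toward 0 over the remaining steps. A minimal sketch of that schedule as a pure function, matching what `transformers`' `get_linear_schedule_with_warmup` computes (the total step count below is a hypothetical placeholder, not taken from this card; the real value depends on dataset size and `num_epochs`):

```python
BASE_LR = 1e-5        # learning_rate from the card
WARMUP_STEPS = 10_000 # lr_scheduler_warmup_steps from the card
TOTAL_STEPS = 2_000_000  # hypothetical placeholder

def learning_rate(step: int) -> float:
    """Linear warmup to BASE_LR, then linear decay to 0."""
    if step < WARMUP_STEPS:
        # Ramp from 0 up to BASE_LR over the warmup steps.
        return BASE_LR * step / WARMUP_STEPS
    # Decay from BASE_LR at the end of warmup down to 0 at TOTAL_STEPS.
    remaining = max(0, TOTAL_STEPS - step)
    return BASE_LR * remaining / (TOTAL_STEPS - WARMUP_STEPS)

print(learning_rate(0))        # start of warmup
print(learning_rate(10_000))   # peak learning rate
```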
### Training results
| Training Loss | Epoch | Step | Validation Loss | Accuracy |
|:-------------:|:-----:|:----:|:---------------:|:--------:|
6.0224 | 0.33 | 30000 | 5.9447 | 0.1517 |
5.1853 | 0.66 | 60000 | 4.9635 | 0.2615 |
4.9483 | 0.98 | 90000 | 4.7016 | 0.2830 |
4.7679 | 1.31 | 120000 | 4.5154 | 0.2992 |
4.6448 | 1.64 | 150000 | 4.3884 | 0.3100 |
4.5688 | 1.97 | 180000 | 4.3095 | 0.3175 |
4.5102 | 2.29 | 210000 | 4.2511 | 0.3236 |
4.4662 | 2.62 | 240000 | 4.2038 | 0.3294 |
4.4269 | 2.95 | 270000 | 4.1677 | 0.3336 |
4.3982 | 3.28 | 300000 | 4.1367 | 0.3370 |
4.3714 | 3.6 | 330000 | 4.1103 | 0.3399 |
4.3493 | 3.93 | 360000 | 4.0869 | 0.3423 |
4.3303 | 4.26 | 390000 | 4.0680 | 0.3439 |
4.3131 | 4.59 | 420000 | 4.0467 | 0.3461 |
4.2875 | 4.92 | 450000 | 4.0292 | 0.3477 |
4.2629 | 5.24 | 480000 | 4.0109 | 0.3497 |
4.2413 | 5.57 | 510000 | 3.9931 | 0.3515 |
4.2282 | 5.9 | 540000 | 3.9759 | 0.3536 |
4.2003 | 6.23 | 570000 | 3.9608 | 0.3551 |
4.1867 | 6.55 | 600000 | 3.9445 | 0.3571 |
4.1607 | 6.88 | 630000 | 3.9273 | 0.3590 |
4.1511 | 7.21 | 660000 | 3.9130 | 0.3606 |
4.1335 | 7.54 | 690000 | 3.8971 | 0.3622 |
4.1158 | 7.87 | 720000 | 3.8798 | 0.3642 |
4.097 | 8.19 | 750000 | 3.8635 | 0.3663 |
4.0831 | 8.52 | 780000 | 3.8494 | 0.3679 |
4.0756 | 8.85 | 810000 | 3.8334 | 0.3696 |
4.0533 | 9.18 | 840000 | 3.8201 | 0.3712 |
4.0517 | 9.5 | 870000 | 3.8080 | 0.3724 |
4.0325 | 9.83 | 900000 | 3.7975 | 0.3734 |
4.0142 | 10.16 | 930000 | 3.7872 | 0.3748 |
4.0124 | 10.49 | 960000 | 3.7788 | 0.3759 |
4.0076 | 10.81 | 990000 | 3.7679 | 0.3767 |
3.9919 | 11.14 | 1020000 | 3.7609 | 0.3775 |
3.9888 | 11.47 | 1050000 | 3.7550 | 0.3783 |
3.9796 | 11.8 | 1080000 | 3.7481 | 0.3789 |
3.9742 | 12.13 | 1110000 | 3.7414 | 0.3796 |
3.9667 | 12.45 | 1140000 | 3.7370 | 0.3802 |
3.9652 | 12.78 | 1170000 | 3.7289 | 0.3810 |
3.9548 | 13.11 | 1200000 | 3.7278 | 0.3812 |
3.9556 | 13.44 | 1230000 | 3.7213 | 0.3817 |
3.9444 | 13.76 | 1260000 | 3.7152 | 0.3825 |
3.9428 | 14.09 | 1290000 | 3.7120 | 0.3827 |
3.9424 | 14.42 | 1320000 | 3.7072 | 0.3834 |
3.9389 | 14.75 | 1350000 | 3.7047 | 0.3836 |
3.936 | 15.07 | 1380000 | 3.6998 | 0.3844 |
3.9246 | 15.4 | 1410000 | 3.6968 | 0.3847 |
3.9281 | 15.73 | 1440000 | 3.6925 | 0.3851 |
3.9177 | 16.06 | 1470000 | 3.6916 | 0.3849 |
3.9216 | 16.39 | 1500000 | 3.6870 | 0.3855 |
3.9141 | 16.71 | 1530000 | 3.6822 | 0.3863 |
3.9154 | 17.04 | 1560000 | 3.6804 | 0.3864 |
3.9145 | 17.37 | 1590000 | 3.6795 | 0.3863 |
3.9103 | 17.7 | 1620000 | 3.6734 | 0.3869 |
3.9079 | 18.02 | 1650000 | 3.6724 | 0.3873 |
3.901 | 18.35 | 1680000 | 3.6707 | 0.3872 |
3.9015 | 18.68 | 1710000 | 3.6695 | 0.3873 |
3.8987 | 19.01 | 1740000 | 3.6672 | 0.3877 |
3.8929 | 19.33 | 1770000 | 3.6647 | 0.3878 |
3.892 | 19.66 | 1800000 | 3.6609 | 0.3884 |
3.8906 | 19.99 | 1830000 | 3.6595 | 0.3886 |
3.8923 | 20.32 | 1860000 | 3.6594 | 0.3885 |
3.8901 | 20.65 | 1890000 | 3.6541 | 0.3893 |
3.8853 | 20.97 | 1920000 | 3.6539 | 0.3891 |
3.8808 | 21.3 | 1950000 | 3.6527 | 0.3894 |
3.8835 | 21.63 | 1980000 | 3.6497 | 0.3896 |
### Framework versions
- Transformers 4.30.2
- PyTorch 1.14.0a0+410ce96
- Datasets 2.13.0
- Tokenizers 0.13.3