# mobilebert_sa_GLUE_Experiment_logit_kd_pretrain_qqp

This model is a fine-tuned version of [gokuls/mobilebert_sa_pre-training-complete](https://huggingface.co/gokuls/mobilebert_sa_pre-training-complete) on the GLUE QQP dataset. It achieves the following results on the evaluation set:
- Loss: 0.1715
- Accuracy: 0.9133
- F1: 0.8843
- Combined Score: 0.8988
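
The combined score reported above appears to be the arithmetic mean of accuracy and F1 (an assumption based on the convention these auto-generated GLUE cards follow, not stated explicitly here). A quick check against the reported numbers:

```python
# Combined Score assumed to be the mean of accuracy and F1
# (convention for auto-generated GLUE QQP model cards).
accuracy = 0.9133
f1 = 0.8843
combined = (accuracy + f1) / 2
print(round(combined, 4))  # → 0.8988, matching the reported Combined Score
```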
## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 5e-05
- train_batch_size: 128
- eval_batch_size: 128
- seed: 10
- distributed_type: multi-GPU
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- num_epochs: 50
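
These hyperparameters are consistent with the step counts in the results table: assuming the commonly reported GLUE QQP training-split size of 363,846 question pairs (not stated in this card) and an effective batch size of 128, one epoch takes:

```python
import math

# Sanity check: steps per epoch = ceil(train_examples / batch_size).
# 363,846 is the commonly reported GLUE QQP train-split size (an
# assumption; the card itself does not state the dataset size).
train_examples = 363_846
batch_size = 128
steps_per_epoch = math.ceil(train_examples / batch_size)
print(steps_per_epoch)  # → 2843, matching the per-epoch step count below
```

Note that although `num_epochs` was set to 50, the results table stops at epoch 29 (epoch 24 gives the best validation loss, matching the reported evaluation results).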
### Training results
| Training Loss | Epoch | Step  | Validation Loss | Accuracy | F1     | Combined Score |
|:-------------:|:-----:|:-----:|:---------------:|:--------:|:------:|:--------------:|
| 0.3902        | 1.0   | 2843  | 0.2870          | 0.8788   | 0.8463 | 0.8625         |
| 0.226         | 2.0   | 5686  | 0.1968          | 0.8979   | 0.8649 | 0.8814         |
| 0.1681        | 3.0   | 8529  | 0.1910          | 0.9038   | 0.8719 | 0.8878         |
| 0.1346        | 4.0   | 11372 | 0.1989          | 0.9055   | 0.8772 | 0.8914         |
| 0.1124        | 5.0   | 14215 | 0.1970          | 0.9049   | 0.8747 | 0.8898         |
| 0.0978        | 6.0   | 17058 | 0.1876          | 0.9095   | 0.8806 | 0.8950         |
| 0.0876        | 7.0   | 19901 | 0.1893          | 0.9077   | 0.8773 | 0.8925         |
| 0.0802        | 8.0   | 22744 | 0.1940          | 0.9067   | 0.8773 | 0.8920         |
| 0.0746        | 9.0   | 25587 | 0.1846          | 0.9090   | 0.8787 | 0.8938         |
| 0.0699        | 10.0  | 28430 | 0.1890          | 0.9093   | 0.8809 | 0.8951         |
| 0.0663        | 11.0  | 31273 | 0.1803          | 0.9103   | 0.8804 | 0.8953         |
| 0.0632        | 12.0  | 34116 | 0.1905          | 0.9084   | 0.8805 | 0.8945         |
| 0.0606        | 13.0  | 36959 | 0.1835          | 0.9094   | 0.8813 | 0.8953         |
| 0.0583        | 14.0  | 39802 | 0.1786          | 0.9112   | 0.8805 | 0.8958         |
| 0.0562        | 15.0  | 42645 | 0.1900          | 0.9091   | 0.8817 | 0.8954         |
| 0.0546        | 16.0  | 45488 | 0.1753          | 0.9126   | 0.8825 | 0.8975         |
| 0.0529        | 17.0  | 48331 | 0.1761          | 0.9121   | 0.8825 | 0.8973         |
| 0.0515        | 18.0  | 51174 | 0.1784          | 0.9129   | 0.8842 | 0.8986         |
| 0.0501        | 19.0  | 54017 | 0.1730          | 0.9129   | 0.8847 | 0.8988         |
| 0.049         | 20.0  | 56860 | 0.1812          | 0.9116   | 0.8835 | 0.8975         |
| 0.0479        | 21.0  | 59703 | 0.1751          | 0.9115   | 0.8830 | 0.8972         |
| 0.0469        | 22.0  | 62546 | 0.1737          | 0.9120   | 0.8833 | 0.8976         |
| 0.0461        | 23.0  | 65389 | 0.1739          | 0.9129   | 0.8844 | 0.8986         |
| 0.0452        | 24.0  | 68232 | 0.1715          | 0.9133   | 0.8843 | 0.8988         |
| 0.0447        | 25.0  | 71075 | 0.1748          | 0.9119   | 0.8844 | 0.8982         |
| 0.0437        | 26.0  | 73918 | 0.1734          | 0.9129   | 0.8841 | 0.8985         |
| 0.0431        | 27.0  | 76761 | 0.1727          | 0.9125   | 0.8830 | 0.8977         |
| 0.0425        | 28.0  | 79604 | 0.1803          | 0.9120   | 0.8851 | 0.8985         |
| 0.0419        | 29.0  | 82447 | 0.1720          | 0.9124   | 0.8835 | 0.8980         |
### Framework versions
- Transformers 4.26.0
- Pytorch 1.14.0a0+410ce96
- Datasets 2.9.0
- Tokenizers 0.13.2