# bert-tiny-Massive-intent-KD-BERT_and_distilBERT
This model is a fine-tuned version of [google/bert_uncased_L-2_H-128_A-2](https://huggingface.co/google/bert_uncased_L-2_H-128_A-2) on the MASSIVE intent-classification dataset. It achieves the following results on the evaluation set:
- Loss: 2.3729
- Accuracy: 0.8470
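
A minimal inference sketch using the 🤗 Transformers `pipeline` API. The Hub model id below is an assumed placeholder (the actual repository path is not stated in this card), and the example label depends on the dataset's intent set:

```python
# Minimal usage sketch. The model id is an assumed placeholder;
# replace it with the actual Hub repository path for this checkpoint.
from transformers import pipeline

classifier = pipeline(
    "text-classification",
    model="bert-tiny-Massive-intent-KD-BERT_and_distilBERT",  # placeholder id
)

print(classifier("wake me up at seven in the morning"))
# e.g. [{'label': 'alarm_set', 'score': ...}] -- label names follow the dataset config
```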
## Model description
More information needed
## Intended uses & limitations
More information needed
## Training and evaluation data
More information needed
## Training procedure

### Training hyperparameters
The following hyperparameters were used during training (see the sketch after this list):
- learning_rate: 5e-05
- train_batch_size: 16
- eval_batch_size: 16
- seed: 33
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- num_epochs: 50
- mixed_precision_training: Native AMP
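
As a rough sketch, these settings map onto `TrainingArguments` as shown below. The `output_dir` value is an assumption, and the knowledge-distillation loss implied by the model name (KD from BERT and DistilBERT teachers) is not covered by this card, so no custom trainer is shown:

```python
# Sketch of TrainingArguments matching the hyperparameters above.
# output_dir is an assumed placeholder, not taken from the card.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="bert-tiny-massive-intent-kd",  # assumed
    learning_rate=5e-5,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    seed=33,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="linear",
    num_train_epochs=50,
    fp16=True,  # "Native AMP" mixed-precision training
)
```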
### Training results
| Training Loss | Epoch | Step | Validation Loss | Accuracy |
|:-------------:|:-----:|:----:|:---------------:|:--------:|
| 15.1159 | 1.0 | 720 | 12.8257 | 0.2253 |
| 12.9949 | 2.0 | 1440 | 10.9891 | 0.4304 |
| 11.3865 | 3.0 | 2160 | 9.5622 | 0.5032 |
| 10.0553 | 4.0 | 2880 | 8.3700 | 0.5539 |
| 8.9431 | 5.0 | 3600 | 7.4127 | 0.6104 |
| 8.0135 | 6.0 | 4320 | 6.6185 | 0.6286 |
| 7.1987 | 7.0 | 5040 | 5.9517 | 0.6818 |
| 6.5168 | 8.0 | 5760 | 5.3879 | 0.7118 |
| 5.9352 | 9.0 | 6480 | 4.9426 | 0.7275 |
| 5.4299 | 10.0 | 7200 | 4.5637 | 0.7413 |
| 5.0017 | 11.0 | 7920 | 4.2379 | 0.7585 |
| 4.5951 | 12.0 | 8640 | 3.9699 | 0.7678 |
| 4.2849 | 13.0 | 9360 | 3.7416 | 0.7737 |
| 3.991 | 14.0 | 10080 | 3.5502 | 0.7865 |
| 3.7455 | 15.0 | 10800 | 3.4090 | 0.7900 |
| 3.5315 | 16.0 | 11520 | 3.3053 | 0.7914 |
| 3.345 | 17.0 | 12240 | 3.1670 | 0.8003 |
| 3.1767 | 18.0 | 12960 | 3.0739 | 0.8013 |
| 3.0322 | 19.0 | 13680 | 2.9927 | 0.8047 |
| 2.8864 | 20.0 | 14400 | 2.9366 | 0.8037 |
| 2.7728 | 21.0 | 15120 | 2.8666 | 0.8091 |
| 2.6732 | 22.0 | 15840 | 2.8146 | 0.8126 |
| 2.5726 | 23.0 | 16560 | 2.7588 | 0.8195 |
| 2.493 | 24.0 | 17280 | 2.7319 | 0.8273 |
| 2.4183 | 25.0 | 18000 | 2.6847 | 0.8249 |
| 2.3526 | 26.0 | 18720 | 2.6317 | 0.8323 |
| 2.2709 | 27.0 | 19440 | 2.6071 | 0.8288 |
| 2.2125 | 28.0 | 20160 | 2.5982 | 0.8323 |
| 2.1556 | 29.0 | 20880 | 2.5546 | 0.8337 |
| 2.1042 | 30.0 | 21600 | 2.5278 | 0.8318 |
| 2.054 | 31.0 | 22320 | 2.5005 | 0.8411 |
| 2.0154 | 32.0 | 23040 | 2.4891 | 0.8347 |
| 1.9785 | 33.0 | 23760 | 2.4633 | 0.8367 |
| 1.9521 | 34.0 | 24480 | 2.4451 | 0.8421 |
| 1.9247 | 35.0 | 25200 | 2.4370 | 0.8416 |
| 1.8741 | 36.0 | 25920 | 2.4197 | 0.8446 |
| 1.8659 | 37.0 | 26640 | 2.4081 | 0.8406 |
| 1.8367 | 38.0 | 27360 | 2.3979 | 0.8426 |
| 1.8153 | 39.0 | 28080 | 2.3758 | 0.8451 |
| 1.7641 | 40.0 | 28800 | 2.3729 | 0.8470 |
| 1.7608 | 41.0 | 29520 | 2.3683 | 0.8460 |
| 1.7647 | 42.0 | 30240 | 2.3628 | 0.8446 |
| 1.7656 | 43.0 | 30960 | 2.3492 | 0.8470 |
### Framework versions
- Transformers 4.22.1
- Pytorch 1.12.1+cu113
- Datasets 2.5.1
- Tokenizers 0.12.1