<!-- This model card has been generated automatically according to the information the Trainer had access to. You should probably proofread and complete it, then remove this comment. -->

distilbert_sa_GLUE_Experiment_logit_kd_data_aug_sst2_256

This model is a fine-tuned version of distilbert-base-uncased on the GLUE SST2 dataset. Evaluation-set loss and accuracy for each epoch are listed under Training results.

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

Training results

| Training Loss | Epoch | Step  | Validation Loss | Accuracy |
|:-------------:|:-----:|:-----:|:---------------:|:--------:|
| 0.5009        | 1.0   | 4374  | 0.6370          | 0.8165   |
| 0.3329        | 2.0   | 8748  | 0.6643          | 0.8257   |
| 0.2804        | 3.0   | 13122 | 0.6192          | 0.8326   |
| 0.249         | 4.0   | 17496 | 0.6205          | 0.8372   |
| 0.2279        | 5.0   | 21870 | 0.6250          | 0.8349   |
| 0.2122        | 6.0   | 26244 | 0.6644          | 0.8280   |
| 0.2008        | 7.0   | 30618 | 0.5707          | 0.8440   |
| 0.1918        | 8.0   | 34992 | 0.5863          | 0.8360   |
| 0.1847        | 9.0   | 39366 | 0.5779          | 0.8394   |
| 0.1784        | 10.0  | 43740 | 0.5662          | 0.8349   |
| 0.1734        | 11.0  | 48114 | 0.5619          | 0.8394   |
| 0.169         | 12.0  | 52488 | 0.5583          | 0.8406   |
| 0.1653        | 13.0  | 56862 | 0.5830          | 0.8303   |
| 0.1619        | 14.0  | 61236 | 0.5773          | 0.8372   |
| 0.1591        | 15.0  | 65610 | 0.5728          | 0.8291   |
| 0.1564        | 16.0  | 69984 | 0.5631          | 0.8383   |
| 0.154         | 17.0  | 74358 | 0.5628          | 0.8452   |
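
The per-epoch log above can be checked programmatically. The sketch below is only an illustration: the names `EpochResult`, `LOG`, `best_by_accuracy`, `best_by_validation_loss`, and `steps_per_epoch` are hypothetical helpers introduced here, not part of the Trainer API, and the values are copied verbatim from the table. It does not say which checkpoint the published weights correspond to, since the card does not state that.

```python
from collections import namedtuple

# Hypothetical container for one row of the training results table.
EpochResult = namedtuple(
    "EpochResult",
    ["training_loss", "epoch", "step", "validation_loss", "accuracy"],
)

# Values copied verbatim from the training results table.
LOG = [
    EpochResult(0.5009, 1, 4374, 0.6370, 0.8165),
    EpochResult(0.3329, 2, 8748, 0.6643, 0.8257),
    EpochResult(0.2804, 3, 13122, 0.6192, 0.8326),
    EpochResult(0.249, 4, 17496, 0.6205, 0.8372),
    EpochResult(0.2279, 5, 21870, 0.6250, 0.8349),
    EpochResult(0.2122, 6, 26244, 0.6644, 0.8280),
    EpochResult(0.2008, 7, 30618, 0.5707, 0.8440),
    EpochResult(0.1918, 8, 34992, 0.5863, 0.8360),
    EpochResult(0.1847, 9, 39366, 0.5779, 0.8394),
    EpochResult(0.1784, 10, 43740, 0.5662, 0.8349),
    EpochResult(0.1734, 11, 48114, 0.5619, 0.8394),
    EpochResult(0.169, 12, 52488, 0.5583, 0.8406),
    EpochResult(0.1653, 13, 56862, 0.5830, 0.8303),
    EpochResult(0.1619, 14, 61236, 0.5773, 0.8372),
    EpochResult(0.1591, 15, 65610, 0.5728, 0.8291),
    EpochResult(0.1564, 16, 69984, 0.5631, 0.8383),
    EpochResult(0.154, 17, 74358, 0.5628, 0.8452),
]


def best_by_accuracy(log):
    """Return the row with the highest evaluation accuracy."""
    return max(log, key=lambda row: row.accuracy)


def best_by_validation_loss(log):
    """Return the row with the lowest validation loss."""
    return min(log, key=lambda row: row.validation_loss)


def steps_per_epoch(log):
    """Return the number of optimizer steps per epoch, if it is constant."""
    rates = {row.step // row.epoch for row in log}
    if len(rates) != 1:
        raise ValueError("step count per epoch is not constant")
    return rates.pop()


if __name__ == "__main__":
    top = best_by_accuracy(LOG)
    low = best_by_validation_loss(LOG)
    print(f"highest accuracy: {top.accuracy} at epoch {top.epoch}")
    print(f"lowest validation loss: {low.validation_loss} at epoch {low.epoch}")
    print(f"steps per epoch: {steps_per_epoch(LOG)}")
```

On this log, the highest accuracy (0.8452) occurs at epoch 17 and the lowest validation loss (0.5583) at epoch 12, with 4374 steps per epoch. The two criteria select different epochs, so the choice of metric matters when picking a checkpoint.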

Framework versions