# bert-dp-second
This model is a fine-tuned version of an unspecified base checkpoint on the generator dataset. It achieves the following results on the evaluation set:

- Loss: 3.2321
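Assuming this loss is a token-level cross-entropy (as in masked or causal language-model training), it converts directly to perplexity via `exp(loss)`:

```python
import math

eval_loss = 3.2321  # final validation loss reported above
perplexity = math.exp(eval_loss)
print(f"perplexity ≈ {perplexity:.2f}")  # ≈ 25.33
```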
## Model description
More information needed
## Intended uses & limitations
More information needed
## Training and evaluation data
More information needed
## Training procedure
### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 0.0005
- train_batch_size: 64
- eval_batch_size: 64
- seed: 42
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 1000
- num_epochs: 19
- mixed_precision_training: Native AMP
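As a sketch of what `lr_scheduler_type: cosine` with `lr_scheduler_warmup_steps: 1000` typically means in `transformers` (linear warmup to the peak learning rate, then cosine decay toward zero), the schedule can be written in plain Python. The total-step count below is taken from the last step logged in the results table and is illustrative, not exact:

```python
import math

LEARNING_RATE = 5e-4  # learning_rate above
WARMUP_STEPS = 1000   # lr_scheduler_warmup_steps above
TOTAL_STEPS = 40500   # last logged training step (approximate)

def lr_at(step: int) -> float:
    """Linear warmup followed by cosine decay to zero."""
    if step < WARMUP_STEPS:
        return LEARNING_RATE * step / WARMUP_STEPS
    progress = (step - WARMUP_STEPS) / (TOTAL_STEPS - WARMUP_STEPS)
    return LEARNING_RATE * 0.5 * (1.0 + math.cos(math.pi * progress))

print(lr_at(500))    # halfway through warmup ≈ 2.5e-4
print(lr_at(1000))   # peak ≈ 5e-4
print(lr_at(40500))  # end of training ≈ 0
```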
### Training results
| Training Loss | Epoch | Step | Validation Loss |
|:-------------:|:-----:|:----:|:---------------:|
7.3416 | 0.23 | 500 | 6.6532 |
6.5752 | 0.47 | 1000 | 6.5275 |
6.4866 | 0.7 | 1500 | 6.4720 |
6.4273 | 0.93 | 2000 | 6.4540 |
6.4036 | 1.17 | 2500 | 6.4236 |
6.3779 | 1.4 | 3000 | 6.4018 |
6.3528 | 1.63 | 3500 | 6.3768 |
6.3258 | 1.87 | 4000 | 6.3679 |
6.3009 | 2.1 | 4500 | 6.3305 |
6.2646 | 2.33 | 5000 | 6.3142 |
6.2583 | 2.57 | 5500 | 6.3004 |
6.2223 | 2.8 | 6000 | 6.2605 |
6.1941 | 3.03 | 6500 | 6.2353 |
6.1382 | 3.27 | 7000 | 6.2095 |
6.1301 | 3.5 | 7500 | 6.1774 |
6.09 | 3.73 | 8000 | 6.1480 |
6.0624 | 3.97 | 8500 | 6.1061 |
6.0056 | 4.2 | 9000 | 6.0655 |
5.9444 | 4.43 | 9500 | 5.9461 |
5.7101 | 4.67 | 10000 | 5.2594 |
5.005 | 4.9 | 10500 | 4.7348 |
4.6127 | 5.13 | 11000 | 4.4626 |
4.3907 | 5.37 | 11500 | 4.2862 |
4.241 | 5.6 | 12000 | 4.1701 |
4.1286 | 5.83 | 12500 | 4.0673 |
4.0151 | 6.07 | 13000 | 3.9967 |
3.934 | 6.3 | 13500 | 3.9292 |
3.8789 | 6.53 | 14000 | 3.8707 |
3.8231 | 6.77 | 14500 | 3.8222 |
3.7696 | 7.0 | 15000 | 3.7800 |
3.7078 | 7.23 | 15500 | 3.7424 |
3.6671 | 7.47 | 16000 | 3.7093 |
3.6446 | 7.7 | 16500 | 3.6780 |
3.6069 | 7.93 | 17000 | 3.6476 |
3.5782 | 8.17 | 17500 | 3.6283 |
3.5384 | 8.4 | 18000 | 3.6098 |
3.5245 | 8.63 | 18500 | 3.5942 |
3.5209 | 8.87 | 19000 | 3.5841 |
3.4948 | 9.1 | 19500 | 3.5728 |
3.4877 | 9.33 | 20000 | 3.5692 |
3.4818 | 9.57 | 20500 | 3.5641 |
3.4844 | 9.8 | 21000 | 3.5640 |
3.5323 | 10.03 | 21500 | 3.6026 |
3.5123 | 10.27 | 22000 | 3.5877 |
3.5046 | 10.5 | 22500 | 3.5595 |
3.4787 | 10.73 | 23000 | 3.5403 |
3.4568 | 10.97 | 23500 | 3.5125 |
3.4154 | 11.2 | 24000 | 3.4916 |
3.3998 | 11.43 | 24500 | 3.4749 |
3.3986 | 11.67 | 25000 | 3.4578 |
3.372 | 11.9 | 25500 | 3.4405 |
3.3402 | 12.13 | 26000 | 3.4317 |
3.3281 | 12.37 | 26500 | 3.4215 |
3.322 | 12.6 | 27000 | 3.4093 |
3.3198 | 12.83 | 27500 | 3.4026 |
3.3039 | 13.07 | 28000 | 3.3971 |
3.296 | 13.3 | 28500 | 3.3954 |
3.3015 | 13.53 | 29000 | 3.3954 |
3.2939 | 13.77 | 29500 | 3.3927 |
3.3013 | 14.0 | 30000 | 3.3918 |
3.343 | 14.23 | 30500 | 3.4265 |
3.3438 | 14.47 | 31000 | 3.4133 |
3.3397 | 14.7 | 31500 | 3.3951 |
3.3156 | 14.93 | 32000 | 3.3681 |
3.2815 | 15.17 | 32500 | 3.3503 |
3.2654 | 15.4 | 33000 | 3.3313 |
3.2492 | 15.63 | 33500 | 3.3184 |
3.2399 | 15.87 | 34000 | 3.2995 |
3.2222 | 16.1 | 34500 | 3.2922 |
3.2026 | 16.33 | 35000 | 3.2818 |
3.191 | 16.57 | 35500 | 3.2723 |
3.1825 | 16.8 | 36000 | 3.2640 |
3.1691 | 17.03 | 36500 | 3.2530 |
3.1656 | 17.27 | 37000 | 3.2487 |
3.1487 | 17.5 | 37500 | 3.2419 |
3.1635 | 17.73 | 38000 | 3.2411 |
3.1675 | 17.97 | 38500 | 3.2330 |
3.1422 | 18.2 | 39000 | 3.2344 |
3.1443 | 18.43 | 39500 | 3.2331 |
3.1425 | 18.67 | 40000 | 3.2348 |
3.139 | 18.9 | 40500 | 3.2321 |
### Framework versions
- Transformers 4.26.1
- PyTorch 1.11.0+cu113
- Datasets 2.13.0
- Tokenizers 0.13.3