tiny-mlm-wikitext-custom-tokenizer

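This model is a fine-tuned version of google/bert_uncased_L-2_H-128_A-2. The training dataset was not recorded (the Trainer reported it as None). The last evaluation in the training results (step 28000, epoch 22.4) gives a validation loss of 6.4940.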

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

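The hyperparameter list was not preserved in this card. The training results show that evaluation ran every 500 steps and that 500 steps correspond to 0.4 epochs, i.e. 1250 steps per epoch.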

Training results

| Training Loss | Epoch | Step | Validation Loss |
|---------------|-------|-------|-----------------|
| 8.1543 | 0.4 | 500 | 7.6501 |
| 7.4342 | 0.8 | 1000 | 7.5531 |
| 7.3656 | 1.2 | 1500 | nan |
| 7.2844 | 1.6 | 2000 | 7.4543 |
| 7.2621 | 2.0 | 2500 | 7.4480 |
| 7.1668 | 2.4 | 3000 | 7.3456 |
| 7.1874 | 2.8 | 3500 | 7.3750 |
| 7.1284 | 3.2 | 4000 | nan |
| 7.1041 | 3.6 | 4500 | 7.2361 |
| 7.0693 | 4.0 | 5000 | 7.2836 |
| 7.0604 | 4.4 | 5500 | 7.2521 |
| 6.993 | 4.8 | 6000 | 7.2082 |
| 7.0014 | 5.2 | 6500 | 7.1960 |
| 6.9607 | 5.6 | 7000 | 7.1971 |
| 6.9514 | 6.0 | 7500 | nan |
| 6.9524 | 6.4 | 8000 | 7.0977 |
| 6.8999 | 6.8 | 8500 | 7.0787 |
| 6.8471 | 7.2 | 9000 | 7.1168 |
| 6.8511 | 7.6 | 9500 | 7.0589 |
| 6.8111 | 8.0 | 10000 | 7.0058 |
| 6.8131 | 8.4 | 10500 | 7.0089 |
| 6.717 | 8.8 | 11000 | 6.9681 |
| 6.7024 | 9.2 | 11500 | 6.9542 |
| 6.7567 | 9.6 | 12000 | 6.9008 |
| 6.7025 | 10.0 | 12500 | 6.8863 |
| 6.6509 | 10.4 | 13000 | 6.8794 |
| 6.6151 | 10.8 | 13500 | 6.8888 |
| 6.6348 | 11.2 | 14000 | 6.8106 |
| 6.6061 | 11.6 | 14500 | 6.8399 |
| 6.5637 | 12.0 | 15000 | 6.8289 |
| 6.5526 | 12.4 | 15500 | 6.7866 |
| 6.4899 | 12.8 | 16000 | 6.7108 |
| 6.5106 | 13.2 | 16500 | 6.7707 |
| 6.5022 | 13.6 | 17000 | 6.7289 |
| 6.429 | 14.0 | 17500 | 6.6883 |
| 6.4342 | 14.4 | 18000 | 6.6669 |
| 6.4385 | 14.8 | 18500 | 6.6722 |
| 6.4328 | 15.2 | 19000 | 6.6867 |
| 6.3802 | 15.6 | 19500 | 6.6403 |
| 6.375 | 16.0 | 20000 | 6.6141 |
| 6.332 | 16.4 | 20500 | 6.6759 |
| 6.3237 | 16.8 | 21000 | 6.5960 |
| 6.3551 | 17.2 | 21500 | 6.5551 |
| 6.2918 | 17.6 | 22000 | nan |
| 6.3 | 18.0 | 22500 | 6.5744 |
| 6.2555 | 18.4 | 23000 | 6.5212 |
| 6.2569 | 18.8 | 23500 | 6.5515 |
| 6.2658 | 19.2 | 24000 | 6.5763 |
| 6.2205 | 19.6 | 24500 | 6.4887 |
| 6.2022 | 20.0 | 25000 | 6.4955 |
| 6.1881 | 20.4 | 25500 | 6.4849 |
| 6.1479 | 20.8 | 26000 | 6.4727 |
| 6.1805 | 21.2 | 26500 | 6.4253 |
| 6.1439 | 21.6 | 27000 | 6.4397 |
| 6.1332 | 22.0 | 27500 | 6.4876 |
| 6.1379 | 22.4 | 28000 | 6.4940 |
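
Several evaluations in the results above report `nan`, and the lowest validation loss does not fall on the final step. The sketch below shows one way to read such a log: it parses a few rows copied from the results, skips the `nan` evaluations, picks the row with the lowest finite validation loss, and converts that loss to a perplexity. It assumes the reported loss is a mean cross-entropy in nats, so that perplexity is `exp(loss)`; the card does not confirm this. The names `LOG_ROWS`, `parse_rows`, and `best_checkpoint` are hypothetical helpers, not part of the Trainer.

```python
import math

# A few rows copied from the training results:
# training_loss, epoch, step, validation_loss
LOG_ROWS = """\
8.1543 0.4 500 7.6501
7.3656 1.2 1500 nan
6.1805 21.2 26500 6.4253
6.1379 22.4 28000 6.4940
"""


def parse_rows(text):
    """Parse whitespace-separated log rows; 'nan' becomes float('nan')."""
    rows = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 4:
            continue  # skip blank or malformed lines
        train_loss, epoch, step, val_loss = parts
        rows.append({
            "train_loss": float(train_loss),
            "epoch": float(epoch),
            "step": int(step),
            "val_loss": float(val_loss),
        })
    return rows


def best_checkpoint(rows):
    """Return the row with the lowest finite validation loss."""
    finite = [r for r in rows if math.isfinite(r["val_loss"])]
    return min(finite, key=lambda r: r["val_loss"])


rows = parse_rows(LOG_ROWS)
best = best_checkpoint(rows)

# Steps whose evaluation produced nan and should be investigated separately.
nan_steps = [r["step"] for r in rows if math.isnan(r["val_loss"])]

# Assumption: the loss is a mean cross-entropy in nats, so perplexity = exp(loss).
perplexity = math.exp(best["val_loss"])

print(best["step"], best["val_loss"], nan_steps, round(perplexity, 1))
```

Filtering with `math.isfinite` matters here because comparisons against `nan` are always false, so a plain `min` over rows that include `nan` can return a misleading result depending on row order.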

Framework versions