# from_scratch
This model was trained from scratch (initialized from the configuration at `tokenizer/config.json`) on an unspecified dataset. It achieves the following results on the evaluation set:
- Loss: 0.4744
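
If this loss is the usual token-level cross-entropy in nats (the card does not specify the metric), it corresponds to a perplexity of about 1.61:

```python
import math

eval_loss = 0.4744
# Assuming token-level cross-entropy in nats, perplexity = exp(loss).
perplexity = math.exp(eval_loss)
print(f"{perplexity:.3f}")  # 1.607
```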
## Model description
More information needed
## Intended uses & limitations
More information needed
## Training and evaluation data
More information needed
## Training procedure

### Training hyperparameters
The following hyperparameters were used during training (a `TrainingArguments` sketch follows the list):
- learning_rate: 0.0001
- train_batch_size: 360
- eval_batch_size: 360
- seed: 42
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-06
- lr_scheduler_type: linear
- lr_scheduler_warmup_steps: 10000
- num_epochs: 3.0
- mixed_precision_training: Native AMP
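
For reference, here is a minimal sketch of how these values map onto `transformers.TrainingArguments` (API as of Transformers 4.25.1, per the versions listed at the end of this card). It is not the authors' actual launch script: the output directory is a placeholder, whether 360 is the per-device or total batch size is an assumption, and the 20,000-step evaluation cadence is inferred from the results table below.

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="from_scratch",        # placeholder; actual path unknown
    learning_rate=1e-4,
    per_device_train_batch_size=360,  # assumed per-device; could be the total
    per_device_eval_batch_size=360,
    seed=42,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-6,
    lr_scheduler_type="linear",
    warmup_steps=10_000,
    num_train_epochs=3.0,
    fp16=True,                        # "Native AMP" mixed precision
    evaluation_strategy="steps",      # inferred: eval logged every 20k steps
    eval_steps=20_000,
    logging_steps=20_000,
)
```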
### Training results
| Training Loss | Epoch | Step | Validation Loss |
|---|---|---|---|
| 1.0952 | 0.05 | 20000 | 1.0383 |
| 0.936 | 0.1 | 40000 | 0.8852 |
| 0.8679 | 0.14 | 60000 | 0.8207 |
| 0.8276 | 0.19 | 80000 | 0.7796 |
| 0.796 | 0.24 | 100000 | 0.7519 |
| 0.7756 | 0.29 | 120000 | 0.7299 |
| 0.7545 | 0.33 | 140000 | 0.7103 |
| 0.7395 | 0.38 | 160000 | 0.6947 |
| 0.7236 | 0.43 | 180000 | 0.6809 |
| 0.7143 | 0.48 | 200000 | 0.6705 |
| 0.705 | 0.52 | 220000 | 0.6585 |
| 0.6904 | 0.57 | 240000 | 0.6479 |
| 0.6835 | 0.62 | 260000 | 0.6388 |
| 0.672 | 0.67 | 280000 | 0.6290 |
| 0.665 | 0.72 | 300000 | 0.6217 |
| 0.6581 | 0.76 | 320000 | 0.6136 |
| 0.6466 | 0.81 | 340000 | 0.6071 |
| 0.6396 | 0.86 | 360000 | 0.6000 |
| 0.6343 | 0.91 | 380000 | 0.5940 |
| 0.6286 | 0.95 | 400000 | 0.5880 |
| 0.6183 | 1.0 | 420000 | 0.5809 |
| 0.6134 | 1.05 | 440000 | 0.5757 |
| 0.6094 | 1.1 | 460000 | 0.5693 |
| 0.6032 | 1.15 | 480000 | 0.5641 |
| 0.5954 | 1.19 | 500000 | 0.5596 |
| 0.5915 | 1.24 | 520000 | 0.5532 |
| 0.5845 | 1.29 | 540000 | 0.5489 |
| 0.5823 | 1.34 | 560000 | 0.5437 |
| 0.5754 | 1.38 | 580000 | 0.5393 |
| 0.573 | 1.43 | 600000 | 0.5345 |
| 0.5643 | 1.48 | 620000 | 0.5309 |
| 0.5627 | 1.53 | 640000 | 0.5262 |
| 0.56 | 1.57 | 660000 | 0.5220 |
| 0.5554 | 1.62 | 680000 | 0.5186 |
| 0.5507 | 1.67 | 700000 | 0.5152 |
| 0.5494 | 1.72 | 720000 | 0.5117 |
| 0.5445 | 1.77 | 740000 | 0.5076 |
| 0.5396 | 1.81 | 760000 | 0.5051 |
| 0.5363 | 1.86 | 780000 | 0.5026 |
| 0.5356 | 1.91 | 800000 | 0.4998 |
| 0.5303 | 1.96 | 820000 | 0.4982 |
| 0.5583 | 2.0 | 840000 | 0.5195 |
| 0.5565 | 2.05 | 860000 | 0.5180 |
| 0.5535 | 2.1 | 880000 | 0.5158 |
| 0.5497 | 2.15 | 900000 | 0.5133 |
| 0.5511 | 2.19 | 920000 | 0.5110 |
| 0.5439 | 2.24 | 940000 | 0.5085 |
| 0.5413 | 2.29 | 960000 | 0.5060 |
| 0.5376 | 2.34 | 980000 | 0.5023 |
| 0.5333 | 2.39 | 1000000 | 0.5004 |
| 0.5322 | 2.43 | 1020000 | 0.4973 |
| 0.5312 | 2.48 | 1040000 | 0.4941 |
| 0.5281 | 2.53 | 1060000 | 0.4921 |
| 0.5267 | 2.58 | 1080000 | 0.4902 |
| 0.5257 | 2.62 | 1100000 | 0.4871 |
| 0.5174 | 2.67 | 1120000 | 0.4849 |
| 0.5183 | 2.72 | 1140000 | 0.4825 |
| 0.5181 | 2.77 | 1160000 | 0.4807 |
| 0.5116 | 2.81 | 1180000 | 0.4784 |
| 0.5092 | 2.86 | 1200000 | 0.4769 |
| 0.5109 | 2.91 | 1220000 | 0.4757 |
| 0.5102 | 2.96 | 1240000 | 0.4739 |
### Framework versions
- Transformers 4.25.1
- Pytorch 1.13.1+cu117
- Datasets 2.8.0
- Tokenizers 0.13.2
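
If you want to match this environment exactly, a quick sanity check (assumes the packages are importable under these names):

```python
# Verify the local environment against the versions listed above.
import datasets
import tokenizers
import torch
import transformers

assert transformers.__version__ == "4.25.1"
assert torch.__version__.startswith("1.13.1")  # card lists 1.13.1+cu117; CUDA suffix may differ
assert datasets.__version__ == "2.8.0"
assert tokenizers.__version__ == "0.13.2"
```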