# llama7b_rulm_spm_unigram_10_mean_init_tie_rulm_small_1e_12_10_23

This model is a fine-tuned version of llama7b_rulm_spm_unigram_10_mean_init_tie_12_10_23 on an unspecified dataset. It achieves the following results on the evaluation set:
- Loss: 2.9526
- Accuracy: 0.4388
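
The reported accuracy is presumably token-level (next-token) accuracy as computed by the evaluation script. Below is a minimal usage sketch; the repository id is a placeholder (this card does not state where the checkpoint is hosted), and the model is assumed to load as a standard LLaMA-style causal LM:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder repository id; substitute the actual hub path of this checkpoint.
model_id = "your-namespace/llama7b_rulm_spm_unigram_10_mean_init_tie_rulm_small_1e_12_10_23"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

inputs = tokenizer("Example prompt", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```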
## Model description
More information needed
## Intended uses & limitations
More information needed
## Training and evaluation data
More information needed
## Training procedure

### Training hyperparameters
The following hyperparameters were used during training (see the `TrainingArguments` sketch after this list):
- learning_rate: 0.0003
- train_batch_size: 12
- eval_batch_size: 12
- seed: 42
- distributed_type: multi-GPU
- num_devices: 10
- gradient_accumulation_steps: 2
- total_train_batch_size: 240
- total_eval_batch_size: 120
- optimizer: Adam with betas=(0.9,0.95) and epsilon=1e-05
- lr_scheduler_type: linear
- lr_scheduler_warmup_steps: 200
- num_epochs: 1.0
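
A sketch of how these values map onto `transformers.TrainingArguments`; this assumes training used the HF `Trainer`, and `output_dir` (plus anything not listed above) is a placeholder:

```python
from transformers import TrainingArguments

# Sketch of the hyperparameters above. With 10 GPUs, a per-device batch size
# of 12 and 2 gradient-accumulation steps, the effective train batch size is
# 12 * 10 * 2 = 240 (eval: 12 * 10 = 120), matching the totals reported above.
training_args = TrainingArguments(
    output_dir="output",            # placeholder
    learning_rate=3e-4,
    per_device_train_batch_size=12,
    per_device_eval_batch_size=12,
    gradient_accumulation_steps=2,
    seed=42,
    num_train_epochs=1.0,
    lr_scheduler_type="linear",
    warmup_steps=200,
    adam_beta1=0.9,
    adam_beta2=0.95,
    adam_epsilon=1e-5,
)
```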
### Training results
| Training Loss | Epoch | Step | Validation Loss | Accuracy |
|:-------------:|:-----:|:----:|:---------------:|:--------:|
3.5018 | 0.01 | 1000 | 3.5095 | 0.3768 |
3.3375 | 0.02 | 2000 | 3.3603 | 0.3918 |
3.2694 | 0.02 | 3000 | 3.2995 | 0.3982 |
3.236 | 0.03 | 4000 | 3.2626 | 0.4022 |
3.2131 | 0.04 | 5000 | 3.2350 | 0.4048 |
3.1934 | 0.05 | 6000 | 3.2147 | 0.4073 |
3.177 | 0.06 | 7000 | 3.2006 | 0.4089 |
3.1653 | 0.07 | 8000 | 3.1890 | 0.4097 |
3.1548 | 0.07 | 9000 | 3.1779 | 0.4109 |
3.152 | 0.08 | 10000 | 3.1699 | 0.4119 |
3.1416 | 0.09 | 11000 | 3.1622 | 0.4130 |
3.1387 | 0.1 | 12000 | 3.1561 | 0.4139 |
3.1192 | 0.11 | 13000 | 3.1495 | 0.4143 |
3.1221 | 0.12 | 14000 | 3.1431 | 0.4150 |
3.1136 | 0.12 | 15000 | 3.1391 | 0.4155 |
3.106 | 0.13 | 16000 | 3.1348 | 0.4160 |
3.1023 | 0.14 | 17000 | 3.1312 | 0.4165 |
3.1062 | 0.15 | 18000 | 3.1265 | 0.4172 |
3.1007 | 0.16 | 19000 | 3.1230 | 0.4177 |
3.0979 | 0.16 | 20000 | 3.1201 | 0.4178 |
3.0897 | 0.17 | 21000 | 3.1168 | 0.4179 |
3.0863 | 0.18 | 22000 | 3.1128 | 0.4189 |
3.0898 | 0.19 | 23000 | 3.1097 | 0.4191 |
3.0825 | 0.2 | 24000 | 3.1074 | 0.4191 |
3.0808 | 0.21 | 25000 | 3.1037 | 0.4200 |
3.0774 | 0.21 | 26000 | 3.1032 | 0.4197 |
3.0652 | 0.22 | 27000 | 3.0980 | 0.4202 |
3.0693 | 0.23 | 28000 | 3.0968 | 0.4207 |
3.0665 | 0.24 | 29000 | 3.0944 | 0.4209 |
3.0657 | 0.25 | 30000 | 3.0920 | 0.4210 |
3.0608 | 0.26 | 31000 | 3.0911 | 0.4213 |
3.0647 | 0.26 | 32000 | 3.0896 | 0.4213 |
3.0604 | 0.27 | 33000 | 3.0861 | 0.4217 |
3.0577 | 0.28 | 34000 | 3.0845 | 0.4221 |
3.0606 | 0.29 | 35000 | 3.0814 | 0.4220 |
3.0515 | 0.3 | 36000 | 3.0801 | 0.4227 |
3.0527 | 0.31 | 37000 | 3.0772 | 0.4225 |
3.0507 | 0.31 | 38000 | 3.0758 | 0.4228 |
3.0433 | 0.32 | 39000 | 3.0739 | 0.4234 |
3.0546 | 0.33 | 40000 | 3.0717 | 0.4234 |
3.0484 | 0.34 | 41000 | 3.0697 | 0.4236 |
3.0441 | 0.35 | 42000 | 3.0694 | 0.4236 |
3.0292 | 0.35 | 43000 | 3.0662 | 0.4242 |
3.0384 | 0.36 | 44000 | 3.0643 | 0.4244 |
3.0367 | 0.37 | 45000 | 3.0629 | 0.4240 |
3.0337 | 0.38 | 46000 | 3.0622 | 0.4246 |
3.0385 | 0.39 | 47000 | 3.0599 | 0.4245 |
3.0319 | 0.4 | 48000 | 3.0574 | 0.4250 |
3.0255 | 0.4 | 49000 | 3.0573 | 0.4249 |
3.021 | 0.41 | 50000 | 3.0557 | 0.4253 |
3.0305 | 0.42 | 51000 | 3.0530 | 0.4254 |
3.0248 | 0.43 | 52000 | 3.0528 | 0.4257 |
3.0269 | 0.44 | 53000 | 3.0495 | 0.4261 |
3.0136 | 0.45 | 54000 | 3.0488 | 0.4259 |
3.0156 | 0.45 | 55000 | 3.0468 | 0.4262 |
3.022 | 0.46 | 56000 | 3.0454 | 0.4268 |
3.0193 | 0.47 | 57000 | 3.0442 | 0.4269 |
3.0222 | 0.48 | 58000 | 3.0417 | 0.4270 |
3.0111 | 0.49 | 59000 | 3.0393 | 0.4276 |
3.0148 | 0.49 | 60000 | 3.0384 | 0.4273 |
3.0077 | 0.5 | 61000 | 3.0364 | 0.4276 |
3.0167 | 0.51 | 62000 | 3.0358 | 0.4276 |
3.0049 | 0.52 | 63000 | 3.0343 | 0.4280 |
3.016 | 0.53 | 64000 | 3.0322 | 0.4281 |
3.0103 | 0.54 | 65000 | 3.0297 | 0.4285 |
3.0066 | 0.54 | 66000 | 3.0290 | 0.4284 |
2.9958 | 0.55 | 67000 | 3.0281 | 0.4285 |
3.0062 | 0.56 | 68000 | 3.0266 | 0.4288 |
2.9985 | 0.57 | 69000 | 3.0245 | 0.4289 |
3.0031 | 0.58 | 70000 | 3.0224 | 0.4292 |
2.9894 | 0.59 | 71000 | 3.0214 | 0.4295 |
2.9929 | 0.59 | 72000 | 3.0193 | 0.4296 |
2.9904 | 0.6 | 73000 | 3.0176 | 0.4296 |
2.9989 | 0.61 | 74000 | 3.0171 | 0.4301 |
2.9959 | 0.62 | 75000 | 3.0153 | 0.4301 |
2.9847 | 0.63 | 76000 | 3.0142 | 0.4306 |
2.9892 | 0.63 | 77000 | 3.0127 | 0.4308 |
2.9924 | 0.64 | 78000 | 3.0110 | 0.4310 |
2.991 | 0.65 | 79000 | 3.0096 | 0.4312 |
2.9824 | 0.66 | 80000 | 3.0080 | 0.4311 |
2.9879 | 0.67 | 81000 | 3.0060 | 0.4315 |
2.9764 | 0.68 | 82000 | 3.0042 | 0.4321 |
2.9827 | 0.68 | 83000 | 3.0030 | 0.4315 |
2.9769 | 0.69 | 84000 | 3.0012 | 0.4324 |
2.9788 | 0.7 | 85000 | 3.0002 | 0.4322 |
2.9734 | 0.71 | 86000 | 2.9987 | 0.4325 |
2.9769 | 0.72 | 87000 | 2.9975 | 0.4328 |
2.9676 | 0.73 | 88000 | 2.9959 | 0.4326 |
2.9677 | 0.73 | 89000 | 2.9943 | 0.4330 |
2.9739 | 0.74 | 90000 | 2.9933 | 0.4330 |
2.9691 | 0.75 | 91000 | 2.9914 | 0.4334 |
2.969 | 0.76 | 92000 | 2.9901 | 0.4336 |
2.9602 | 0.77 | 93000 | 2.9889 | 0.4337 |
2.965 | 0.78 | 94000 | 2.9872 | 0.4339 |
2.9627 | 0.78 | 95000 | 2.9853 | 0.4341 |
2.9542 | 0.79 | 96000 | 2.9844 | 0.4340 |
2.9552 | 0.8 | 97000 | 2.9822 | 0.4344 |
2.9576 | 0.81 | 98000 | 2.9812 | 0.4347 |
2.9579 | 0.82 | 99000 | 2.9802 | 0.4348 |
2.9508 | 0.82 | 100000 | 2.9784 | 0.4349 |
2.9551 | 0.83 | 101000 | 2.9771 | 0.4353 |
2.9535 | 0.84 | 102000 | 2.9759 | 0.4357 |
2.9479 | 0.85 | 103000 | 2.9743 | 0.4357 |
2.9542 | 0.86 | 104000 | 2.9732 | 0.4359 |
2.9481 | 0.87 | 105000 | 2.9715 | 0.4360 |
2.941 | 0.87 | 106000 | 2.9697 | 0.4362 |
2.9435 | 0.88 | 107000 | 2.9684 | 0.4365 |
2.9403 | 0.89 | 108000 | 2.9674 | 0.4368 |
2.9453 | 0.9 | 109000 | 2.9661 | 0.4367 |
2.9396 | 0.91 | 110000 | 2.9644 | 0.4372 |
2.9375 | 0.92 | 111000 | 2.9633 | 0.4372 |
2.9284 | 0.92 | 112000 | 2.9621 | 0.4374 |
2.9418 | 0.93 | 113000 | 2.9606 | 0.4376 |
2.934 | 0.94 | 114000 | 2.9594 | 0.4377 |
2.9374 | 0.95 | 115000 | 2.9583 | 0.4380 |
2.9302 | 0.96 | 116000 | 2.9569 | 0.4382 |
2.9273 | 0.96 | 117000 | 2.9560 | 0.4382 |
2.9338 | 0.97 | 118000 | 2.9548 | 0.4384 |
2.9304 | 0.98 | 119000 | 2.9539 | 0.4385 |
2.9361 | 0.99 | 120000 | 2.9531 | 0.4385 |
2.927 | 1.0 | 121000 | 2.9526 | 0.4387 |
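
For reference, the final validation loss of 2.9526 corresponds to a token-level perplexity of exp(2.9526) ≈ 19.2 on the evaluation set.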
### Framework versions
- Transformers 4.34.0
- Pytorch 2.0.1+cu118
- Datasets 2.14.5
- Tokenizers 0.14.1
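
When reproducing results, the installed versions can be checked against the list above; a minimal sketch:

```python
import datasets
import tokenizers
import torch
import transformers

# Versions this card was produced with (see the list above).
print(transformers.__version__)  # 4.34.0
print(torch.__version__)         # 2.0.1+cu118
print(datasets.__version__)      # 2.14.5
print(tokenizers.__version__)    # 0.14.1
```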