# Mistral-7b-instruct-cairo-test
This model is a fine-tuned version of [TheBloke/Mistral-7B-Instruct-v0.1-GPTQ](https://huggingface.co/TheBloke/Mistral-7B-Instruct-v0.1-GPTQ) on an unknown dataset. It achieves the following results on the evaluation set:
- Loss: 2.0278
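The card does not document a prompt template. Since the base model is Mistral-7B-Instruct-v0.1, its `[INST] … [/INST]` chat format presumably carries over to this fine-tune; a minimal sketch (the `build_prompt` helper is illustrative, not part of this repository):

```python
def build_prompt(instruction: str) -> str:
    """Wrap a user instruction in the Mistral-Instruct chat format.

    This mirrors the template of the base model
    (Mistral-7B-Instruct-v0.1); the fine-tune is assumed to keep it.
    """
    return f"<s>[INST] {instruction.strip()} [/INST]"

prompt = build_prompt("Write a Cairo function that adds two felts.")
print(prompt)
```

The string produced here would be tokenized and fed to the model as-is; if the fine-tuning data used a different template, substitute that instead.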
## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed
## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 2e-05
- train_batch_size: 1
- eval_batch_size: 8
- seed: 42
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.03
- training_steps: 1000
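The warmup and cosine settings above can be sketched as a standalone schedule. This is a pure-Python reimplementation for illustration (the Trainer's own scheduler, `get_cosine_schedule_with_warmup`, follows the same shape), not the exact training code:

```python
import math

LEARNING_RATE = 2e-05
TRAINING_STEPS = 1000
WARMUP_STEPS = int(0.03 * TRAINING_STEPS)  # lr_scheduler_warmup_ratio 0.03 -> 30 steps

def lr_at(step: int) -> float:
    """Cosine schedule with linear warmup, using the hyperparameters above."""
    if step < WARMUP_STEPS:
        # Linear ramp from 0 up to the peak learning rate.
        return LEARNING_RATE * step / max(1, WARMUP_STEPS)
    # Cosine decay from the peak down to 0 over the remaining steps.
    progress = (step - WARMUP_STEPS) / max(1, TRAINING_STEPS - WARMUP_STEPS)
    return LEARNING_RATE * 0.5 * (1.0 + math.cos(math.pi * progress))

print(lr_at(0), lr_at(30), lr_at(1000))
```

With these settings the learning rate peaks at 2e-05 after 30 steps and decays to 0 at step 1000.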
### Training results
Training Loss | Epoch | Step | Validation Loss |
---|---|---|---|
4.1068 | 0.02 | 5 | 4.8573 |
5.5606 | 0.04 | 10 | 4.8367 |
3.4196 | 0.07 | 15 | 4.7934 |
4.2192 | 0.09 | 20 | 4.7186 |
3.3524 | 0.11 | 25 | 4.6344 |
3.0857 | 0.13 | 30 | 4.4890 |
3.6776 | 0.15 | 35 | 4.3401 |
4.19 | 0.17 | 40 | 4.1498 |
3.5773 | 0.2 | 45 | 3.9098 |
2.9436 | 0.22 | 50 | 3.6733 |
3.4207 | 0.24 | 55 | 3.4782 |
3.2692 | 0.26 | 60 | 3.3105 |
3.0433 | 0.28 | 65 | 3.2106 |
2.8148 | 0.3 | 70 | 3.1343 |
2.6395 | 0.33 | 75 | 3.0389 |
3.06 | 0.35 | 80 | 2.9654 |
2.4162 | 0.37 | 85 | 2.9053 |
2.2591 | 0.39 | 90 | 2.8445 |
2.7657 | 0.41 | 95 | 2.8426 |
2.4807 | 0.43 | 100 | 2.8869 |
2.7076 | 0.46 | 105 | 2.7226 |
2.4631 | 0.48 | 110 | 2.6910 |
2.1856 | 0.5 | 115 | 2.6262 |
2.299 | 0.52 | 120 | 2.5719 |
2.238 | 0.54 | 125 | 2.5251 |
2.1931 | 0.57 | 130 | 2.4816 |
2.2491 | 0.59 | 135 | 2.4329 |
2.1201 | 0.61 | 140 | 2.4006 |
2.1363 | 0.63 | 145 | 2.3743 |
2.2474 | 0.65 | 150 | 2.3490 |
2.1698 | 0.67 | 155 | 2.3232 |
1.993 | 0.7 | 160 | 2.2997 |
1.8608 | 0.72 | 165 | 2.2688 |
1.6875 | 0.74 | 170 | 2.2301 |
2.364 | 0.76 | 175 | 2.1948 |
1.6522 | 0.78 | 180 | 2.1586 |
1.9388 | 0.8 | 185 | 2.1308 |
1.7932 | 0.83 | 190 | 2.1068 |
1.9264 | 0.85 | 195 | 2.0911 |
2.0083 | 0.87 | 200 | 2.0708 |
1.729 | 0.89 | 205 | 2.0518 |
1.7525 | 0.91 | 210 | 2.0405 |
1.7576 | 0.93 | 215 | 2.0308 |
1.7018 | 0.96 | 220 | 2.0211 |
1.7508 | 0.98 | 225 | 2.0107 |
1.5914 | 1.0 | 230 | 2.0008 |
1.3207 | 1.02 | 235 | 1.9894 |
1.9048 | 1.04 | 240 | 1.9811 |
1.7579 | 1.07 | 245 | 1.9702 |
1.5786 | 1.09 | 250 | 1.9581 |
1.6914 | 1.11 | 255 | 1.9497 |
1.594 | 1.13 | 260 | 1.9439 |
1.7444 | 1.15 | 265 | 1.9351 |
1.4736 | 1.17 | 270 | 1.9247 |
1.5461 | 1.2 | 275 | 1.9192 |
1.4612 | 1.22 | 280 | 1.9096 |
1.7891 | 1.24 | 285 | 1.8967 |
1.5393 | 1.26 | 290 | 1.8864 |
1.447 | 1.28 | 295 | 1.8716 |
1.6022 | 1.3 | 300 | 1.8637 |
1.5392 | 1.33 | 305 | 1.8599 |
1.5982 | 1.35 | 310 | 1.8628 |
1.5682 | 1.37 | 315 | 1.8628 |
1.7096 | 1.39 | 320 | 1.8699 |
1.4207 | 1.41 | 325 | 1.8718 |
1.5614 | 1.43 | 330 | 1.8672 |
1.5886 | 1.46 | 335 | 1.8609 |
1.5616 | 1.48 | 340 | 1.8572 |
1.4709 | 1.5 | 345 | 1.8547 |
1.3505 | 1.52 | 350 | 1.8490 |
1.206 | 1.54 | 355 | 1.8501 |
1.4762 | 1.57 | 360 | 1.8540 |
1.4242 | 1.59 | 365 | 1.8516 |
1.2096 | 1.61 | 370 | 1.8509 |
1.7648 | 1.63 | 375 | 1.8504 |
1.5289 | 1.65 | 380 | 1.8476 |
1.4167 | 1.67 | 385 | 1.8465 |
1.4828 | 1.7 | 390 | 1.8478 |
1.4851 | 1.72 | 395 | 1.8437 |
1.2062 | 1.74 | 400 | 1.8484 |
1.4377 | 1.76 | 405 | 1.8509 |
1.3961 | 1.78 | 410 | 1.8328 |
1.6045 | 1.8 | 415 | 1.8213 |
1.4489 | 1.83 | 420 | 1.8172 |
1.2413 | 1.85 | 425 | 1.8148 |
1.196 | 1.87 | 430 | 1.8231 |
1.2091 | 1.89 | 435 | 1.8297 |
1.547 | 1.91 | 440 | 1.8285 |
1.2629 | 1.93 | 445 | 1.8258 |
1.3887 | 1.96 | 450 | 1.8244 |
1.4621 | 1.98 | 455 | 1.8254 |
1.2073 | 2.0 | 460 | 1.8347 |
1.2413 | 2.02 | 465 | 1.8464 |
1.3203 | 2.04 | 470 | 1.8559 |
1.4398 | 2.07 | 475 | 1.8681 |
1.4777 | 2.09 | 480 | 1.8771 |
1.631 | 2.11 | 485 | 1.8853 |
1.2907 | 2.13 | 490 | 1.8840 |
1.301 | 2.15 | 495 | 1.8892 |
1.2153 | 2.17 | 500 | 1.8905 |
1.3443 | 2.2 | 505 | 1.8964 |
1.0271 | 2.22 | 510 | 1.9051 |
1.0068 | 2.24 | 515 | 1.9053 |
1.4363 | 2.26 | 520 | 1.8983 |
1.209 | 2.28 | 525 | 1.8944 |
1.6277 | 2.3 | 530 | 1.8923 |
1.5597 | 2.33 | 535 | 1.9001 |
1.1923 | 2.35 | 540 | 1.9033 |
1.6841 | 2.37 | 545 | 1.9061 |
1.3195 | 2.39 | 550 | 1.9120 |
1.0681 | 2.41 | 555 | 1.9089 |
1.258 | 2.43 | 560 | 1.9071 |
1.5104 | 2.46 | 565 | 1.9040 |
1.2897 | 2.48 | 570 | 1.8998 |
1.2091 | 2.5 | 575 | 1.9029 |
1.5697 | 2.52 | 580 | 1.9136 |
1.4122 | 2.54 | 585 | 1.9211 |
1.5525 | 2.57 | 590 | 1.9347 |
1.5863 | 2.59 | 595 | 1.9409 |
1.1388 | 2.61 | 600 | 1.9412 |
1.5925 | 2.63 | 605 | 1.9387 |
1.3044 | 2.65 | 610 | 1.9305 |
1.5449 | 2.67 | 615 | 1.9200 |
1.4484 | 2.7 | 620 | 1.9107 |
1.0465 | 2.72 | 625 | 1.9113 |
1.289 | 2.74 | 630 | 1.9154 |
1.3062 | 2.76 | 635 | 1.9179 |
1.2386 | 2.78 | 640 | 1.9185 |
1.1252 | 2.8 | 645 | 1.9226 |
1.4668 | 2.83 | 650 | 1.9298 |
1.2884 | 2.85 | 655 | 1.9354 |
1.9844 | 2.87 | 660 | 1.9375 |
1.3249 | 2.89 | 665 | 1.9351 |
1.1988 | 2.91 | 670 | 1.9327 |
1.4077 | 2.93 | 675 | 1.9337 |
1.6381 | 2.96 | 680 | 1.9306 |
1.3576 | 2.98 | 685 | 1.9288 |
1.6657 | 3.0 | 690 | 1.9273 |
1.4076 | 3.02 | 695 | 1.9280 |
1.4741 | 3.04 | 700 | 1.9321 |
1.0568 | 3.07 | 705 | 1.9409 |
0.9974 | 3.09 | 710 | 1.9655 |
1.2265 | 3.11 | 715 | 1.9857 |
1.2234 | 3.13 | 720 | 1.9969 |
1.3731 | 3.15 | 725 | 1.9986 |
1.454 | 3.17 | 730 | 1.9975 |
1.452 | 3.2 | 735 | 1.9944 |
1.5325 | 3.22 | 740 | 1.9942 |
1.2146 | 3.24 | 745 | 1.9970 |
1.2917 | 3.26 | 750 | 2.0036 |
1.637 | 3.28 | 755 | 2.0084 |
1.1394 | 3.3 | 760 | 2.0081 |
1.5283 | 3.33 | 765 | 2.0091 |
1.553 | 3.35 | 770 | 2.0052 |
1.506 | 3.37 | 775 | 2.0000 |
1.6071 | 3.39 | 780 | 1.9977 |
1.5568 | 3.41 | 785 | 1.9949 |
1.5396 | 3.43 | 790 | 1.9928 |
1.3375 | 3.46 | 795 | 1.9960 |
1.4347 | 3.48 | 800 | 2.0011 |
1.3657 | 3.5 | 805 | 2.0071 |
1.1956 | 3.52 | 810 | 2.0122 |
2.0938 | 3.54 | 815 | 2.0138 |
1.1887 | 3.57 | 820 | 2.0126 |
1.226 | 3.59 | 825 | 2.0157 |
1.2971 | 3.61 | 830 | 2.0195 |
1.786 | 3.63 | 835 | 2.0240 |
1.2049 | 3.65 | 840 | 2.0266 |
1.4812 | 3.67 | 845 | 2.0277 |
1.3934 | 3.7 | 850 | 2.0268 |
1.6024 | 3.72 | 855 | 2.0244 |
1.2001 | 3.74 | 860 | 2.0224 |
1.8469 | 3.76 | 865 | 2.0217 |
0.7405 | 3.78 | 870 | 2.0208 |
1.3304 | 3.8 | 875 | 2.0218 |
1.0719 | 3.83 | 880 | 2.0229 |
1.2068 | 3.85 | 885 | 2.0248 |
1.6534 | 3.87 | 890 | 2.0275 |
1.5383 | 3.89 | 895 | 2.0286 |
1.6599 | 3.91 | 900 | 2.0292 |
1.2826 | 3.93 | 905 | 2.0291 |
1.1554 | 3.96 | 910 | 2.0289 |
1.29 | 3.98 | 915 | 2.0289 |
1.253 | 4.0 | 920 | 2.0286 |
1.4227 | 4.02 | 925 | 2.0283 |
1.7368 | 4.04 | 930 | 2.0278 |
1.6137 | 4.07 | 935 | 2.0273 |
1.03 | 4.09 | 940 | 2.0271 |
1.5915 | 4.11 | 945 | 2.0277 |
1.1601 | 4.13 | 950 | 2.0275 |
1.1825 | 4.15 | 955 | 2.0278 |
1.3415 | 4.17 | 960 | 2.0279 |
1.2622 | 4.2 | 965 | 2.0279 |
1.6122 | 4.22 | 970 | 2.0278 |
1.3889 | 4.24 | 975 | 2.0279 |
1.3606 | 4.26 | 980 | 2.0281 |
1.592 | 4.28 | 985 | 2.0280 |
1.0692 | 4.3 | 990 | 2.0281 |
1.1982 | 4.33 | 995 | 2.0279 |
1.5263 | 4.35 | 1000 | 2.0278 |
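Note that validation loss bottoms out around step 425 (~1.81) and then climbs back above 2.0 by the end of training, so the final checkpoint is likely past the optimum. A small helper to locate the best step from the logged pairs, shown here on a subset of rows from the table above:

```python
# (step, validation_loss) pairs copied from a subset of the rows above.
eval_log = [
    (5, 4.8573),
    (230, 2.0008),
    (420, 1.8172),
    (425, 1.8148),
    (430, 1.8231),
    (600, 1.9412),
    (1000, 2.0278),
]

# Pick the checkpoint with the lowest validation loss.
best_step, best_loss = min(eval_log, key=lambda pair: pair[1])
print(best_step, best_loss)
```

If intermediate checkpoints were saved during training, the step-425 checkpoint would be the better candidate for deployment.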
### Framework versions
- Transformers 4.35.0.dev0
- Pytorch 2.0.1+cu117
- Datasets 2.14.5
- Tokenizers 0.14.1