# Mistral-7b-instruct-cairo
This model is a fine-tuned version of [mistralai/Mistral-7B-Instruct-v0.1](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.1) on an unknown dataset. It achieves the following results on the evaluation set:
- Loss: 1.3126
## Model description
More information needed
## Intended uses & limitations
More information needed
## Training and evaluation data
More information needed
## Training procedure
### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 2e-05
- train_batch_size: 1
- eval_batch_size: 8
- seed: 42
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.03
- training_steps: 1000
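The scheduler settings above can be sanity-checked outside the Trainer. A minimal sketch of a cosine schedule with linear warmup, assuming the common Transformers-style behaviour of `get_cosine_schedule_with_warmup` (warmup_ratio 0.03 of 1000 steps, i.e. 30 warmup steps, then cosine decay from the peak rate to zero):

```python
import math

BASE_LR = 2e-5                           # learning_rate
TOTAL_STEPS = 1000                       # training_steps
WARMUP_STEPS = int(0.03 * TOTAL_STEPS)   # lr_scheduler_warmup_ratio = 0.03 -> 30 steps

def lr_at(step: int) -> float:
    """Learning rate at a given optimizer step: linear warmup, then cosine decay to 0."""
    if step < WARMUP_STEPS:
        return BASE_LR * step / max(1, WARMUP_STEPS)
    progress = (step - WARMUP_STEPS) / max(1, TOTAL_STEPS - WARMUP_STEPS)
    return BASE_LR * 0.5 * (1.0 + math.cos(math.pi * progress))

print(f"step 0:    {lr_at(0):.2e}")     # start of warmup (0)
print(f"step 30:   {lr_at(30):.2e}")    # peak LR after warmup (2e-5)
print(f"step 500:  {lr_at(500):.2e}")   # mid-training, partially decayed
print(f"step 1000: {lr_at(1000):.2e}")  # fully decayed to ~0
```

The exact curve the Trainer used may differ slightly (e.g. in how the final step is handled), but the shape, peak, and warmup length match the hyperparameters listed above.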
### Training results
Training Loss | Epoch | Step | Validation Loss |
---|---|---|---|
4.1833 | 0.02 | 5 | 4.6476 |
4.9228 | 0.04 | 10 | 4.6308 |
3.5035 | 0.07 | 15 | 4.5929 |
3.8733 | 0.09 | 20 | 4.5101 |
3.3104 | 0.11 | 25 | 4.3584 |
3.0202 | 0.13 | 30 | 4.1646 |
3.511 | 0.15 | 35 | 3.9564 |
3.8319 | 0.17 | 40 | 3.7167 |
3.5774 | 0.2 | 45 | 3.4875 |
2.6644 | 0.22 | 50 | 3.3009 |
3.6052 | 0.24 | 55 | 3.1509 |
3.2178 | 0.26 | 60 | 3.0054 |
2.7517 | 0.28 | 65 | 2.8677 |
2.642 | 0.3 | 70 | 2.7322 |
2.475 | 0.33 | 75 | 2.5958 |
2.6204 | 0.35 | 80 | 2.4875 |
2.0662 | 0.37 | 85 | 2.3804 |
2.0227 | 0.39 | 90 | 2.2954 |
2.6723 | 0.41 | 95 | 2.2301 |
2.0823 | 0.43 | 100 | 2.1851 |
2.3267 | 0.46 | 105 | 2.1553 |
2.2654 | 0.48 | 110 | 2.1372 |
2.0143 | 0.5 | 115 | 2.1293 |
2.0962 | 0.52 | 120 | 2.1173 |
1.9569 | 0.54 | 125 | 2.0962 |
2.0078 | 0.57 | 130 | 2.0627 |
2.0684 | 0.59 | 135 | 2.0403 |
1.7525 | 0.61 | 140 | 2.0107 |
1.8408 | 0.63 | 145 | 1.9661 |
1.9556 | 0.65 | 150 | 1.9292 |
2.1041 | 0.67 | 155 | 1.8963 |
1.6941 | 0.7 | 160 | 1.8716 |
1.6379 | 0.72 | 165 | 1.8356 |
1.5787 | 0.74 | 170 | 1.8169 |
2.2743 | 0.76 | 175 | 1.8057 |
1.5461 | 0.78 | 180 | 1.8020 |
1.7454 | 0.8 | 185 | 1.7944 |
1.6493 | 0.83 | 190 | 1.7856 |
1.6966 | 0.85 | 195 | 1.7802 |
2.0936 | 0.87 | 200 | 1.7591 |
1.7361 | 0.89 | 205 | 1.7474 |
1.4094 | 0.91 | 210 | 1.7470 |
1.5914 | 0.93 | 215 | 1.7498 |
1.6018 | 0.96 | 220 | 1.7379 |
1.5372 | 0.98 | 225 | 1.7201 |
1.4012 | 1.0 | 230 | 1.7062 |
1.1786 | 1.02 | 235 | 1.7075 |
1.6958 | 1.04 | 240 | 1.7153 |
1.5685 | 1.07 | 245 | 1.7119 |
1.3384 | 1.09 | 250 | 1.7053 |
1.4193 | 1.11 | 255 | 1.7025 |
1.1412 | 1.13 | 260 | 1.7064 |
1.6449 | 1.15 | 265 | 1.6887 |
1.4204 | 1.17 | 270 | 1.6711 |
1.2598 | 1.2 | 275 | 1.6524 |
1.2817 | 1.22 | 280 | 1.6267 |
1.7353 | 1.24 | 285 | 1.6130 |
1.1693 | 1.26 | 290 | 1.6097 |
1.1159 | 1.28 | 295 | 1.5932 |
1.527 | 1.3 | 300 | 1.5877 |
1.317 | 1.33 | 305 | 1.5897 |
1.6248 | 1.35 | 310 | 1.5864 |
1.5269 | 1.37 | 315 | 1.5848 |
1.4539 | 1.39 | 320 | 1.5846 |
1.2503 | 1.41 | 325 | 1.5859 |
1.1068 | 1.43 | 330 | 1.5861 |
1.4746 | 1.46 | 335 | 1.5830 |
1.5845 | 1.48 | 340 | 1.5669 |
1.3895 | 1.5 | 345 | 1.5571 |
1.2368 | 1.52 | 350 | 1.5408 |
0.9685 | 1.54 | 355 | 1.5221 |
1.1145 | 1.57 | 360 | 1.5088 |
1.0746 | 1.59 | 365 | 1.5093 |
0.898 | 1.61 | 370 | 1.5189 |
1.3473 | 1.63 | 375 | 1.5278 |
1.0516 | 1.65 | 380 | 1.5088 |
1.0326 | 1.67 | 385 | 1.4990 |
1.2609 | 1.7 | 390 | 1.4929 |
1.358 | 1.72 | 395 | 1.4872 |
0.7889 | 1.74 | 400 | 1.4869 |
1.2312 | 1.76 | 405 | 1.4790 |
1.105 | 1.78 | 410 | 1.4585 |
1.1962 | 1.8 | 415 | 1.4438 |
0.8742 | 1.83 | 420 | 1.4320 |
1.0508 | 1.85 | 425 | 1.4223 |
1.0146 | 1.87 | 430 | 1.4137 |
1.2185 | 1.89 | 435 | 1.4117 |
1.3546 | 1.91 | 440 | 1.4142 |
0.941 | 1.93 | 445 | 1.4218 |
1.0273 | 1.96 | 450 | 1.4278 |
1.1569 | 1.98 | 455 | 1.4298 |
1.1368 | 2.0 | 460 | 1.4288 |
1.2212 | 2.02 | 465 | 1.4248 |
0.9665 | 2.04 | 470 | 1.4167 |
1.0306 | 2.07 | 475 | 1.4120 |
0.9129 | 2.09 | 480 | 1.4082 |
0.977 | 2.11 | 485 | 1.4077 |
0.7094 | 2.13 | 490 | 1.3975 |
0.8368 | 2.15 | 495 | 1.3909 |
1.0189 | 2.17 | 500 | 1.3909 |
0.777 | 2.2 | 505 | 1.3960 |
0.5675 | 2.22 | 510 | 1.3993 |
0.5212 | 2.24 | 515 | 1.3981 |
0.8457 | 2.26 | 520 | 1.3968 |
0.7483 | 2.28 | 525 | 1.3934 |
1.006 | 2.3 | 530 | 1.3869 |
1.0734 | 2.33 | 535 | 1.3816 |
0.9135 | 2.35 | 540 | 1.3824 |
0.9682 | 2.37 | 545 | 1.3843 |
0.6738 | 2.39 | 550 | 1.3880 |
0.7082 | 2.41 | 555 | 1.3936 |
0.8458 | 2.43 | 560 | 1.3950 |
0.9601 | 2.46 | 565 | 1.3953 |
1.0889 | 2.48 | 570 | 1.3940 |
0.601 | 2.5 | 575 | 1.3890 |
1.1641 | 2.52 | 580 | 1.3888 |
0.7028 | 2.54 | 585 | 1.3892 |
1.1143 | 2.57 | 590 | 1.3926 |
0.9926 | 2.59 | 595 | 1.3973 |
0.74 | 2.61 | 600 | 1.4006 |
0.9376 | 2.63 | 605 | 1.3999 |
0.5497 | 2.65 | 610 | 1.4008 |
1.1378 | 2.67 | 615 | 1.3938 |
0.8681 | 2.7 | 620 | 1.3734 |
0.7371 | 2.72 | 625 | 1.3602 |
0.7555 | 2.74 | 630 | 1.3548 |
0.8072 | 2.76 | 635 | 1.3504 |
0.9151 | 2.78 | 640 | 1.3492 |
0.6883 | 2.8 | 645 | 1.3531 |
1.0354 | 2.83 | 650 | 1.3518 |
0.8549 | 2.85 | 655 | 1.3447 |
1.4976 | 2.87 | 660 | 1.3398 |
0.8543 | 2.89 | 665 | 1.3337 |
0.6089 | 2.91 | 670 | 1.3279 |
0.7977 | 2.93 | 675 | 1.3274 |
1.0637 | 2.96 | 680 | 1.3267 |
0.6324 | 2.98 | 685 | 1.3283 |
0.8692 | 3.0 | 690 | 1.3298 |
0.6911 | 3.02 | 695 | 1.3329 |
0.6922 | 3.04 | 700 | 1.3360 |
0.6846 | 3.07 | 705 | 1.3386 |
0.6577 | 3.09 | 710 | 1.3434 |
0.7437 | 3.11 | 715 | 1.3462 |
0.7787 | 3.13 | 720 | 1.3472 |
0.3633 | 3.15 | 725 | 1.3406 |
0.6708 | 3.17 | 730 | 1.3365 |
0.6567 | 3.2 | 735 | 1.3322 |
0.6527 | 3.22 | 740 | 1.3290 |
0.7375 | 3.24 | 745 | 1.3248 |
0.894 | 3.26 | 750 | 1.3205 |
0.7315 | 3.28 | 755 | 1.3171 |
0.4086 | 3.3 | 760 | 1.3160 |
0.7842 | 3.33 | 765 | 1.3184 |
0.8795 | 3.35 | 770 | 1.3211 |
0.5648 | 3.37 | 775 | 1.3240 |
0.7625 | 3.39 | 780 | 1.3276 |
0.7745 | 3.41 | 785 | 1.3294 |
0.63 | 3.43 | 790 | 1.3293 |
0.5566 | 3.46 | 795 | 1.3285 |
0.6879 | 3.48 | 800 | 1.3283 |
0.6085 | 3.5 | 805 | 1.3285 |
0.3852 | 3.52 | 810 | 1.3293 |
0.941 | 3.54 | 815 | 1.3286 |
0.7086 | 3.57 | 820 | 1.3274 |
0.6944 | 3.59 | 825 | 1.3259 |
0.5837 | 3.61 | 830 | 1.3250 |
0.7383 | 3.63 | 835 | 1.3245 |
0.686 | 3.65 | 840 | 1.3240 |
0.4981 | 3.67 | 845 | 1.3233 |
0.8415 | 3.7 | 850 | 1.3228 |
0.9302 | 3.72 | 855 | 1.3204 |
0.5089 | 3.74 | 860 | 1.3174 |
0.844 | 3.76 | 865 | 1.3152 |
0.6718 | 3.78 | 870 | 1.3142 |
0.5624 | 3.8 | 875 | 1.3140 |
0.602 | 3.83 | 880 | 1.3141 |
0.3964 | 3.85 | 885 | 1.3141 |
0.8808 | 3.87 | 890 | 1.3141 |
0.7408 | 3.89 | 895 | 1.3139 |
0.6845 | 3.91 | 900 | 1.3137 |
0.7014 | 3.93 | 905 | 1.3133 |
0.5047 | 3.96 | 910 | 1.3131 |
0.5913 | 3.98 | 915 | 1.3130 |
0.7902 | 4.0 | 920 | 1.3130 |
0.3153 | 4.02 | 925 | 1.3131 |
0.5539 | 4.04 | 930 | 1.3131 |
0.5549 | 4.07 | 935 | 1.3126 |
0.7167 | 4.09 | 940 | 1.3123 |
0.6587 | 4.11 | 945 | 1.3122 |
0.3435 | 4.13 | 950 | 1.3121 |
0.319 | 4.15 | 955 | 1.3122 |
0.5459 | 4.17 | 960 | 1.3123 |
0.3663 | 4.2 | 965 | 1.3124 |
0.9411 | 4.22 | 970 | 1.3125 |
0.6706 | 4.24 | 975 | 1.3126 |
0.9866 | 4.26 | 980 | 1.3126 |
0.9504 | 4.28 | 985 | 1.3126 |
0.6149 | 4.3 | 990 | 1.3126 |
0.6583 | 4.33 | 995 | 1.3126 |
0.4376 | 4.35 | 1000 | 1.3126 |
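Note that the reported final loss (1.3126) is not the minimum in the log: validation loss bottoms out at 1.3121 around step 950 before ticking back up. A short sketch in plain Python, with the `(step, validation_loss)` pairs copied from the tail of the table above, that locates the best checkpoint:

```python
# (step, validation_loss) pairs copied from the tail of the training log above
eval_log = [
    (900, 1.3137), (905, 1.3133), (910, 1.3131), (915, 1.3130),
    (920, 1.3130), (925, 1.3131), (930, 1.3131), (935, 1.3126),
    (940, 1.3123), (945, 1.3122), (950, 1.3121), (955, 1.3122),
    (960, 1.3123), (965, 1.3124), (970, 1.3125), (975, 1.3126),
    (980, 1.3126), (985, 1.3126), (990, 1.3126), (995, 1.3126),
    (1000, 1.3126),
]

best_step, best_loss = min(eval_log, key=lambda pair: pair[1])
print(f"best checkpoint: step {best_step}, eval loss {best_loss}")
# → best checkpoint: step 950, eval loss 1.3121
```

If checkpoints were saved during training, loading the step-950 checkpoint (or rerunning with `load_best_model_at_end=True` in `TrainingArguments`) would recover this slightly better model, though the difference here is marginal.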
### Framework versions
- Transformers 4.35.0.dev0
- Pytorch 2.0.1+cu117
- Datasets 2.14.5
- Tokenizers 0.14.1