Mistral-7b-instruct-cairo-PEFT
This model is a fine-tuned version of mistralai/Mistral-7B-Instruct-v0.1; the training dataset is not documented. At the final training step (1000) it achieves the following result on the evaluation set:
- Loss: 1.4019
Model description
More information needed
Intended uses & limitations
More information needed
Training and evaluation data
More information needed
Training procedure
Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 2e-05
- train_batch_size: 1
- eval_batch_size: 8
- seed: 42
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.03
- training_steps: 1000
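To make the scheduler settings above concrete, here is a minimal sketch of a cosine schedule with linear warmup in plain Python. It assumes the warmup length is the ratio times the total steps, rounded up (30 steps here), and that the decay is a single half-cosine down to zero. The names `lr_at`, `BASE_LR`, `TOTAL_STEPS`, and `WARMUP_STEPS` are illustrative and are not taken from the training script.

```python
import math

BASE_LR = 2e-05      # learning_rate from the list above
TOTAL_STEPS = 1000   # training_steps
WARMUP_RATIO = 0.03  # lr_scheduler_warmup_ratio
# Assumption: warmup length is the ratio times total steps, rounded up (30 here).
WARMUP_STEPS = math.ceil(TOTAL_STEPS * WARMUP_RATIO)


def lr_at(step: int) -> float:
    """Learning rate at an optimizer step: linear warmup, then half-cosine decay."""
    if step < WARMUP_STEPS:
        # Ramp linearly from 0 up to BASE_LR over the warmup steps.
        return BASE_LR * step / max(1, WARMUP_STEPS)
    # Fraction of the post-warmup phase completed, in [0, 1].
    progress = (step - WARMUP_STEPS) / max(1, TOTAL_STEPS - WARMUP_STEPS)
    # Half a cosine period: BASE_LR at progress 0, zero at progress 1.
    return BASE_LR * 0.5 * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    for step in (0, 15, 30, 500, 900, 1000):
        print(f"step {step:>4}: lr = {lr_at(step):.3e}")
```

Under these assumptions the rate peaks at 2e-05 at step 30 and has decayed to a few percent of that by step 900, so the last hundred steps change the weights very little.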
Training results
Training Loss | Epoch | Step | Validation Loss |
---|---|---|---|
4.9419 | 0.02 | 5 | 4.6483 |
3.9783 | 0.04 | 10 | 4.6272 |
4.435 | 0.06 | 15 | 4.5826 |
4.2584 | 0.08 | 20 | 4.4936 |
4.3762 | 0.1 | 25 | 4.3248 |
3.9439 | 0.12 | 30 | 4.1278 |
3.5457 | 0.14 | 35 | 3.8557 |
3.9832 | 0.16 | 40 | 3.6140 |
3.8853 | 0.18 | 45 | 3.4018 |
2.9133 | 0.2 | 50 | 3.2528 |
2.7053 | 0.22 | 55 | 3.1320 |
2.2822 | 0.24 | 60 | 2.9988 |
3.3321 | 0.27 | 65 | 2.8657 |
2.6849 | 0.29 | 70 | 2.7583 |
2.1692 | 0.31 | 75 | 2.7543 |
2.7414 | 0.33 | 80 | 2.7597 |
2.2613 | 0.35 | 85 | 2.6587 |
2.4811 | 0.37 | 90 | 2.5609 |
2.0377 | 0.39 | 95 | 2.5793 |
1.945 | 0.41 | 100 | 2.6101 |
2.7281 | 0.43 | 105 | 2.2495 |
2.2824 | 0.45 | 110 | 2.1938 |
2.1301 | 0.47 | 115 | 2.1810 |
2.1599 | 0.49 | 120 | 2.1664 |
2.4669 | 0.51 | 125 | 2.1565 |
1.9863 | 0.53 | 130 | 2.1339 |
2.133 | 0.55 | 135 | 2.0948 |
1.8624 | 0.57 | 140 | 2.0498 |
2.0716 | 0.59 | 145 | 1.9950 |
1.7386 | 0.61 | 150 | 1.9598 |
1.52 | 0.63 | 155 | 1.9358 |
1.3844 | 0.65 | 160 | 1.9232 |
2.0719 | 0.67 | 165 | 1.9008 |
1.6826 | 0.69 | 170 | 1.8628 |
1.9418 | 0.71 | 175 | 1.8324 |
1.5165 | 0.73 | 180 | 1.7976 |
1.5764 | 0.76 | 185 | 1.7771 |
1.7126 | 0.78 | 190 | 1.7570 |
1.5553 | 0.8 | 195 | 1.7477 |
1.4325 | 0.82 | 200 | 1.7389 |
1.7383 | 0.84 | 205 | 1.7284 |
1.4096 | 0.86 | 210 | 1.7192 |
1.3947 | 0.88 | 215 | 1.7159 |
1.4394 | 0.9 | 220 | 1.7090 |
1.5481 | 0.92 | 225 | 1.7075 |
1.4635 | 0.94 | 230 | 1.7117 |
1.3564 | 0.96 | 235 | 1.7041 |
1.5381 | 0.98 | 240 | 1.6902 |
1.2412 | 1.0 | 245 | 1.6838 |
1.7424 | 1.02 | 250 | 1.6803 |
1.2657 | 1.04 | 255 | 1.6869 |
1.2026 | 1.06 | 260 | 1.7028 |
0.9746 | 1.08 | 265 | 1.7092 |
1.3277 | 1.1 | 270 | 1.7065 |
1.741 | 1.12 | 275 | 1.7007 |
1.4553 | 1.14 | 280 | 1.6901 |
1.4277 | 1.16 | 285 | 1.6786 |
1.5373 | 1.18 | 290 | 1.6731 |
1.3754 | 1.2 | 295 | 1.6690 |
1.8448 | 1.22 | 300 | 1.6428 |
1.132 | 1.24 | 305 | 1.6277 |
1.1909 | 1.27 | 310 | 1.6236 |
1.2459 | 1.29 | 315 | 1.6253 |
1.1233 | 1.31 | 320 | 1.6310 |
1.1812 | 1.33 | 325 | 1.6327 |
1.2173 | 1.35 | 330 | 1.6318 |
1.1845 | 1.37 | 335 | 1.6331 |
1.4047 | 1.39 | 340 | 1.6198 |
1.3456 | 1.41 | 345 | 1.6102 |
1.0766 | 1.43 | 350 | 1.5972 |
1.434 | 1.45 | 355 | 1.5711 |
1.4121 | 1.47 | 360 | 1.5519 |
0.991 | 1.49 | 365 | 1.5307 |
1.1855 | 1.51 | 370 | 1.5250 |
0.9791 | 1.53 | 375 | 1.5176 |
1.1704 | 1.55 | 380 | 1.5166 |
0.8702 | 1.57 | 385 | 1.5181 |
1.1582 | 1.59 | 390 | 1.5084 |
1.0805 | 1.61 | 395 | 1.5046 |
1.3099 | 1.63 | 400 | 1.4955 |
1.2066 | 1.65 | 405 | 1.4818 |
1.0825 | 1.67 | 410 | 1.4846 |
1.0802 | 1.69 | 415 | 1.4849 |
1.7319 | 1.71 | 420 | 1.4855 |
1.5408 | 1.73 | 425 | 1.4909 |
0.5243 | 1.76 | 430 | 1.4993 |
1.0521 | 1.78 | 435 | 1.4943 |
1.0145 | 1.8 | 440 | 1.4867 |
1.0813 | 1.82 | 445 | 1.4760 |
1.1515 | 1.84 | 450 | 1.4462 |
0.9266 | 1.86 | 455 | 1.4358 |
0.6752 | 1.88 | 460 | 1.4328 |
1.1664 | 1.9 | 465 | 1.4342 |
1.1168 | 1.92 | 470 | 1.4390 |
1.3819 | 1.94 | 475 | 1.4468 |
0.9204 | 1.96 | 480 | 1.4451 |
0.8669 | 1.98 | 485 | 1.4357 |
1.0333 | 2.0 | 490 | 1.4236 |
1.0886 | 2.02 | 495 | 1.4128 |
1.1797 | 2.04 | 500 | 1.4085 |
1.0462 | 2.06 | 505 | 1.4091 |
1.009 | 2.08 | 510 | 1.4157 |
0.7713 | 2.1 | 515 | 1.4277 |
1.1869 | 2.12 | 520 | 1.4372 |
0.5705 | 2.14 | 525 | 1.4452 |
0.8965 | 2.16 | 530 | 1.4562 |
0.6888 | 2.18 | 535 | 1.4563 |
0.682 | 2.2 | 540 | 1.4599 |
0.8815 | 2.22 | 545 | 1.4600 |
0.9211 | 2.24 | 550 | 1.4659 |
0.8063 | 2.27 | 555 | 1.4663 |
0.6676 | 2.29 | 560 | 1.4635 |
1.0024 | 2.31 | 565 | 1.4577 |
0.9457 | 2.33 | 570 | 1.4536 |
1.0273 | 2.35 | 575 | 1.4480 |
0.5464 | 2.37 | 580 | 1.4496 |
0.7404 | 2.39 | 585 | 1.4582 |
0.7804 | 2.41 | 590 | 1.4659 |
0.9942 | 2.43 | 595 | 1.4701 |
0.9433 | 2.45 | 600 | 1.4730 |
0.8804 | 2.47 | 605 | 1.4688 |
0.7836 | 2.49 | 610 | 1.4657 |
0.7613 | 2.51 | 615 | 1.4588 |
0.8007 | 2.53 | 620 | 1.4565 |
0.7768 | 2.55 | 625 | 1.4501 |
0.9832 | 2.57 | 630 | 1.4430 |
0.7297 | 2.59 | 635 | 1.4410 |
0.8646 | 2.61 | 640 | 1.4440 |
1.1847 | 2.63 | 645 | 1.4449 |
0.7582 | 2.65 | 650 | 1.4397 |
1.024 | 2.67 | 655 | 1.4312 |
0.6909 | 2.69 | 660 | 1.4297 |
0.9462 | 2.71 | 665 | 1.4311 |
0.6868 | 2.73 | 670 | 1.4344 |
0.9798 | 2.76 | 675 | 1.4380 |
1.2549 | 2.78 | 680 | 1.4392 |
0.5431 | 2.8 | 685 | 1.4394 |
0.7168 | 2.82 | 690 | 1.4391 |
0.8719 | 2.84 | 695 | 1.4390 |
0.6935 | 2.86 | 700 | 1.4360 |
0.7472 | 2.88 | 705 | 1.4229 |
0.7485 | 2.9 | 710 | 1.4085 |
0.8291 | 2.92 | 715 | 1.3977 |
0.8684 | 2.94 | 720 | 1.3934 |
0.7158 | 2.96 | 725 | 1.3930 |
0.9039 | 2.98 | 730 | 1.3936 |
0.6393 | 3.0 | 735 | 1.3934 |
0.5457 | 3.02 | 740 | 1.3917 |
1.0716 | 3.04 | 745 | 1.3922 |
0.5797 | 3.06 | 750 | 1.3908 |
0.5073 | 3.08 | 755 | 1.3910 |
0.5619 | 3.1 | 760 | 1.3925 |
0.7002 | 3.12 | 765 | 1.3947 |
0.9512 | 3.14 | 770 | 1.3974 |
0.6535 | 3.16 | 775 | 1.3992 |
0.3872 | 3.18 | 780 | 1.4014 |
0.6217 | 3.2 | 785 | 1.4050 |
0.6864 | 3.22 | 790 | 1.4015 |
0.4067 | 3.24 | 795 | 1.3966 |
0.4893 | 3.27 | 800 | 1.3939 |
0.5004 | 3.29 | 805 | 1.3951 |
0.9775 | 3.31 | 810 | 1.3962 |
0.9014 | 3.33 | 815 | 1.3970 |
0.8747 | 3.35 | 820 | 1.3975 |
0.7479 | 3.37 | 825 | 1.3982 |
0.5784 | 3.39 | 830 | 1.3987 |
0.7599 | 3.41 | 835 | 1.4003 |
0.425 | 3.43 | 840 | 1.4019 |
0.5207 | 3.45 | 845 | 1.4032 |
0.8591 | 3.47 | 850 | 1.4040 |
0.5839 | 3.49 | 855 | 1.4042 |
0.7019 | 3.51 | 860 | 1.4045 |
0.4606 | 3.53 | 865 | 1.4044 |
0.8912 | 3.55 | 870 | 1.4044 |
0.6471 | 3.57 | 875 | 1.4047 |
0.5152 | 3.59 | 880 | 1.4050 |
0.4845 | 3.61 | 885 | 1.4039 |
0.6449 | 3.63 | 890 | 1.4031 |
0.7303 | 3.65 | 895 | 1.4025 |
0.4894 | 3.67 | 900 | 1.4022 |
0.6502 | 3.69 | 905 | 1.4021 |
0.8449 | 3.71 | 910 | 1.4020 |
0.7148 | 3.73 | 915 | 1.4019 |
0.7008 | 3.76 | 920 | 1.4019 |
0.5209 | 3.78 | 925 | 1.4018 |
1.022 | 3.8 | 930 | 1.4016 |
0.8529 | 3.82 | 935 | 1.4013 |
0.4514 | 3.84 | 940 | 1.4014 |
0.5137 | 3.86 | 945 | 1.4016 |
0.9131 | 3.88 | 950 | 1.4016 |
0.5213 | 3.9 | 955 | 1.4017 |
0.5542 | 3.92 | 960 | 1.4018 |
0.9475 | 3.94 | 965 | 1.4019 |
0.6425 | 3.96 | 970 | 1.4019 |
0.886 | 3.98 | 975 | 1.4019 |
0.7525 | 4.0 | 980 | 1.4019 |
0.4966 | 4.02 | 985 | 1.4019 |
0.6851 | 4.04 | 990 | 1.4019 |
0.7414 | 4.06 | 995 | 1.4019 |
0.4963 | 4.08 | 1000 | 1.4019 |
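The lowest validation loss in the table is 1.3908 at step 750, slightly below the final 1.4019, while the training loss keeps falling afterwards. The sketch below shows one way to find that row programmatically. It parses a verbatim excerpt of the table; `parse_rows` and `TABLE_EXCERPT` are illustrative names, and the full table can be pasted in place of the excerpt.

```python
# Excerpt of the table above, copied verbatim; paste the full table to analyse every row.
TABLE_EXCERPT = """\
1.1797 | 2.04 | 500 | 1.4085 |
0.9433 | 2.45 | 600 | 1.4730 |
0.6935 | 2.86 | 700 | 1.4360 |
0.7158 | 2.96 | 725 | 1.3930 |
0.5457 | 3.02 | 740 | 1.3917 |
1.0716 | 3.04 | 745 | 1.3922 |
0.5797 | 3.06 | 750 | 1.3908 |
0.5073 | 3.08 | 755 | 1.3910 |
0.4893 | 3.27 | 800 | 1.3939 |
0.4894 | 3.67 | 900 | 1.4022 |
0.4963 | 4.08 | 1000 | 1.4019 |
"""


def parse_rows(text: str) -> list[dict]:
    """Turn 'train | epoch | step | val |' lines into dictionaries."""
    rows = []
    for line in text.strip().splitlines():
        # Split on pipes and drop the empty cell left by the trailing pipe.
        cells = [cell.strip() for cell in line.split("|") if cell.strip()]
        train_loss, epoch, step, val_loss = cells
        rows.append({
            "train_loss": float(train_loss),
            "epoch": float(epoch),
            "step": int(step),
            "val_loss": float(val_loss),
        })
    return rows


rows = parse_rows(TABLE_EXCERPT)
best = min(rows, key=lambda row: row["val_loss"])  # row with the lowest validation loss
final = rows[-1]                                   # last logged evaluation

if __name__ == "__main__":
    print(f"best:  step {best['step']}, validation loss {best['val_loss']}")
    print(f"final: step {final['step']}, validation loss {final['val_loss']}")
```

If intermediate checkpoints were saved, the one nearest step 750 may be marginally better than the final weights. The card does not say whether any were kept.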
Framework versions
- Transformers 4.34.0
- Pytorch 2.1.0
- Datasets 2.14.5
- Tokenizers 0.14.1
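When reproducing results, it can help to confirm that the local environment matches the versions listed above. The following is a small sketch using only the standard library. The helper name `check_versions` is illustrative, and it assumes the packages are installed under their usual PyPI names ("Pytorch" is distributed as `torch`).

```python
from importlib import metadata

# Versions listed above, keyed by PyPI distribution name.
EXPECTED = {
    "transformers": "4.34.0",
    "torch": "2.1.0",
    "datasets": "2.14.5",
    "tokenizers": "0.14.1",
}


def check_versions(expected: dict = EXPECTED) -> dict:
    """Map each package to (expected, installed, matches); installed is None if missing."""
    report = {}
    for name, wanted in expected.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None
        # Ignore local build suffixes such as '+cu121' when comparing.
        matches = installed is not None and installed.split("+")[0] == wanted
        report[name] = (wanted, installed, matches)
    return report


if __name__ == "__main__":
    for name, (wanted, installed, matches) in check_versions().items():
        status = "ok" if matches else "differs"
        print(f"{name}: expected {wanted}, installed {installed} ({status})")
```

A mismatch does not necessarily break loading or inference; it only flags that the environment differs from the one used for training.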