# mt5-base-gramatika-e8-b16
This model is a fine-tuned version of [google/mt5-base](https://huggingface.co/google/mt5-base) on an unspecified dataset. It achieves the following results on the evaluation set:
- Loss: 0.4843
- Rouge1: 56.8774
- Rouge2: 41.5601
- Rougel: 55.9632
- Rougelsum: 56.0201
- Gen Len: 18.6838
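A minimal inference sketch with the `transformers` library. The checkpoint id below is a placeholder assumption (substitute the actual Hub repo id or local output directory), and `max_new_tokens` is chosen only to comfortably cover the roughly 19-token average generation length reported above:

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM


def correct_text(text: str, checkpoint: str = "mt5-base-gramatika-e8-b16") -> str:
    """Run the fine-tuned seq2seq model on one input string.

    `checkpoint` is a hypothetical placeholder; point it at the real
    Hub repo id or the local directory the Trainer saved to.
    """
    tokenizer = AutoTokenizer.from_pretrained(checkpoint)
    model = AutoModelForSeq2SeqLM.from_pretrained(checkpoint)
    inputs = tokenizer(text, return_tensors="pt")
    outputs = model.generate(**inputs, max_new_tokens=64)
    return tokenizer.decode(outputs[0], skip_special_tokens=True)
```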
## Model description
More information needed
## Intended uses & limitations
More information needed
## Training and evaluation data
More information needed
## Training procedure
### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 0.001
- train_batch_size: 16
- eval_batch_size: 16
- seed: 42
- optimizer: Adafactor
- lr_scheduler_type: linear
- num_epochs: 8
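The linear scheduler decays the learning rate from 0.001 toward zero over the course of training. A minimal sketch of that schedule, assuming no warmup (the log does not specify one); the total step count of about 6760 is inferred from the training log (roughly 845 optimizer steps per epoch times 8 epochs) and is an approximation, not a logged value:

```python
# Linear decay from base_lr at step 0 to 0 at total_steps, clamped at 0.
# total_steps ~= 6760 is an inference from the training log, not a logged value.
def linear_lr(step: int, base_lr: float = 1e-3, total_steps: int = 6760) -> float:
    """Return the learning rate at `step` under a warmup-free linear schedule."""
    return base_lr * max(0.0, 1.0 - step / total_steps)


print(linear_lr(0))      # full base learning rate at the first step
print(linear_lr(3380))   # half the base rate at the midpoint
print(linear_lr(6760))   # decayed to zero at the end of training
```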
### Training results
| Training Loss | Epoch | Step | Validation Loss | Rouge1 | Rouge2 | Rougel | Rougelsum | Gen Len |
|---|---|---|---|---|---|---|---|---|
3.2706 | 0.09 | 74 | 1.9783 | 8.1545 | 4.7719 | 7.9341 | 7.9557 | 3.8745 |
1.4235 | 0.18 | 148 | 0.8458 | 57.549 | 41.5014 | 56.1232 | 56.1673 | 18.6234 |
1.653 | 0.26 | 222 | 0.7267 | 57.1709 | 41.199 | 55.7837 | 55.8261 | 18.6436 |
0.8971 | 0.35 | 296 | 0.6929 | 57.3406 | 41.3288 | 56.0036 | 56.0401 | 18.6365 |
0.8493 | 0.44 | 370 | 0.7028 | 57.3597 | 41.4383 | 56.1212 | 56.1607 | 18.6128 |
0.8056 | 0.53 | 444 | 0.6244 | 57.3035 | 41.4437 | 56.0385 | 56.0736 | 18.6454 |
0.7628 | 0.61 | 518 | 0.6060 | 57.0247 | 41.1985 | 55.9014 | 55.9403 | 18.6400 |
0.7365 | 0.7 | 592 | 0.5934 | 57.5005 | 41.8768 | 56.3047 | 56.3603 | 18.6359 |
0.6832 | 0.79 | 666 | 0.5856 | 57.1086 | 41.4354 | 56.0046 | 56.0574 | 18.6359 |
0.6747 | 0.88 | 740 | 0.5586 | 57.0216 | 41.3228 | 55.9335 | 55.9692 | 18.6513 |
0.6639 | 0.96 | 814 | 0.5414 | 57.2684 | 41.9646 | 56.2654 | 56.3129 | 18.6519 |
0.5861 | 1.05 | 888 | 0.5601 | 57.7025 | 42.377 | 56.6562 | 56.6946 | 18.6258 |
0.5567 | 1.14 | 962 | 0.5408 | 57.1895 | 41.8042 | 56.1553 | 56.1894 | 18.6548 |
0.5228 | 1.23 | 1036 | 0.5478 | 57.0075 | 41.3149 | 55.9472 | 55.9622 | 18.6590 |
0.5352 | 1.31 | 1110 | 0.5303 | 57.4388 | 42.1268 | 56.3561 | 56.4077 | 18.6548 |
0.5501 | 1.4 | 1184 | 0.5262 | 57.2525 | 41.8999 | 56.2048 | 56.2467 | 18.6625 |
0.5319 | 1.49 | 1258 | 0.5199 | 57.4317 | 41.9039 | 56.3713 | 56.4254 | 18.6631 |
0.5146 | 1.58 | 1332 | 0.5089 | 57.5201 | 42.4081 | 56.5436 | 56.5939 | 18.6637 |
0.5324 | 1.66 | 1406 | 0.5160 | 57.7766 | 42.606 | 56.7277 | 56.7772 | 18.6436 |
0.5136 | 1.75 | 1480 | 0.5085 | 57.2779 | 41.8313 | 56.2847 | 56.3359 | 18.6602 |
0.4972 | 1.84 | 1554 | 0.4995 | 57.5075 | 42.225 | 56.5704 | 56.6215 | 18.6619 |
0.5015 | 1.93 | 1628 | 0.4901 | 56.73 | 41.1767 | 55.8184 | 55.8498 | 18.6803 |
0.4829 | 2.01 | 1702 | 0.5008 | 56.9981 | 41.5694 | 56.0918 | 56.1334 | 18.6827 |
0.4052 | 2.1 | 1776 | 0.5017 | 57.4272 | 42.2729 | 56.4935 | 56.5506 | 18.6714 |
0.4112 | 2.19 | 1850 | 0.5097 | 57.6196 | 42.5076 | 56.6613 | 56.7146 | 18.6525 |
0.3991 | 2.28 | 1924 | 0.5087 | 57.6559 | 42.7547 | 56.7696 | 56.8281 | 18.6471 |
0.4031 | 2.36 | 1998 | 0.5084 | 57.4165 | 42.4054 | 56.5009 | 56.558 | 18.6625 |
0.4007 | 2.45 | 2072 | 0.5085 | 57.3745 | 42.346 | 56.4867 | 56.5289 | 18.6613 |
0.4137 | 2.54 | 2146 | 0.5036 | 57.3508 | 42.3167 | 56.4024 | 56.4596 | 18.6655 |
0.4052 | 2.63 | 2220 | 0.4942 | 57.0004 | 41.7664 | 56.1164 | 56.1538 | 18.6625 |
0.3992 | 2.71 | 2294 | 0.4950 | 57.8077 | 42.9259 | 56.9039 | 56.9464 | 18.6566 |
0.3995 | 2.8 | 2368 | 0.4843 | 56.8774 | 41.5601 | 55.9632 | 56.0201 | 18.6838 |
0.4084 | 2.89 | 2442 | 0.4850 | 57.7562 | 42.8179 | 56.7726 | 56.8443 | 18.6584 |
0.3989 | 2.98 | 2516 | 0.4918 | 57.7656 | 42.6957 | 56.8405 | 56.8911 | 18.6655 |
0.3308 | 3.07 | 2590 | 0.5070 | 57.9183 | 43.0255 | 57.0692 | 57.1078 | 18.6679 |
0.3102 | 3.15 | 2664 | 0.5067 | 57.3651 | 42.1706 | 56.4929 | 56.5592 | 18.6560 |
0.3144 | 3.24 | 2738 | 0.5112 | 57.6685 | 42.7202 | 56.7537 | 56.8009 | 18.6655 |
0.3282 | 3.33 | 2812 | 0.5041 | 57.6229 | 42.6598 | 56.7712 | 56.8143 | 18.6519 |
0.3241 | 3.42 | 2886 | 0.5152 | 57.7354 | 42.989 | 56.8477 | 56.8989 | 18.6519 |
0.3249 | 3.5 | 2960 | 0.4991 | 57.2176 | 42.19 | 56.3246 | 56.368 | 18.6637 |
0.3414 | 3.59 | 3034 | 0.4987 | 57.6918 | 42.933 | 56.7931 | 56.843 | 18.6702 |
0.3294 | 3.68 | 3108 | 0.4913 | 57.6053 | 42.6931 | 56.7697 | 56.8108 | 18.6572 |
0.3223 | 3.77 | 3182 | 0.4952 | 57.567 | 42.6625 | 56.7544 | 56.8039 | 18.6596 |
0.3286 | 3.85 | 3256 | 0.5110 | 58.0715 | 43.3152 | 57.1817 | 57.2226 | 18.6584 |
0.3164 | 3.94 | 3330 | 0.4927 | 56.914 | 41.9441 | 56.1047 | 56.1444 | 18.6773 |
0.2963 | 4.03 | 3404 | 0.5147 | 57.4953 | 42.5211 | 56.6461 | 56.6736 | 18.6761 |
0.2555 | 4.12 | 3478 | 0.5326 | 57.7026 | 42.948 | 56.8695 | 56.9015 | 18.6596 |
0.2614 | 4.2 | 3552 | 0.5266 | 57.5255 | 42.7141 | 56.7282 | 56.7762 | 18.6625 |
0.2612 | 4.29 | 3626 | 0.5160 | 57.6446 | 42.8553 | 56.8162 | 56.8481 | 18.6542 |
0.253 | 4.38 | 3700 | 0.5271 | 57.3989 | 42.5453 | 56.598 | 56.6297 | 18.6637 |
0.2606 | 4.47 | 3774 | 0.5148 | 57.6538 | 42.9936 | 56.8624 | 56.8966 | 18.6655 |
0.2611 | 4.55 | 3848 | 0.5219 | 57.9243 | 43.2273 | 57.064 | 57.1159 | 18.6560 |
0.269 | 4.64 | 3922 | 0.5158 | 57.567 | 42.5436 | 56.7299 | 56.7744 | 18.6613 |
0.2507 | 4.73 | 3996 | 0.5230 | 57.7206 | 42.8407 | 56.9271 | 56.9744 | 18.6554 |
0.2671 | 4.82 | 4070 | 0.5129 | 57.3382 | 42.1359 | 56.501 | 56.5339 | 18.6773 |
0.2529 | 4.9 | 4144 | 0.5169 | 57.8124 | 42.8965 | 56.8949 | 56.9491 | 18.6596 |
0.2637 | 4.99 | 4218 | 0.5027 | 57.4363 | 42.6048 | 56.6022 | 56.6141 | 18.6726 |
0.2092 | 5.08 | 4292 | 0.5459 | 57.7903 | 43.2307 | 56.967 | 57.0261 | 18.6625 |
0.2027 | 5.17 | 4366 | 0.5530 | 57.5216 | 42.8174 | 56.7146 | 56.7549 | 18.6607 |
0.1998 | 5.25 | 4440 | 0.5433 | 56.8387 | 42.0324 | 56.1015 | 56.1458 | 18.6726 |
0.2031 | 5.34 | 4514 | 0.5534 | 57.2929 | 42.4948 | 56.4743 | 56.5265 | 18.6726 |
0.2013 | 5.43 | 4588 | 0.5484 | 57.1654 | 42.2389 | 56.3586 | 56.4095 | 18.6708 |
0.213 | 5.52 | 4662 | 0.5464 | 57.3162 | 42.4566 | 56.5355 | 56.5699 | 18.6667 |
0.2093 | 5.6 | 4736 | 0.5524 | 57.4916 | 42.6525 | 56.6843 | 56.7463 | 18.6643 |
0.2076 | 5.69 | 4810 | 0.5518 | 57.3392 | 42.4623 | 56.5264 | 56.58 | 18.6643 |
0.2104 | 5.78 | 4884 | 0.5487 | 57.7037 | 42.9998 | 56.9151 | 56.9486 | 18.6637 |
0.2057 | 5.87 | 4958 | 0.5432 | 57.7157 | 42.9545 | 56.8942 | 56.9519 | 18.6578 |
0.2212 | 5.96 | 5032 | 0.5499 | 57.6156 | 42.803 | 56.7574 | 56.7996 | 18.6625 |
0.1806 | 6.04 | 5106 | 0.5806 | 57.705 | 42.8983 | 56.8444 | 56.8889 | 18.6607 |
0.1627 | 6.13 | 5180 | 0.5894 | 57.3913 | 42.4738 | 56.5652 | 56.6214 | 18.6702 |
0.1623 | 6.22 | 5254 | 0.5906 | 57.4774 | 42.5896 | 56.698 | 56.739 | 18.6673 |
0.1619 | 6.31 | 5328 | 0.5937 | 57.5559 | 42.7721 | 56.7468 | 56.8002 | 18.6679 |
0.1601 | 6.39 | 5402 | 0.5982 | 57.4012 | 42.5106 | 56.5929 | 56.6381 | 18.6732 |
0.1645 | 6.48 | 5476 | 0.5896 | 57.354 | 42.5206 | 56.5882 | 56.6301 | 18.6690 |
0.1687 | 6.57 | 5550 | 0.5897 | 57.3682 | 42.3579 | 56.5691 | 56.6037 | 18.6714 |
0.1588 | 6.66 | 5624 | 0.5803 | 57.1074 | 42.1755 | 56.3318 | 56.3665 | 18.6732 |
0.1642 | 6.74 | 5698 | 0.5849 | 57.0016 | 42.0924 | 56.2506 | 56.2873 | 18.6684 |
0.1652 | 6.83 | 5772 | 0.5905 | 57.3946 | 42.579 | 56.5995 | 56.6425 | 18.6732 |
0.1661 | 6.92 | 5846 | 0.5898 | 57.3087 | 42.5098 | 56.5185 | 56.563 | 18.6708 |
0.165 | 7.01 | 5920 | 0.5875 | 57.5248 | 42.8267 | 56.724 | 56.7706 | 18.6738 |
0.1346 | 7.09 | 5994 | 0.6126 | 57.1604 | 42.2782 | 56.3798 | 56.4268 | 18.6832 |
0.1323 | 7.18 | 6068 | 0.6341 | 57.3062 | 42.3721 | 56.4977 | 56.5376 | 18.6767 |
0.1323 | 7.27 | 6142 | 0.6387 | 57.3784 | 42.4355 | 56.5936 | 56.6372 | 18.6732 |
0.1369 | 7.36 | 6216 | 0.6303 | 57.3292 | 42.3645 | 56.5516 | 56.5843 | 18.6767 |
0.1271 | 7.44 | 6290 | 0.6371 | 57.3898 | 42.4616 | 56.5843 | 56.626 | 18.6761 |
0.1377 | 7.53 | 6364 | 0.6364 | 57.4119 | 42.5403 | 56.6075 | 56.6496 | 18.6720 |
0.1383 | 7.62 | 6438 | 0.6305 | 57.2326 | 42.2237 | 56.445 | 56.4823 | 18.6732 |
0.137 | 7.71 | 6512 | 0.6278 | 57.4626 | 42.6463 | 56.6781 | 56.7258 | 18.6696 |
0.1257 | 7.79 | 6586 | 0.6298 | 57.4051 | 42.53 | 56.6253 | 56.6753 | 18.6679 |
0.1363 | 7.88 | 6660 | 0.6300 | 57.3342 | 42.4547 | 56.5678 | 56.6153 | 18.6690 |
0.1356 | 7.97 | 6734 | 0.6296 | 57.3567 | 42.4967 | 56.6014 | 56.6478 | 18.6684 |
### Framework versions
- Transformers 4.30.1
- Pytorch 1.11.0a0+b6df043
- Datasets 2.12.0
- Tokenizers 0.13.3