# mt5-large-gramatika-e8-b16
This model is a fine-tuned version of [google/mt5-large](https://huggingface.co/google/mt5-large) on an unspecified dataset. It achieves the following results on the evaluation set (a usage sketch appears after the list):
- Loss: 0.4702
- Rouge1: 57.4844
- Rouge2: 42.2878
- Rougel: 56.6236
- Rougelsum: 56.6102
- Gen Len: 18.6501
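
The fine-tuned checkpoint can be loaded with the standard `transformers` sequence-to-sequence API. Below is a minimal inference sketch, not an official usage example: the repository ID, input text, and generation settings are assumptions.

```python
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

# Assumed Hub repository ID; replace with the actual ID or a local checkpoint path.
model_id = "mt5-large-gramatika-e8-b16"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

# Placeholder input; the expected task/prompt format is not documented in this card.
text = "Example input sentence."
inputs = tokenizer(text, return_tensors="pt")

# Average generation length on the eval set is ~18.7 tokens, so short outputs are expected.
outputs = model.generate(**inputs, max_new_tokens=64, num_beams=4)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```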
## Model description
More information needed
## Intended uses & limitations
More information needed
## Training and evaluation data
More information needed
## Training procedure

### Training hyperparameters
The following hyperparameters were used during training (see the reproduction sketch after the list):
- learning_rate: 0.001
- train_batch_size: 16
- eval_batch_size: 16
- seed: 42
- optimizer: Adafactor
- lr_scheduler_type: linear
- num_epochs: 8
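
A sketch of a `Seq2SeqTrainer` setup that matches the hyperparameters listed above, assuming Adafactor was selected via `optim="adafactor"` and that tokenized train/eval datasets have already been prepared (both are assumptions; data preprocessing is not documented in this card):

```python
from transformers import (
    AutoModelForSeq2SeqLM,
    AutoTokenizer,
    DataCollatorForSeq2Seq,
    Seq2SeqTrainer,
    Seq2SeqTrainingArguments,
)

tokenizer = AutoTokenizer.from_pretrained("google/mt5-large")
model = AutoModelForSeq2SeqLM.from_pretrained("google/mt5-large")

args = Seq2SeqTrainingArguments(
    output_dir="mt5-large-gramatika-e8-b16",
    learning_rate=1e-3,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    seed=42,
    num_train_epochs=8,
    lr_scheduler_type="linear",
    optim="adafactor",           # Adafactor optimizer, as listed above
    predict_with_generate=True,  # needed for ROUGE / Gen Len during evaluation
)

# Placeholders: in the actual run these would be tokenized Dataset objects.
tokenized_train = tokenized_eval = None

trainer = Seq2SeqTrainer(
    model=model,
    args=args,
    train_dataset=tokenized_train,
    eval_dataset=tokenized_eval,
    tokenizer=tokenizer,
    data_collator=DataCollatorForSeq2Seq(tokenizer, model=model),
)
# trainer.train()
```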
### Training results
| Training Loss | Epoch | Step | Validation Loss | Rouge1 | Rouge2 | Rougel | Rougelsum | Gen Len |
|:-------------:|:-----:|:----:|:---------------:|:------:|:------:|:------:|:---------:|:-------:|
1.6744 | 0.09 | 74 | 0.7722 | 57.1766 | 40.9296 | 55.7333 | 55.7459 | 18.6288 |
0.8848 | 0.18 | 148 | 0.7209 | 57.623 | 41.7198 | 56.3469 | 56.3608 | 18.6394 |
0.7966 | 0.26 | 222 | 0.6503 | 57.3516 | 41.3777 | 56.0526 | 56.0621 | 18.6448 |
0.7111 | 0.35 | 296 | 0.6027 | 56.996 | 40.8324 | 55.8609 | 55.8657 | 18.6353 |
0.7094 | 0.44 | 370 | 0.5883 | 57.3389 | 41.4181 | 56.1801 | 56.2039 | 18.6418 |
0.6629 | 0.53 | 444 | 0.5546 | 57.5353 | 41.8447 | 56.4625 | 56.4577 | 18.6530 |
0.6516 | 0.61 | 518 | 0.5437 | 56.692 | 40.9665 | 55.6369 | 55.6328 | 18.6519 |
0.634 | 0.7 | 592 | 0.5502 | 57.7306 | 42.2573 | 56.6605 | 56.6682 | 18.6400 |
0.5905 | 0.79 | 666 | 0.5185 | 56.8678 | 41.1048 | 55.8438 | 55.843 | 18.6619 |
0.5795 | 0.88 | 740 | 0.5056 | 56.4846 | 40.837 | 55.5979 | 55.599 | 18.6850 |
0.5764 | 0.96 | 814 | 0.5055 | 57.2228 | 41.8396 | 56.2943 | 56.3095 | 18.6590 |
0.4861 | 1.05 | 888 | 0.5254 | 57.7239 | 42.2379 | 56.7763 | 56.778 | 18.6353 |
0.4501 | 1.14 | 962 | 0.5107 | 57.019 | 41.6304 | 56.1098 | 56.0972 | 18.6649 |
0.4317 | 1.23 | 1036 | 0.5123 | 57.0608 | 41.6997 | 56.1276 | 56.1257 | 18.6838 |
0.4371 | 1.31 | 1110 | 0.4970 | 56.9459 | 41.5514 | 56.0402 | 56.0285 | 18.6862 |
0.4581 | 1.4 | 1184 | 0.4952 | 57.2105 | 41.9667 | 56.3955 | 56.369 | 18.6596 |
0.4384 | 1.49 | 1258 | 0.4880 | 57.0156 | 41.6722 | 56.142 | 56.1309 | 18.6779 |
0.4364 | 1.58 | 1332 | 0.4881 | 57.0195 | 41.7102 | 56.1887 | 56.1846 | 18.6637 |
0.4443 | 1.66 | 1406 | 0.4909 | 57.6782 | 42.5114 | 56.7489 | 56.7244 | 18.6507 |
0.4364 | 1.75 | 1480 | 0.4834 | 56.8872 | 41.3136 | 56.0289 | 56.0154 | 18.6708 |
0.4295 | 1.84 | 1554 | 0.4718 | 57.1826 | 41.9052 | 56.3636 | 56.359 | 18.6542 |
0.4314 | 1.93 | 1628 | 0.4702 | 57.4844 | 42.2878 | 56.6236 | 56.6102 | 18.6501 |
0.4046 | 2.01 | 1702 | 0.4935 | 56.0299 | 40.2552 | 55.2098 | 55.1925 | 18.7063 |
0.3138 | 2.1 | 1776 | 0.4877 | 56.7632 | 41.2407 | 55.9588 | 55.9343 | 18.6755 |
0.319 | 2.19 | 1850 | 0.4985 | 57.1145 | 41.8766 | 56.3223 | 56.3261 | 18.6548 |
0.3145 | 2.28 | 1924 | 0.4941 | 56.8582 | 41.7294 | 56.1526 | 56.1647 | 18.6542 |
0.3206 | 2.36 | 1998 | 0.4988 | 57.304 | 42.1623 | 56.4643 | 56.4487 | 18.6519 |
0.3177 | 2.45 | 2072 | 0.4942 | 57.1824 | 41.9585 | 56.3879 | 56.3715 | 18.6554 |
0.3228 | 2.54 | 2146 | 0.4990 | 57.4287 | 42.2693 | 56.5335 | 56.5023 | 18.6471 |
0.3222 | 2.63 | 2220 | 0.4876 | 56.7484 | 41.1785 | 55.9649 | 55.9475 | 18.6625 |
0.3223 | 2.71 | 2294 | 0.4792 | 57.6631 | 42.7192 | 56.8764 | 56.8609 | 18.6554 |
0.3182 | 2.8 | 2368 | 0.4811 | 57.1701 | 42.0007 | 56.2943 | 56.2798 | 18.6667 |
0.3255 | 2.89 | 2442 | 0.4743 | 57.3947 | 42.1906 | 56.5159 | 56.5183 | 18.6708 |
0.3175 | 2.98 | 2516 | 0.4761 | 57.7291 | 42.7122 | 56.8483 | 56.8481 | 18.6750 |
0.2362 | 3.07 | 2590 | 0.5144 | 57.4557 | 42.4091 | 56.6761 | 56.6874 | 18.6738 |
0.2219 | 3.15 | 2664 | 0.5121 | 57.1841 | 41.9918 | 56.4033 | 56.4079 | 18.6625 |
0.2246 | 3.24 | 2738 | 0.5132 | 57.0261 | 41.9832 | 56.2535 | 56.2668 | 18.6773 |
0.2329 | 3.33 | 2812 | 0.5007 | 57.0272 | 41.8983 | 56.2765 | 56.279 | 18.6732 |
0.2351 | 3.42 | 2886 | 0.5107 | 57.3793 | 42.3892 | 56.5826 | 56.5745 | 18.6607 |
0.2336 | 3.5 | 2960 | 0.5084 | 56.8408 | 41.5052 | 56.0096 | 56.0016 | 18.6720 |
0.2434 | 3.59 | 3034 | 0.4979 | 57.3899 | 42.5822 | 56.5939 | 56.5735 | 18.6732 |
0.2354 | 3.68 | 3108 | 0.5052 | 57.4978 | 42.4764 | 56.6433 | 56.6289 | 18.6625 |
0.2392 | 3.77 | 3182 | 0.5075 | 57.1829 | 41.9163 | 56.4032 | 56.3883 | 18.6673 |
0.2386 | 3.85 | 3256 | 0.5200 | 57.6684 | 42.7638 | 56.8977 | 56.8874 | 18.6560 |
0.2351 | 3.94 | 3330 | 0.5004 | 56.8835 | 41.6092 | 56.1045 | 56.0997 | 18.6827 |
0.2097 | 4.03 | 3404 | 0.5421 | 56.7649 | 41.5071 | 55.997 | 55.9966 | 18.6909 |
0.1454 | 4.12 | 3478 | 0.5636 | 57.2572 | 42.1992 | 56.4827 | 56.4719 | 18.6702 |
0.1563 | 4.2 | 3552 | 0.5745 | 57.1321 | 42.1765 | 56.3809 | 56.3526 | 18.6696 |
0.1551 | 4.29 | 3626 | 0.5637 | 57.1671 | 42.1927 | 56.413 | 56.4115 | 18.6673 |
0.1459 | 4.38 | 3700 | 0.5872 | 56.7745 | 41.4087 | 55.9735 | 55.9921 | 18.6815 |
0.1557 | 4.47 | 3774 | 0.5628 | 57.1925 | 42.08 | 56.4231 | 56.4089 | 18.6838 |
0.1578 | 4.55 | 3848 | 0.5632 | 57.2261 | 42.2509 | 56.5044 | 56.5064 | 18.6714 |
0.1606 | 4.64 | 3922 | 0.5681 | 57.1246 | 42.0846 | 56.3962 | 56.3833 | 18.6726 |
0.1583 | 4.73 | 3996 | 0.5584 | 57.293 | 42.2457 | 56.5662 | 56.5524 | 18.6767 |
0.1608 | 4.82 | 4070 | 0.5618 | 56.8716 | 41.5624 | 56.1165 | 56.127 | 18.6874 |
0.1571 | 4.9 | 4144 | 0.5574 | 57.0316 | 41.8376 | 56.2618 | 56.2535 | 18.6690 |
0.1607 | 4.99 | 4218 | 0.5511 | 56.8914 | 41.6437 | 56.1124 | 56.1272 | 18.6773 |
0.1038 | 5.08 | 4292 | 0.6172 | 56.9648 | 41.7283 | 56.2261 | 56.2454 | 18.6696 |
0.0897 | 5.17 | 4366 | 0.6341 | 56.4782 | 41.2787 | 55.7474 | 55.7411 | 18.6838 |
0.0888 | 5.25 | 4440 | 0.6572 | 56.5301 | 41.2728 | 55.8126 | 55.7942 | 18.6732 |
0.0937 | 5.34 | 4514 | 0.6384 | 56.6779 | 41.4802 | 55.9566 | 55.9504 | 18.6803 |
0.0957 | 5.43 | 4588 | 0.6345 | 56.7333 | 41.4541 | 56.001 | 56.0057 | 18.6720 |
0.0963 | 5.52 | 4662 | 0.6396 | 56.9616 | 41.8147 | 56.2515 | 56.2669 | 18.6720 |
0.0978 | 5.6 | 4736 | 0.6445 | 56.9598 | 41.8062 | 56.2173 | 56.2321 | 18.6750 |
0.0977 | 5.69 | 4810 | 0.6406 | 57.127 | 41.8928 | 56.3579 | 56.3805 | 18.6702 |
0.0914 | 5.78 | 4884 | 0.6447 | 57.1084 | 41.9397 | 56.3435 | 56.3571 | 18.6690 |
0.0953 | 5.87 | 4958 | 0.6237 | 56.96 | 41.6289 | 56.1983 | 56.1835 | 18.6750 |
0.1011 | 5.96 | 5032 | 0.6270 | 56.6038 | 41.3111 | 55.8926 | 55.9025 | 18.6779 |
0.0755 | 6.04 | 5106 | 0.6856 | 57.0741 | 41.8985 | 56.336 | 56.3234 | 18.6690 |
0.0535 | 6.13 | 5180 | 0.7134 | 56.7823 | 41.5468 | 56.095 | 56.0678 | 18.6726 |
0.052 | 6.22 | 5254 | 0.7340 | 57.0192 | 41.8131 | 56.282 | 56.2856 | 18.6720 |
0.056 | 6.31 | 5328 | 0.7185 | 56.4281 | 41.0604 | 55.7292 | 55.7132 | 18.6821 |
0.053 | 6.39 | 5402 | 0.7265 | 56.7439 | 41.5055 | 56.0488 | 56.0426 | 18.6744 |
0.0521 | 6.48 | 5476 | 0.7360 | 57.0554 | 41.904 | 56.3436 | 56.3367 | 18.6649 |
0.0571 | 6.57 | 5550 | 0.7267 | 56.9341 | 41.9058 | 56.224 | 56.223 | 18.6714 |
0.054 | 6.66 | 5624 | 0.7267 | 56.9627 | 41.8572 | 56.2197 | 56.2187 | 18.6732 |
0.0556 | 6.74 | 5698 | 0.7209 | 56.7077 | 41.3216 | 55.9471 | 55.947 | 18.6726 |
0.053 | 6.83 | 5772 | 0.7298 | 56.7278 | 41.5857 | 55.984 | 55.9966 | 18.6797 |
0.0599 | 6.92 | 5846 | 0.6995 | 56.7807 | 41.6911 | 56.0494 | 56.0496 | 18.6767 |
0.0522 | 7.01 | 5920 | 0.7314 | 56.933 | 41.7532 | 56.1871 | 56.1793 | 18.6714 |
0.0336 | 7.09 | 5994 | 0.7741 | 56.6952 | 41.5009 | 55.9396 | 55.9121 | 18.6714 |
0.0349 | 7.18 | 6068 | 0.7876 | 56.7808 | 41.5416 | 56.019 | 56.0063 | 18.6643 |
0.0328 | 7.27 | 6142 | 0.7963 | 56.9808 | 41.8677 | 56.2244 | 56.2149 | 18.6726 |
0.0322 | 7.36 | 6216 | 0.8133 | 56.8728 | 41.7155 | 56.1426 | 56.1346 | 18.6679 |
0.0314 | 7.44 | 6290 | 0.8082 | 56.9211 | 41.7997 | 56.1838 | 56.1862 | 18.6649 |
0.0339 | 7.53 | 6364 | 0.8117 | 56.9648 | 41.8697 | 56.2275 | 56.2148 | 18.6690 |
0.0334 | 7.62 | 6438 | 0.8041 | 56.8181 | 41.6356 | 56.0903 | 56.0829 | 18.6673 |
0.033 | 7.71 | 6512 | 0.8082 | 57.0346 | 41.9075 | 56.295 | 56.2825 | 18.6673 |
0.0322 | 7.79 | 6586 | 0.8125 | 56.9522 | 41.8129 | 56.2334 | 56.2186 | 18.6684 |
0.0332 | 7.88 | 6660 | 0.8158 | 56.9637 | 41.8652 | 56.2385 | 56.2278 | 18.6673 |
0.0337 | 7.97 | 6734 | 0.8152 | 56.9415 | 41.836 | 56.2107 | 56.2054 | 18.6684 |
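
For reference, the Rouge columns above can be reproduced offline with the `evaluate` library. A minimal sketch with placeholder predictions and references; `use_stemmer=True` is an assumption about the evaluation settings:

```python
import evaluate

rouge = evaluate.load("rouge")

# Placeholders; in practice these come from model.generate() on the evaluation split.
predictions = ["The corrected sentence ."]
references = ["The corrected sentence ."]

scores = rouge.compute(predictions=predictions, references=references, use_stemmer=True)
# Keys: rouge1, rouge2, rougeL, rougeLsum; values are fractions in [0, 1],
# while the table above reports percentages.
print({k: round(v * 100, 4) for k, v in scores.items()})
```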
### Framework versions
- Transformers 4.30.1
- Pytorch 1.11.0a0+b6df043
- Datasets 2.12.0
- Tokenizers 0.13.3