# mt5-small-finetuned-19jan-7
This model is a fine-tuned version of [google/mt5-small](https://huggingface.co/google/mt5-small) on an unknown dataset. It achieves the following results on the evaluation set:
- Loss: 2.6123
- Rouge1: 6.8298
- Rouge2: 0.1667
- RougeL: 6.5947
- RougeLsum: 6.6685
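The ROUGE metrics suggest the model was fine-tuned for a summarization-style task. Below is a minimal inference sketch; the Hub repo id is assumed to match the model name above, and the input text is a placeholder.

```python
# Minimal inference sketch. The repo id below is an assumption based on the
# model name in this card; replace it with the actual Hub path if different.
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

model_id = "mt5-small-finetuned-19jan-7"  # assumed Hub repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

text = "Text to summarize goes here."  # placeholder input
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)
summary_ids = model.generate(**inputs, max_new_tokens=64, num_beams=4)
print(tokenizer.decode(summary_ids[0], skip_special_tokens=True))
```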
## Model description
More information needed
## Intended uses & limitations
More information needed
## Training and evaluation data
More information needed
## Training procedure
### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 3e-05
- train_batch_size: 12
- eval_batch_size: 12
- seed: 42
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- num_epochs: 60
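A sketch of how these values might map onto `Seq2SeqTrainingArguments` is shown below; dataset loading, preprocessing, and the `Seq2SeqTrainer` call are omitted, and the output directory is a placeholder. The optimizer settings above correspond to the Trainer's default Adam configuration.

```python
# Sketch only: maps the hyperparameters listed above onto Seq2SeqTrainingArguments.
# Dataset preparation and the Seq2SeqTrainer itself are omitted.
from transformers import Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="mt5-small-finetuned-19jan-7",  # placeholder output path
    learning_rate=3e-5,
    per_device_train_batch_size=12,
    per_device_eval_batch_size=12,
    seed=42,
    num_train_epochs=60,
    lr_scheduler_type="linear",
    evaluation_strategy="epoch",   # the results table below reports metrics once per epoch
    predict_with_generate=True,    # generate summaries during eval so ROUGE can be computed
)
```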
### Training results
Training Loss | Epoch | Step | Validation Loss | Rouge1 | Rouge2 | RougeL | RougeLsum |
---|---|---|---|---|---|---|---|
16.2953 | 1.0 | 50 | 5.4420 | 2.3065 | 0.0 | 2.3217 | 2.3089 |
10.6895 | 2.0 | 100 | 4.4691 | 3.2975 | 0.3693 | 3.2976 | 3.3376 |
7.0377 | 3.0 | 150 | 3.2638 | 4.1896 | 0.3485 | 4.1487 | 4.1878 |
5.7221 | 4.0 | 200 | 3.0772 | 6.2012 | 0.7955 | 6.1846 | 6.3083 |
4.9356 | 5.0 | 250 | 3.0312 | 5.2032 | 0.8545 | 5.1829 | 5.2263 |
4.4656 | 6.0 | 300 | 3.0022 | 5.6901 | 1.3505 | 5.6184 | 5.6791 |
4.2279 | 7.0 | 350 | 2.9585 | 5.6907 | 1.5424 | 5.644 | 5.7768 |
4.0578 | 8.0 | 400 | 2.9098 | 5.7425 | 1.0202 | 5.6452 | 5.7881 |
3.9236 | 9.0 | 450 | 2.8686 | 6.2001 | 1.1793 | 6.1891 | 6.2508 |
3.8237 | 10.0 | 500 | 2.8222 | 5.9182 | 1.1793 | 5.8436 | 5.9807 |
3.7078 | 11.0 | 550 | 2.7890 | 5.4733 | 1.3896 | 5.3702 | 5.4957 |
3.641 | 12.0 | 600 | 2.7522 | 5.8312 | 1.1793 | 5.784 | 5.9037 |
3.5527 | 13.0 | 650 | 2.7168 | 6.3129 | 1.1793 | 6.2924 | 6.384 |
3.5281 | 14.0 | 700 | 2.7000 | 9.1787 | 0.8333 | 9.1491 | 9.2241 |
3.4547 | 15.0 | 750 | 2.6966 | 7.8778 | 0.3333 | 7.8306 | 7.9167 |
3.4386 | 16.0 | 800 | 2.6892 | 8.3907 | 0.3333 | 8.3167 | 8.4 |
3.3749 | 17.0 | 850 | 2.6786 | 8.6167 | 0.4167 | 8.5917 | 8.5787 |
3.3681 | 18.0 | 900 | 2.6895 | 8.2466 | 0.4167 | 8.1799 | 8.2407 |
3.3173 | 19.0 | 950 | 2.6957 | 8.1742 | 0.4167 | 8.1197 | 8.1429 |
3.3034 | 20.0 | 1000 | 2.6721 | 8.2466 | 0.4167 | 8.1799 | 8.2407 |
3.2594 | 21.0 | 1050 | 2.6698 | 8.569 | 0.4167 | 8.5419 | 8.619 |
3.2138 | 22.0 | 1100 | 2.6676 | 8.2722 | 0.4167 | 8.2343 | 8.3037 |
3.2239 | 23.0 | 1150 | 2.6537 | 8.1444 | 0.4167 | 8.1051 | 8.1301 |
3.1887 | 24.0 | 1200 | 2.6529 | 8.1444 | 0.4167 | 8.1051 | 8.1301 |
3.1641 | 25.0 | 1250 | 2.6685 | 7.7777 | 0.1667 | 7.7204 | 7.8143 |
3.162 | 26.0 | 1300 | 2.6619 | 8.3776 | 0.3333 | 8.4135 | 8.4692 |
3.1114 | 27.0 | 1350 | 2.6632 | 8.3776 | 0.3333 | 8.4135 | 8.4692 |
3.0645 | 28.0 | 1400 | 2.6438 | 7.8811 | 0.3333 | 7.8333 | 7.9484 |
3.0984 | 29.0 | 1450 | 2.6384 | 7.3936 | 0.1667 | 7.3609 | 7.4051 |
3.0712 | 30.0 | 1500 | 2.6389 | 6.9609 | 0.1667 | 6.875 | 7.0253 |
3.0662 | 31.0 | 1550 | 2.6346 | 7.95 | 0.1667 | 7.9051 | 8.0218 |
3.0294 | 32.0 | 1600 | 2.6420 | 7.3936 | 0.1667 | 7.3609 | 7.4051 |
3.0143 | 33.0 | 1650 | 2.6325 | 7.6526 | 0.1667 | 7.6869 | 7.7551 |
3.002 | 34.0 | 1700 | 2.6384 | 7.9436 | 0.1667 | 7.9317 | 8.016 |
2.9964 | 35.0 | 1750 | 2.6262 | 8.2958 | 0.4167 | 8.2317 | 8.3936 |
2.9893 | 36.0 | 1800 | 2.6351 | 8.6535 | 0.1667 | 8.616 | 8.7333 |
2.9862 | 37.0 | 1850 | 2.6320 | 8.2452 | 0.1667 | 8.2 | 8.3218 |
2.9588 | 38.0 | 1900 | 2.6214 | 7.6656 | 0.1667 | 7.6819 | 7.7 |
2.9697 | 39.0 | 1950 | 2.6229 | 7.1452 | 0.1667 | 7.1051 | 7.1942 |
2.9433 | 40.0 | 2000 | 2.6209 | 7.5775 | 0.4167 | 7.4893 | 7.5833 |
2.9306 | 41.0 | 2050 | 2.6197 | 7.525 | 0.4167 | 7.4435 | 7.5351 |
2.9382 | 42.0 | 2100 | 2.6190 | 7.525 | 0.4167 | 7.4435 | 7.5351 |
2.9269 | 43.0 | 2150 | 2.6234 | 7.3614 | 0.4167 | 7.2092 | 7.3592 |
2.9152 | 44.0 | 2200 | 2.6237 | 6.9976 | 0.1667 | 6.8777 | 7.0333 |
2.9137 | 45.0 | 2250 | 2.6213 | 6.9976 | 0.1667 | 6.8777 | 7.0333 |
2.9011 | 46.0 | 2300 | 2.6212 | 6.9976 | 0.1667 | 6.8777 | 7.0333 |
2.8941 | 47.0 | 2350 | 2.6188 | 6.7768 | 0.1667 | 6.6509 | 6.812 |
2.9143 | 48.0 | 2400 | 2.6126 | 7.0875 | 0.1667 | 6.803 | 6.9337 |
2.8798 | 49.0 | 2450 | 2.6207 | 6.4458 | 0.1667 | 6.3221 | 6.4527 |
2.8701 | 50.0 | 2500 | 2.6172 | 6.7542 | 0.1667 | 6.4857 | 6.5729 |
2.8823 | 51.0 | 2550 | 2.6161 | 6.9971 | 0.1667 | 6.6819 | 6.7968 |
2.8724 | 52.0 | 2600 | 2.6171 | 6.8298 | 0.1667 | 6.5947 | 6.6685 |
2.8635 | 53.0 | 2650 | 2.6176 | 6.8298 | 0.1667 | 6.5947 | 6.6685 |
2.8803 | 54.0 | 2700 | 2.6134 | 6.1417 | 0.1667 | 5.929 | 6.0423 |
2.8608 | 55.0 | 2750 | 2.6118 | 6.4953 | 0.1667 | 6.2113 | 6.3554 |
2.8655 | 56.0 | 2800 | 2.6125 | 6.4976 | 0.1667 | 6.2625 | 6.3539 |
2.856 | 57.0 | 2850 | 2.6136 | 6.8298 | 0.1667 | 6.5947 | 6.6685 |
2.8837 | 58.0 | 2900 | 2.6124 | 6.8298 | 0.1667 | 6.5947 | 6.6685 |
2.8871 | 59.0 | 2950 | 2.6123 | 6.8298 | 0.1667 | 6.5947 | 6.6685 |
2.8537 | 60.0 | 3000 | 2.6123 | 6.8298 | 0.1667 | 6.5947 | 6.6685 |
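The ROUGE columns above can be computed with the `evaluate` library; the sketch below is one common way to do so, using placeholder predictions and references (note that the card was generated with the Datasets library, so this is an assumption rather than the exact evaluation code).

```python
# Sketch of computing ROUGE as reported above; predictions/references are placeholders.
import evaluate

rouge = evaluate.load("rouge")
predictions = ["a generated summary"]   # model outputs (placeholder)
references = ["a reference summary"]    # gold summaries (placeholder)
print(rouge.compute(predictions=predictions, references=references))
# dict with keys: rouge1, rouge2, rougeL, rougeLsum
```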
### Framework versions
- Transformers 4.25.1
- Pytorch 1.13.1+cu116
- Datasets 2.8.0
- Tokenizers 0.13.2