<!-- This model card has been generated automatically according to the information Keras had access to. You should probably proofread and complete it, then remove this comment. -->
# nlewins/mt5-small-finetuned-ceb-to-en-tfY
This model is a fine-tuned version of `google/mt5-small` on an unknown dataset. It logged the following metrics at the end of the final training epoch:
- Train Loss: 0.4418
- Validation Loss: 4.3158
- Train Bleu: 10.3092
- Train Gen Len: 31.8148
- Epoch: 86 (epochs are counted from 0, so this is the 87th and final epoch)
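The card does not define its BLEU and "Gen Len" columns. Their names appear to come from `bleu` and `gen_len` keys, such as those returned by the `compute_metrics` function in the Hugging Face translation tutorial. In that function, `gen_len` is the mean number of non-padding token ids per generated sequence. The card generator also labels every metric not prefixed with `val_` as "Train", so these two figures may actually have been computed on the validation set. Assuming the tutorial's definition (the card does not confirm it), the length metric reduces to the plain-Python sketch below. `mean_generated_length` is a hypothetical helper, the token ids are made up, and `0` is mT5's padding id.

```python
from statistics import mean
from typing import Sequence

MT5_PAD_ID = 0  # mT5 uses id 0 for padding (and as the decoder start token)


def mean_generated_length(generated: Sequence[Sequence[int]], pad_id: int = MT5_PAD_ID) -> float:
    """Average count of non-padding token ids per generated sequence.

    Hypothetical helper mirroring a tutorial-style `gen_len` metric.
    """
    if not generated:
        raise ValueError("need at least one generated sequence")
    # Count every id that is not the padding id, per sequence.
    counts = [sum(1 for tok in seq if tok != pad_id) for seq in generated]
    return round(mean(counts), 4)  # the card reports metrics to 4 decimals


# Toy padded generate() output (made-up ids): the leading 0 is the decoder
# start token, trailing 0s are padding, and 1 is mT5's </s> token.
batch = [
    [0, 259, 1174, 1],
    [0, 4213, 1, 0],
]
print(mean_generated_length(batch))  # 2.5
```

Because the decoder start token equals the padding id, it is not counted. The very long generation lengths in the early epochs (over 400) therefore reflect output the model actually produced.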
## Model description
More information needed. The repository name suggests Cebuano (`ceb`) to English (`en`) translation.
## Intended uses & limitations
More information needed
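The card gives no usage example. The following is a minimal sketch for loading the checkpoint with the TensorFlow classes in `transformers`, since the weights were trained with Keras. It assumes the repository contains tokenizer files. If it does not, `AutoTokenizer.from_pretrained("google/mt5-small")` should supply the base tokenizer, which fine-tuning normally leaves unchanged. It also assumes that no task prefix was prepended to the inputs during training. The helper name `translate` and the `max_new_tokens=64` default are illustrative choices, not part of the card.

```python
"""Sketch: translate text with the fine-tuned checkpoint (TensorFlow backend)."""

MODEL_ID = "nlewins/mt5-small-finetuned-ceb-to-en-tfY"


def translate(sentences, model_id=MODEL_ID, max_new_tokens=64):
    """Translate a list of source sentences and return decoded strings.

    Assumes the Hub repo ships tokenizer files next to the TF weights and
    that inputs need no task prefix (neither is documented in the card).
    """
    # Imported lazily so this module can be imported without TensorFlow.
    from transformers import AutoTokenizer, TFAutoModelForSeq2SeqLM

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = TFAutoModelForSeq2SeqLM.from_pretrained(model_id)
    # Pad to a common length so the batch becomes one tensor.
    inputs = tokenizer(sentences, return_tensors="tf", padding=True, truncation=True)
    outputs = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.batch_decode(outputs, skip_special_tokens=True)
```

Calling `translate(["Maayong buntag."])` downloads the weights on first use and returns a list containing one English string. The final logged BLEU is about 10, so expect rough translations.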
## Training and evaluation data
More information needed
## Training procedure
### Training hyperparameters
The following hyperparameters were used during training:
- optimizer: {'name': 'AdamWeightDecay', 'learning_rate': 1e-04, 'decay': 0.0, 'beta_1': 0.9, 'beta_2': 0.999, 'epsilon': 1e-07, 'amsgrad': False, 'weight_decay_rate': 0.01}
- training_precision: float32
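The optimizer entry is the serialized config of `transformers`' `AdamWeightDecay`: Adam with decoupled weight decay at rate 0.01. The learning rate appears to be a constant 1e-4, and `'decay': 0.0` disables the legacy Keras time-based decay. The sketch below is a rough scalar illustration that applies one update with those hyperparameters. It follows our reading of the library's ordering: the weight is first shrunk by `lr * weight_decay_rate`, and then a Keras-style Adam step is applied, with bias correction folded into the step size. `adamw_step` is a hypothetical helper, not a `transformers` function.

```python
import math


def adamw_step(param, grad, m, v, step, lr=1e-4, beta_1=0.9, beta_2=0.999,
               epsilon=1e-7, weight_decay_rate=0.01):
    """One scalar AdamWeightDecay-style update; returns (param, m, v).

    `step` is the 1-based iteration count used for bias correction.
    Defaults are the hyperparameters listed in this card.
    """
    # Decoupled weight decay: shrink the weight directly, not via the gradient.
    param -= lr * weight_decay_rate * param
    # Exponential moving averages of the gradient and the squared gradient.
    m = beta_1 * m + (1.0 - beta_1) * grad
    v = beta_2 * v + (1.0 - beta_2) * grad * grad
    # Keras-style Adam: bias correction folded into the step size,
    # epsilon added to sqrt(v) without correction.
    lr_t = lr * math.sqrt(1.0 - beta_2 ** step) / (1.0 - beta_1 ** step)
    param -= lr_t * m / (math.sqrt(v) + epsilon)
    return param, m, v


p, m, v = adamw_step(param=1.0, grad=0.5, m=0.0, v=0.0, step=1)
print(f"{p:.6f}")  # 0.999899
```

On the first step, the Adam term moves the weight by roughly `lr` in the direction opposite the gradient's sign, and weight decay removes a further `1e-4 * 0.01` of its value.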
### Training results
Train Loss | Validation Loss | Train Bleu | Train Gen Len | Epoch |
---|---|---|---|---|
12.0581 | 5.6843 | 0.0157 | 461.8741 | 0 |
7.0723 | 4.5602 | 0.0087 | 478.2796 | 1 |
6.1442 | 4.2266 | 0.0382 | 406.7833 | 2 |
5.6655 | 4.0618 | 0.0433 | 330.1426 | 3 |
5.3850 | 3.9865 | 0.0829 | 259.1389 | 4 |
5.1640 | 3.9261 | 0.1254 | 178.6611 | 5 |
5.0054 | 3.8756 | 0.1863 | 124.9 | 6 |
4.8650 | 3.8294 | 0.5620 | 68.1852 | 7 |
4.7407 | 3.7764 | 0.9249 | 52.9574 | 8 |
4.6149 | 3.7323 | 1.0588 | 47.9815 | 9 |
4.5139 | 3.6872 | 1.0838 | 59.25 | 10 |
4.4121 | 3.6415 | 1.1190 | 53.8981 | 11 |
4.3220 | 3.6061 | 1.1783 | 48.1741 | 12 |
4.2280 | 3.5820 | 1.4894 | 47.8870 | 13 |
4.1399 | 3.5527 | 1.2391 | 59.2963 | 14 |
4.0721 | 3.5225 | 1.6758 | 55.7574 | 15 |
3.9868 | 3.4965 | 1.4156 | 64.5870 | 16 |
3.8999 | 3.4638 | 1.6347 | 55.7222 | 17 |
3.8296 | 3.4457 | 1.8747 | 55.6185 | 18 |
3.7547 | 3.4115 | 2.1395 | 49.1444 | 19 |
3.7035 | 3.3861 | 2.5959 | 44.5444 | 20 |
3.6185 | 3.3578 | 2.9953 | 43.9352 | 21 |
3.5474 | 3.3363 | 2.4835 | 53.4037 | 22 |
3.4760 | 3.3171 | 2.6081 | 55.5907 | 23 |
3.4081 | 3.2954 | 2.8999 | 48.4259 | 24 |
3.3343 | 3.2739 | 2.7449 | 54.2074 | 25 |
3.2743 | 3.2497 | 2.6874 | 52.2241 | 26 |
3.2062 | 3.2348 | 3.6606 | 46.0426 | 27 |
3.1360 | 3.2253 | 4.0205 | 39.6167 | 28 |
3.0652 | 3.2088 | 4.1229 | 39.5741 | 29 |
2.9938 | 3.2031 | 4.3597 | 38.95 | 30 |
2.9290 | 3.1854 | 4.7612 | 38.2852 | 31 |
2.8550 | 3.1806 | 5.1633 | 35.2222 | 32 |
2.7970 | 3.1639 | 5.2615 | 37.3389 | 33 |
2.7319 | 3.1642 | 5.2744 | 34.3944 | 34 |
2.6722 | 3.1634 | 5.2671 | 34.3778 | 35 |
2.5974 | 3.1444 | 5.5658 | 38.3593 | 36 |
2.5347 | 3.1347 | 6.0430 | 36.4444 | 37 |
2.4666 | 3.1478 | 6.7825 | 32.0593 | 38 |
2.4096 | 3.1433 | 6.9632 | 34.4370 | 39 |
2.3358 | 3.1419 | 6.5168 | 34.6926 | 40 |
2.2753 | 3.1384 | 6.9347 | 34.9370 | 41 |
2.2167 | 3.1478 | 7.0051 | 33.9111 | 42 |
2.1553 | 3.1580 | 7.0209 | 35.6667 | 43 |
2.0914 | 3.1575 | 6.6705 | 36.9593 | 44 |
2.0381 | 3.1591 | 7.0970 | 36.9815 | 45 |
1.9741 | 3.1798 | 7.2865 | 35.6778 | 46 |
1.9104 | 3.1725 | 6.9376 | 38.1019 | 47 |
1.8605 | 3.1986 | 8.2566 | 32.6074 | 48 |
1.7946 | 3.2041 | 8.5780 | 33.3444 | 49 |
1.7388 | 3.2184 | 8.2985 | 34.5556 | 50 |
1.6951 | 3.2317 | 8.5468 | 35.1185 | 51 |
1.6300 | 3.2446 | 8.0576 | 39.0148 | 52 |
1.5703 | 3.2577 | 8.7040 | 35.2444 | 53 |
1.5088 | 3.2716 | 8.4979 | 35.2815 | 54 |
1.4573 | 3.2817 | 8.4574 | 33.2519 | 55 |
1.4126 | 3.3135 | 8.6438 | 34.3519 | 56 |
1.3521 | 3.3628 | 9.1394 | 33.3278 | 57 |
1.3051 | 3.3638 | 9.4435 | 33.55 | 58 |
1.2568 | 3.3919 | 8.8440 | 35.5370 | 59 |
1.2155 | 3.4077 | 9.4603 | 33.0870 | 60 |
1.1777 | 3.4634 | 9.6192 | 33.6185 | 61 |
1.1183 | 3.4597 | 8.9327 | 34.7481 | 62 |
1.0765 | 3.5057 | 9.7793 | 33.6963 | 63 |
1.0385 | 3.5243 | 9.0765 | 34.7259 | 64 |
1.0039 | 3.5569 | 8.9329 | 34.8 | 65 |
0.9598 | 3.6022 | 9.7938 | 32.0352 | 66 |
0.9215 | 3.5964 | 9.1933 | 34.4907 | 67 |
0.8970 | 3.6275 | 9.7480 | 33.6296 | 68 |
0.8580 | 3.6986 | 9.5177 | 33.9056 | 69 |
0.8152 | 3.7642 | 9.1937 | 33.9796 | 70 |
0.7925 | 3.7641 | 9.0237 | 35.0685 | 71 |
0.7601 | 3.7957 | 9.8801 | 32.8444 | 72 |
0.7309 | 3.8979 | 10.1862 | 32.6593 | 73 |
0.7058 | 3.8768 | 9.5758 | 34.2778 | 74 |
0.6824 | 3.8878 | 10.0571 | 33.3722 | 75 |
0.6447 | 3.9822 | 9.8213 | 33.7241 | 76 |
0.6232 | 3.9933 | 10.4677 | 32.2778 | 77 |
0.5977 | 4.0486 | 10.3455 | 32.3019 | 78 |
0.5764 | 4.0573 | 10.2460 | 33.7981 | 79 |
0.5572 | 4.1037 | 10.3378 | 32.4093 | 80 |
0.5318 | 4.1893 | 10.0543 | 32.2037 | 81 |
0.5159 | 4.1671 | 10.5916 | 32.6981 | 82 |
0.4961 | 4.2527 | 10.3419 | 32.0370 | 83 |
0.4785 | 4.2824 | 9.7580 | 30.9185 | 84 |
0.4543 | 4.3117 | 9.7117 | 33.1093 | 85 |
0.4418 | 4.3158 | 10.3092 | 31.8148 | 86 |
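Validation loss bottoms out at 3.1347 in epoch 37 and then climbs to 4.3158 by epoch 86, while training loss keeps falling. This points to overfitting: the final checkpoint is not the one with the lowest validation loss, although the logged BLEU peaks later (10.5916 at epoch 82). The plain-Python sketch below uses values copied from the table. It locates the best epoch and shows where a simplified patience rule, similar in spirit to `tf.keras.callbacks.EarlyStopping` with `min_delta=0`, would have halted training. The patience of 5 is an arbitrary illustrative value, and `early_stopping_epoch` is a hypothetical helper.

```python
# Validation loss per epoch (epochs 0-86), copied from the table above.
VAL_LOSS = [
    5.6843, 4.5602, 4.2266, 4.0618, 3.9865, 3.9261, 3.8756, 3.8294, 3.7764, 3.7323,
    3.6872, 3.6415, 3.6061, 3.5820, 3.5527, 3.5225, 3.4965, 3.4638, 3.4457, 3.4115,
    3.3861, 3.3578, 3.3363, 3.3171, 3.2954, 3.2739, 3.2497, 3.2348, 3.2253, 3.2088,
    3.2031, 3.1854, 3.1806, 3.1639, 3.1642, 3.1634, 3.1444, 3.1347, 3.1478, 3.1433,
    3.1419, 3.1384, 3.1478, 3.1580, 3.1575, 3.1591, 3.1798, 3.1725, 3.1986, 3.2041,
    3.2184, 3.2317, 3.2446, 3.2577, 3.2716, 3.2817, 3.3135, 3.3628, 3.3638, 3.3919,
    3.4077, 3.4634, 3.4597, 3.5057, 3.5243, 3.5569, 3.6022, 3.5964, 3.6275, 3.6986,
    3.7642, 3.7641, 3.7957, 3.8979, 3.8768, 3.8878, 3.9822, 3.9933, 4.0486, 4.0573,
    4.1037, 4.1893, 4.1671, 4.2527, 4.2824, 4.3117, 4.3158,
]


def early_stopping_epoch(losses, patience):
    """Return (best_epoch, stop_epoch) for a lower-is-better metric.

    Stops once `patience` consecutive epochs fail to beat the best value
    seen so far; stop_epoch is None if the rule never triggers.
    """
    best_epoch, wait = 0, 0
    for epoch in range(1, len(losses)):
        if losses[epoch] < losses[best_epoch]:
            best_epoch, wait = epoch, 0  # new best: reset the counter
        else:
            wait += 1
            if wait >= patience:
                return best_epoch, epoch
    return best_epoch, None


best = min(range(len(VAL_LOSS)), key=VAL_LOSS.__getitem__)
print(best, VAL_LOSS[best])                        # 37 3.1347
print(early_stopping_epoch(VAL_LOSS, patience=5))  # (37, 42)
```

Whether epoch 37 or a later epoch is preferable depends on whether validation loss or BLEU is the target metric. The card logs both, but it does not say which dataset the BLEU was computed on.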
### Framework versions
- Transformers 4.33.3
- TensorFlow 2.14.0
- Datasets 2.14.5
- Tokenizers 0.13.3
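When trying to reproduce the training run, a small standard-library check can compare installed packages against the versions listed above. This is a sketch: `version_mismatches` is a hypothetical helper. On some platforms TensorFlow is installed under a different distribution name (for example `tensorflow-cpu`), and this check would report it as missing.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions listed under "Framework versions" in this card.
PINNED = {
    "transformers": "4.33.3",
    "tensorflow": "2.14.0",
    "datasets": "2.14.5",
    "tokenizers": "0.13.3",
}


def version_mismatches(pinned=PINNED):
    """Map each package whose installed version differs from its pin to the
    installed version (None if the package is not installed at all)."""
    mismatches = {}
    for package, wanted in pinned.items():
        try:
            installed = version(package)
        except PackageNotFoundError:
            installed = None
        if installed != wanted:
            mismatches[package] = installed
    return mismatches


print(version_mismatches() or "environment matches the card")
```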