# nlewins/mt5-small-finetuned-ceb-to-en-tfD
This model is a fine-tuned version of [google/mt5-small](https://huggingface.co/google/mt5-small) on an unknown dataset. It achieves the following results on the evaluation set:
- Train Loss: 2.0379
- Validation Loss: 3.0766
- Train Bleu: 8.3032
- Train Gen Len: 33.95
- Epoch: 64
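The Train Bleu figure above is a BLEU score computed over generated translations. As a rough illustration of the metric only (not the exact implementation used for this card, which likely relies on a library such as sacrebleu), a minimal sentence-level BLEU-4 can be sketched as:

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """Multiset of n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def bleu(candidate: str, reference: str, max_n: int = 4) -> float:
    """Sentence-level BLEU in [0, 100]: geometric mean of clipped
    n-gram precisions times a brevity penalty."""
    c, r = candidate.split(), reference.split()
    precisions = []
    for n in range(1, max_n + 1):
        cand, ref = ngrams(c, n), ngrams(r, n)
        overlap = sum(min(cnt, ref[g]) for g, cnt in cand.items())
        total = max(sum(cand.values()), 1)
        # 1e-9 floor avoids log(0); a crude stand-in for proper smoothing.
        precisions.append(overlap / total if overlap else 1e-9)
    bp = 1.0 if len(c) > len(r) else math.exp(1 - len(r) / max(len(c), 1))
    return 100 * bp * math.exp(sum(math.log(p) for p in precisions) / max_n)

# A perfect match scores 100; partial overlap scores much lower.
print(bleu("the cat sat on the mat", "the cat sat on the mat"))
```

Real evaluations use corpus-level BLEU with smoothing, so treat this only as intuition for what a score of ~8 means: modest n-gram overlap with the references.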
## Model description
More information needed
## Intended uses & limitations
More information needed
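No usage guidance was provided with this card. As a hedged sketch only (assuming the checkpoint is hosted under this repo id and loads with the TensorFlow seq2seq classes shown; verify before relying on it), Cebuano-to-English translation might look like:

```python
from transformers import AutoTokenizer, TFAutoModelForSeq2SeqLM

def translate(text: str,
              model_id: str = "nlewins/mt5-small-finetuned-ceb-to-en-tfD") -> str:
    """Translate Cebuano text to English with the fine-tuned mT5 checkpoint."""
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = TFAutoModelForSeq2SeqLM.from_pretrained(model_id)
    inputs = tokenizer(text, return_tensors="tf")
    # Train Gen Len in the results below hovers around 34-45 tokens,
    # so 64 new tokens is a comfortable cap.
    outputs = model.generate(**inputs, max_new_tokens=64)
    return tokenizer.decode(outputs[0], skip_special_tokens=True)
```

For repeated calls, load the tokenizer and model once outside the function rather than per invocation.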
## Training and evaluation data
More information needed
## Training procedure
### Training hyperparameters
The following hyperparameters were used during training:
- optimizer: AdamWeightDecay (beta_1=0.9, beta_2=0.999, epsilon=1e-07, amsgrad=False, weight_decay_rate=0.0001, decay=0.0)
- learning_rate: keras.optimizers.schedules.ExponentialDecay (initial_learning_rate=0.0001, decay_steps=10000, decay_rate=0.4, staircase=False)
- training_precision: float32
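With staircase=False, the ExponentialDecay schedule above decays the learning rate continuously as `initial_learning_rate * decay_rate ** (step / decay_steps)`. A plain-Python sketch of the values it produces:

```python
def lr_at(step: int,
          initial_learning_rate: float = 1e-4,
          decay_steps: int = 10_000,
          decay_rate: float = 0.4) -> float:
    """Continuous (staircase=False) exponential decay, as Keras computes it."""
    return initial_learning_rate * decay_rate ** (step / decay_steps)

# At step 0 the rate is the initial 1e-4; after one full decay
# period (10,000 steps) it has shrunk to 0.4x that, i.e. 4e-5.
print(lr_at(0), lr_at(10_000))
```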
### Training results
| Train Loss | Validation Loss | Train Bleu | Train Gen Len | Epoch |
|:----------:|:---------------:|:----------:|:-------------:|:-----:|
10.2106 | 5.0508 | 0.0137 | 446.0648 | 0 |
6.7509 | 4.4073 | 0.0139 | 429.4630 | 1 |
6.0227 | 4.0907 | 0.0365 | 350.5889 | 2 |
5.6141 | 3.9654 | 0.1205 | 188.5870 | 3 |
5.3489 | 3.9050 | 0.3894 | 87.8630 | 4 |
5.1413 | 3.8607 | 0.5328 | 58.3167 | 5 |
4.9827 | 3.8213 | 1.1469 | 34.4963 | 6 |
4.8325 | 3.7860 | 1.0320 | 36.2833 | 7 |
4.7153 | 3.7475 | 1.2818 | 42.4463 | 8 |
4.6216 | 3.7136 | 1.3223 | 44.1611 | 9 |
4.5247 | 3.6768 | 0.9889 | 55.2259 | 10 |
4.4525 | 3.6533 | 1.3691 | 49.3574 | 11 |
4.3604 | 3.6222 | 1.7751 | 40.1722 | 12 |
4.2940 | 3.5930 | 1.5683 | 49.0833 | 13 |
4.2214 | 3.5694 | 1.4980 | 45.8537 | 14 |
4.1482 | 3.5523 | 1.7241 | 44.6130 | 15 |
4.0861 | 3.5342 | 2.0267 | 39.1963 | 16 |
4.0126 | 3.5052 | 2.1943 | 40.3019 | 17 |
3.9553 | 3.4909 | 2.2466 | 42.2815 | 18 |
3.8973 | 3.4679 | 2.7648 | 34.0519 | 19 |
3.8397 | 3.4552 | 2.9543 | 38.1130 | 20 |
3.7934 | 3.4253 | 2.2250 | 48.2963 | 21 |
3.7257 | 3.4040 | 2.3509 | 45.2778 | 22 |
3.6643 | 3.3869 | 2.4464 | 46.6926 | 23 |
3.6209 | 3.3674 | 2.3087 | 48.7630 | 24 |
3.5603 | 3.3488 | 2.6122 | 43.2630 | 25 |
3.5105 | 3.3272 | 2.5595 | 46.8556 | 26 |
3.4645 | 3.3142 | 2.5999 | 47.6870 | 27 |
3.4201 | 3.3032 | 2.9416 | 45.0204 | 28 |
3.3736 | 3.2811 | 3.1105 | 43.2167 | 29 |
3.3306 | 3.2684 | 3.6797 | 41.7667 | 30 |
3.2658 | 3.2508 | 3.3509 | 49.2778 | 31 |
3.2222 | 3.2394 | 3.7258 | 44.6444 | 32 |
3.1719 | 3.2292 | 3.7031 | 45.4259 | 33 |
3.1219 | 3.2112 | 4.3785 | 38.5259 | 34 |
3.0806 | 3.2003 | 4.7949 | 38.2796 | 35 |
3.0467 | 3.1884 | 4.7402 | 39.1185 | 36 |
2.9929 | 3.1804 | 4.3355 | 42.3037 | 37 |
2.9471 | 3.1695 | 4.5699 | 41.0426 | 38 |
2.9078 | 3.1574 | 4.3787 | 44.4778 | 39 |
2.8716 | 3.1511 | 4.8370 | 39.7185 | 40 |
2.8198 | 3.1458 | 5.2962 | 35.9556 | 41 |
2.7842 | 3.1398 | 5.3283 | 38.1611 | 42 |
2.7432 | 3.1309 | 5.0445 | 38.8870 | 43 |
2.7043 | 3.1204 | 5.0695 | 40.3889 | 44 |
2.6696 | 3.1227 | 5.3477 | 43.0963 | 45 |
2.6259 | 3.1164 | 6.5346 | 35.9074 | 46 |
2.5843 | 3.1004 | 5.2047 | 44.6037 | 47 |
2.5448 | 3.1094 | 5.4300 | 37.8963 | 48 |
2.5129 | 3.0964 | 5.3997 | 41.5537 | 49 |
2.4697 | 3.0940 | 5.9519 | 38.7037 | 50 |
2.4409 | 3.0933 | 5.5973 | 41.1463 | 51 |
2.4077 | 3.0867 | 5.9751 | 40.1815 | 52 |
2.3705 | 3.0898 | 6.4699 | 36.8537 | 53 |
2.3397 | 3.0841 | 6.5144 | 38.2815 | 54 |
2.3118 | 3.0873 | 7.5425 | 33.0611 | 55 |
2.2658 | 3.0801 | 7.2862 | 35.8667 | 56 |
2.2407 | 3.0803 | 7.3595 | 34.6611 | 57 |
2.2027 | 3.0791 | 7.2377 | 35.7130 | 58 |
2.1805 | 3.0794 | 7.5672 | 35.1981 | 59 |
2.1504 | 3.0772 | 8.1746 | 33.9963 | 60 |
2.1208 | 3.0751 | 7.8803 | 35.0185 | 61 |
2.0915 | 3.0801 | 7.6175 | 37.2796 | 62 |
2.0573 | 3.0817 | 8.3303 | 33.8241 | 63 |
2.0379 | 3.0766 | 8.3032 | 33.95 | 64 |
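Note that the final epoch is not the best by validation loss: scanning the table, the minimum (3.0751) occurs at epoch 61, and validation loss is nearly flat after epoch ~55 while train loss keeps falling, a mild overfitting signal. Selecting the best checkpoint from the last few rows above can be sketched as:

```python
# (epoch, validation_loss) pairs copied from the last rows of the table above.
rows = [(60, 3.0772), (61, 3.0751), (62, 3.0801), (63, 3.0817), (64, 3.0766)]

# Pick the epoch with the lowest validation loss.
best_epoch, best_val = min(rows, key=lambda r: r[1])
print(best_epoch, best_val)  # epoch 61 has the lowest validation loss here
```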
### Framework versions
- Transformers 4.33.3
- TensorFlow 2.14.0
- Datasets 2.14.5
- Tokenizers 0.13.3
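To reproduce this environment, the versions above can be pinned in a requirements file (a sketch; exact pins may need adjusting for your Python version and platform):

```text
transformers==4.33.3
tensorflow==2.14.0
datasets==2.14.5
tokenizers==0.13.3
```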