# nlewins/mt5-small-finetuned-ceb-to-en-tfE
This model is a fine-tuned version of [google/mt5-small](https://huggingface.co/google/mt5-small) on an unknown dataset. At the final training epoch (41) it achieves the following results:
- Train Loss: 1.4749
- Validation Loss: 2.0609
- Train Bleu: 16.5015
- Train Gen Len: 21.3982
- Epoch: 41
## Model description

As the model name suggests, this appears to be a Cebuano (ceb) → English (en) machine translation model fine-tuned from mT5-small using Keras/TensorFlow. No further details were provided.
## Intended uses & limitations

Likely intended for translating Cebuano text into English; a usage sketch follows. Limitations have not been documented.
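A minimal inference sketch, assuming the standard `transformers` TF API. The example sentence and generation settings are illustrative, and depending on how the model was fine-tuned, a task prefix on the input may be required:

```python
from transformers import AutoTokenizer, TFAutoModelForSeq2SeqLM

model_id = "nlewins/mt5-small-finetuned-ceb-to-en-tfE"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = TFAutoModelForSeq2SeqLM.from_pretrained(model_id)

# Illustrative input; Cebuano for "Good morning!"
inputs = tokenizer("Maayong buntag!", return_tensors="tf")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```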
## Training and evaluation data
More information needed
## Training procedure

### Training hyperparameters
The following hyperparameters were used during training:
- optimizer: AdamWeightDecay
  - learning_rate: ExponentialDecay schedule (initial_learning_rate: 1e-4, decay_steps: 10000, decay_rate: 0.6, staircase: False)
  - beta_1: 0.9, beta_2: 0.999, epsilon: 1e-07, amsgrad: False
  - weight_decay_rate: 1e-4
- training_precision: float32
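A minimal sketch reconstructing this optimizer in Keras, assuming `transformers.AdamWeightDecay`, whose name matches the serialized config above:

```python
import tensorflow as tf
from transformers import AdamWeightDecay

# Exponential decay schedule with the parameters listed above.
lr_schedule = tf.keras.optimizers.schedules.ExponentialDecay(
    initial_learning_rate=1e-4,
    decay_steps=10_000,
    decay_rate=0.6,
    staircase=False,
)

# AdamWeightDecay from transformers, with the hyperparameters listed above.
optimizer = AdamWeightDecay(
    learning_rate=lr_schedule,
    weight_decay_rate=1e-4,
    beta_1=0.9,
    beta_2=0.999,
    epsilon=1e-7,
)
```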
### Training results
| Train Loss | Validation Loss | Train Bleu | Train Gen Len | Epoch |
|:----------:|:---------------:|:----------:|:-------------:|:-----:|
| 9.4199 | 5.1899 | 0.0197 | 51.9264 | 0 |
| 6.1347 | 4.0369 | 0.0344 | 208.4342 | 1 |
| 5.2534 | 3.4687 | 0.2103 | 87.8201 | 2 |
| 4.7661 | 3.3253 | 0.3089 | 100.4775 | 3 |
| 4.4801 | 3.2114 | 0.5887 | 78.6664 | 4 |
| 4.2515 | 3.1183 | 0.6136 | 68.9984 | 5 |
| 4.0610 | 3.0319 | 0.7848 | 70.4391 | 6 |
| 3.8934 | 2.9630 | 1.2351 | 49.4546 | 7 |
| 3.7405 | 2.8792 | 2.4721 | 34.1030 | 8 |
| 3.5960 | 2.8110 | 2.6839 | 35.2633 | 9 |
| 3.4710 | 2.7374 | 3.5256 | 30.6239 | 10 |
| 3.3425 | 2.6677 | 4.1573 | 29.2339 | 11 |
| 3.2148 | 2.5964 | 4.2057 | 30.1938 | 12 |
| 3.0943 | 2.5397 | 5.1280 | 27.4652 | 13 |
| 2.9816 | 2.4890 | 5.4301 | 30.4930 | 14 |
| 2.8843 | 2.4337 | 6.6075 | 27.5552 | 15 |
| 2.7832 | 2.3834 | 7.5678 | 26.0294 | 16 |
| 2.6730 | 2.3427 | 8.2442 | 24.8504 | 17 |
| 2.5877 | 2.2995 | 9.3677 | 23.9534 | 18 |
| 2.4953 | 2.2676 | 9.0486 | 26.2592 | 19 |
| 2.4109 | 2.2402 | 10.5917 | 23.6231 | 20 |
| 2.3294 | 2.2128 | 12.2643 | 21.9452 | 21 |
| 2.2636 | 2.1925 | 11.0570 | 24.5511 | 22 |
| 2.1841 | 2.1664 | 11.8273 | 23.4448 | 23 |
| 2.1216 | 2.1502 | 11.4631 | 25.4056 | 24 |
| 2.0594 | 2.1347 | 12.6015 | 23.7539 | 25 |
| 2.0008 | 2.1240 | 13.0802 | 22.7931 | 26 |
| 1.9571 | 2.1087 | 13.2905 | 23.1415 | 27 |
| 1.9107 | 2.1003 | 14.0839 | 22.2584 | 28 |
| 1.8589 | 2.0905 | 14.4151 | 22.2625 | 29 |
| 1.8210 | 2.0827 | 14.4322 | 22.9141 | 30 |
| 1.7831 | 2.0754 | 15.1936 | 21.8291 | 31 |
| 1.7397 | 2.0680 | 15.1076 | 21.9632 | 32 |
| 1.7065 | 2.0675 | 15.2916 | 21.4464 | 33 |
| 1.6681 | 2.0653 | 15.9187 | 21.1913 | 34 |
| 1.6381 | 2.0618 | 15.8224 | 21.7204 | 35 |
| 1.6070 | 2.0602 | 16.1355 | 21.2862 | 36 |
| 1.5801 | 2.0557 | 15.5436 | 21.7768 | 37 |
| 1.5553 | 2.0459 | 16.4640 | 21.3205 | 38 |
| 1.5250 | 2.0560 | 16.9952 | 20.7694 | 39 |
| 1.5023 | 2.0567 | 16.5940 | 20.8201 | 40 |
| 1.4749 | 2.0609 | 16.5015 | 21.3982 | 41 |
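Note that validation loss plateaus around 2.05–2.06 from roughly epoch 36 onward while training loss continues to fall, suggesting that additional epochs under this configuration would mostly widen the train/validation gap rather than improve BLEU.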
### Framework versions
- Transformers 4.33.3
- TensorFlow 2.14.0
- Datasets 2.14.5
- Tokenizers 0.13.3