nlewins/mt5-small-finetuned-ceb-to-en-tf
This model is a fine-tuned version of google/mt5-small on an unknown dataset. After the final training epoch (epoch 39) it reports the following results:
- Train Loss: 2.3206
- Validation Loss: 3.1352
- Train Bleu: 6.9732
- Train Gen Len: 33.6093
- Epoch: 39
Model description
More information needed
Intended uses & limitations
More information needed
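Until this section is completed, the following is a minimal sketch of how the checkpoint could be loaded for Cebuano-to-English translation with the TensorFlow classes in Transformers. The repository id is assumed from this card's title, the `translate` helper and its `max_new_tokens` default are hypothetical, and the sketch assumes no task prefix is required, which this card does not confirm. Given the BLEU score of about 7 reported here, outputs should be treated as rough drafts.

```python
# Repository id assumed from this card's title; adjust if the actual name differs.
MODEL_ID = "nlewins/mt5-small-finetuned-ceb-to-en-tf"


def translate(text, model_id=MODEL_ID, max_new_tokens=64):
    """Translate one Cebuano string to English (hypothetical helper)."""
    # Imported lazily so the module can be loaded without TensorFlow installed.
    from transformers import AutoTokenizer, TFAutoModelForSeq2SeqLM

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = TFAutoModelForSeq2SeqLM.from_pretrained(model_id)

    # Assumption: the raw source sentence is fed in without a task prefix.
    inputs = tokenizer(text, return_tensors="tf")
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)


# Example call (downloads the checkpoint on first use):
# print(translate("Maayong buntag."))
```

The model and tokenizer are reloaded on every call to keep the sketch short; for repeated use, load them once and reuse them.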
Training and evaluation data
More information needed
Training procedure
Training hyperparameters
The following hyperparameters were used during training:
- optimizer: {'name': 'AdamWeightDecay', 'learning_rate': 1e-04, 'decay': 0.0, 'beta_1': 0.9, 'beta_2': 0.999, 'epsilon': 1e-07, 'amsgrad': False, 'weight_decay_rate': 0.01}
- training_precision: float32
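As a rough illustration, the optimizer settings listed above could be passed to `transformers.AdamWeightDecay` as shown below. This is a sketch under stated assumptions, not the original training script: the `build_optimizer` helper is hypothetical, and the `name` and `decay` entries are left out because `decay` is 0.0 and `name` only labels the optimizer.

```python
# Values copied from the hyperparameter list above.
# 'name' and 'decay' (0.0, i.e. no learning-rate decay) are intentionally omitted.
OPTIMIZER_CONFIG = {
    "learning_rate": 1e-04,
    "beta_1": 0.9,
    "beta_2": 0.999,
    "epsilon": 1e-07,
    "amsgrad": False,
    "weight_decay_rate": 0.01,
}


def build_optimizer(config=None):
    """Create an AdamWeightDecay optimizer (hypothetical helper; needs TensorFlow)."""
    # Imported lazily so the config can be inspected without TensorFlow installed.
    from transformers import AdamWeightDecay

    return AdamWeightDecay(**(config or OPTIMIZER_CONFIG))


# Typical use with a Keras model (not run here):
# model.compile(optimizer=build_optimizer())
```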
Training results
| Train Loss | Validation Loss | Train Bleu | Train Gen Len | Epoch |
|---|---|---|---|---|
| 8.3663 | 4.9326 | 0.0133 | 508.5833 | 0 |
| 6.4712 | 4.3528 | 0.0497 | 197.7093 | 1 |
| 5.8897 | 4.1185 | 0.2051 | 122.2426 | 2 |
| 5.5178 | 3.9618 | 0.3938 | 95.8593 | 3 |
| 5.2559 | 3.8786 | 0.6506 | 57.1111 | 4 |
| 5.0615 | 3.8403 | 0.5691 | 51.1630 | 5 |
| 4.9009 | 3.7958 | 0.8084 | 39.2685 | 6 |
| 4.7622 | 3.7542 | 0.8647 | 51.6370 | 7 |
| 4.6515 | 3.7048 | 0.9399 | 55.9722 | 8 |
| 4.5334 | 3.6699 | 1.1787 | 52.1241 | 9 |
| 4.4349 | 3.6267 | 1.5369 | 43.9111 | 10 |
| 4.3203 | 3.5959 | 1.5818 | 40.9870 | 11 |
| 4.2457 | 3.5649 | 1.4775 | 50.0574 | 12 |
| 4.1571 | 3.5386 | 1.4422 | 54.0852 | 13 |
| 4.0728 | 3.5110 | 1.7008 | 48.4907 | 14 |
| 3.9757 | 3.4783 | 1.9308 | 45.6537 | 15 |
| 3.9063 | 3.4515 | 2.3652 | 41.0037 | 16 |
| 3.8218 | 3.4235 | 2.7974 | 41.2296 | 17 |
| 3.7604 | 3.3947 | 2.4012 | 43.0667 | 18 |
| 3.6770 | 3.3680 | 2.9153 | 39.3056 | 19 |
| 3.6081 | 3.3500 | 3.2060 | 40.7574 | 20 |
| 3.5273 | 3.3267 | 3.0406 | 41.6926 | 21 |
| 3.4594 | 3.3039 | 3.0793 | 40.8852 | 22 |
| 3.3859 | 3.2710 | 2.6225 | 53.7741 | 23 |
| 3.3230 | 3.2544 | 3.0726 | 46.8148 | 24 |
| 3.2496 | 3.2400 | 4.0928 | 38.8981 | 25 |
| 3.1771 | 3.2242 | 4.2043 | 39.2056 | 26 |
| 3.1167 | 3.2031 | 3.9126 | 42.8056 | 27 |
| 3.0427 | 3.1927 | 4.2052 | 39.0981 | 28 |
| 2.9833 | 3.1833 | 5.0329 | 34.4593 | 29 |
| 2.9094 | 3.1710 | 4.7041 | 37.4611 | 30 |
| 2.8395 | 3.1557 | 5.6423 | 35.4833 | 31 |
| 2.7686 | 3.1503 | 5.3470 | 36.2185 | 32 |
| 2.7053 | 3.1500 | 5.5612 | 36.6574 | 33 |
| 2.6525 | 3.1370 | 5.8202 | 35.2241 | 34 |
| 2.5782 | 3.1341 | 5.7995 | 38.1556 | 35 |
| 2.5070 | 3.1295 | 6.8106 | 32.2759 | 36 |
| 2.4419 | 3.1287 | 6.8896 | 34.6815 | 37 |
| 2.3756 | 3.1295 | 6.8859 | 36.5444 | 38 |
| 2.3206 | 3.1352 | 6.9732 | 33.6093 | 39 |
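In the table above, validation loss flattens out over the last few epochs while training loss keeps falling. A small sketch of how one might pick the epoch with the lowest validation loss from such a log follows; the rows are copied from the last six lines of the table, and the `best_epoch` helper is hypothetical.

```python
# (epoch, train_loss, validation_loss) copied from the last six rows of the table.
ROWS = [
    (34, 2.6525, 3.1370),
    (35, 2.5782, 3.1341),
    (36, 2.5070, 3.1295),
    (37, 2.4419, 3.1287),
    (38, 2.3756, 3.1295),
    (39, 2.3206, 3.1352),
]


def best_epoch(rows):
    """Return the epoch number with the lowest validation loss."""
    return min(rows, key=lambda row: row[2])[0]


best = best_epoch(ROWS)

# Gap between validation and training loss at the final epoch.
final_epoch, final_train, final_val = ROWS[-1]
gap = final_val - final_train

print(f"lowest validation loss at epoch {best}")
print(f"validation - train loss at epoch {final_epoch}: {gap:.4f}")
```

On these rows the lowest validation loss falls at epoch 37 rather than the final epoch, so an earlier checkpoint may generalize marginally better, although the difference is small.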
Framework versions
- Transformers 4.33.3
- TensorFlow 2.14.0
- Datasets 2.14.5
- Tokenizers 0.13.3