# nlewins/mt5-small-finetuned-ceb-to-en-tfC
This model is a fine-tuned version of [google/mt5-small](https://huggingface.co/google/mt5-small) on an unknown dataset. It achieves the following results on the evaluation set:
- Train Loss: 2.1293
- Validation Loss: 3.0935
- Train Bleu: 7.1603
- Train Gen Len: 38.0833
- Epoch: 52
## Model description

The model name and training configuration suggest this is google/mt5-small fine-tuned for Cebuano (`ceb`) to English (`en`) translation using TensorFlow/Keras. More information needed.
## Intended uses & limitations

More information needed
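While usage is not yet documented, a minimal inference sketch using the Hugging Face `transformers` TensorFlow API is shown below. This assumes the checkpoint loads as a standard TF seq2seq model; the input sentence is a hypothetical Cebuano example chosen for illustration.

```python
from transformers import AutoTokenizer, TFAutoModelForSeq2SeqLM

model_id = "nlewins/mt5-small-finetuned-ceb-to-en-tfC"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = TFAutoModelForSeq2SeqLM.from_pretrained(model_id)

# "Maayong buntag" is a hypothetical Cebuano input ("Good morning").
inputs = tokenizer("Maayong buntag", return_tensors="tf")
outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Note that `max_new_tokens` may need raising: the evaluation generation length above averages around 38 tokens, so longer inputs could be truncated at this limit.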
## Training and evaluation data

More information needed
## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- optimizer: AdamWeightDecay
  - learning_rate: keras.optimizers.schedules.ExponentialDecay (initial_learning_rate: 0.0001, decay_steps: 10000, decay_rate: 0.6, staircase: False)
  - decay: 0.0
  - beta_1: 0.9
  - beta_2: 0.999
  - epsilon: 1e-07
  - amsgrad: False
  - weight_decay_rate: 0.0001
- training_precision: float32
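Because `staircase` is `False`, the ExponentialDecay schedule above decays the learning rate continuously: `lr(step) = initial_learning_rate * decay_rate ** (step / decay_steps)`. A plain-Python sketch of that formula, using the exact values from the config:

```python
def exponential_decay_lr(step, initial_lr=1e-4, decay_rate=0.6, decay_steps=10_000):
    """Continuous (staircase=False) exponential decay, matching the
    keras.optimizers.schedules.ExponentialDecay config above."""
    return initial_lr * decay_rate ** (step / decay_steps)

print(exponential_decay_lr(0))       # initial learning rate, 0.0001
print(exponential_decay_lr(10_000))  # one decay period later: ~6e-05 (a factor of 0.6)
```

At this decay rate, the learning rate drops by a factor of 0.6 every 10,000 optimizer steps, which is a fairly gentle schedule over a 52-epoch run.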
### Training results
| Train Loss | Validation Loss | Train Bleu | Train Gen Len | Epoch |
|:-:|:-:|:-:|:-:|:-:|
11.5985 | 5.6948 | 0.0191 | 94.4 | 0 |
7.0293 | 4.8430 | 0.0187 | 437.6333 | 1 |
6.1983 | 4.3404 | 0.1252 | 157.4593 | 2 |
5.7468 | 4.1384 | 0.1073 | 188.2611 | 3 |
5.4696 | 4.0143 | 0.2522 | 128.1685 | 4 |
5.2496 | 3.9493 | 0.4207 | 58.4056 | 5 |
5.0861 | 3.9033 | 0.6643 | 55.1963 | 6 |
4.9390 | 3.8395 | 0.8402 | 48.2630 | 7 |
4.8181 | 3.7961 | 0.8448 | 51.2093 | 8 |
4.6932 | 3.7547 | 1.0111 | 50.3444 | 9 |
4.5997 | 3.7109 | 1.1193 | 52.6833 | 10 |
4.4996 | 3.6721 | 1.5522 | 41.0556 | 11 |
4.4226 | 3.6487 | 1.4552 | 46.8333 | 12 |
4.3358 | 3.6152 | 1.3376 | 48.0481 | 13 |
4.2421 | 3.5866 | 1.3850 | 51.5259 | 14 |
4.1860 | 3.5723 | 1.5462 | 46.4537 | 15 |
4.1203 | 3.5448 | 1.5522 | 48.1296 | 16 |
4.0313 | 3.5147 | 1.6481 | 44.4315 | 17 |
3.9646 | 3.4952 | 2.2879 | 40.2852 | 18 |
3.9123 | 3.4736 | 2.3183 | 38.6111 | 19 |
3.8415 | 3.4556 | 2.6504 | 36.7796 | 20 |
3.7712 | 3.4287 | 2.7280 | 36.4593 | 21 |
3.7186 | 3.4077 | 2.8942 | 39.1444 | 22 |
3.6562 | 3.3831 | 2.7342 | 41.6463 | 23 |
3.5972 | 3.3603 | 2.9263 | 40.3741 | 24 |
3.5459 | 3.3485 | 3.2578 | 37.0963 | 25 |
3.4807 | 3.3243 | 2.9581 | 45.1667 | 26 |
3.4246 | 3.3066 | 3.3342 | 40.2370 | 27 |
3.3632 | 3.2872 | 3.4382 | 42.0907 | 28 |
3.3173 | 3.2673 | 3.1628 | 47.9148 | 29 |
3.2487 | 3.2448 | 3.8675 | 41.3833 | 30 |
3.1800 | 3.2346 | 3.5390 | 47.2537 | 31 |
3.1332 | 3.2188 | 4.0260 | 43.5833 | 32 |
3.0703 | 3.2035 | 4.0547 | 40.6056 | 33 |
3.0133 | 3.1855 | 4.4113 | 36.4204 | 34 |
2.9647 | 3.1737 | 4.7556 | 35.7796 | 35 |
2.9132 | 3.1639 | 4.9634 | 36.2907 | 36 |
2.8523 | 3.1525 | 4.7055 | 38.3315 | 37 |
2.8027 | 3.1374 | 5.0006 | 38.4667 | 38 |
2.7477 | 3.1346 | 5.0328 | 41.3852 | 39 |
2.6974 | 3.1275 | 5.2463 | 38.0778 | 40 |
2.6501 | 3.1155 | 5.1526 | 41.2815 | 41 |
2.6007 | 3.1098 | 4.5110 | 46.7796 | 42 |
2.5394 | 3.1112 | 5.1411 | 41.5352 | 43 |
2.4936 | 3.1068 | 5.9863 | 37.5852 | 44 |
2.4426 | 3.1000 | 6.4909 | 35.0426 | 45 |
2.4017 | 3.0977 | 5.7375 | 40.5222 | 46 |
2.3493 | 3.1010 | 7.1521 | 32.8944 | 47 |
2.3012 | 3.1005 | 7.3491 | 32.1815 | 48 |
2.2602 | 3.0898 | 7.1857 | 35.0759 | 49 |
2.2182 | 3.0909 | 6.9445 | 37.1426 | 50 |
2.1691 | 3.0938 | 7.4068 | 37.0333 | 51 |
2.1293 | 3.0935 | 7.1603 | 38.0833 | 52 |
### Framework versions
- Transformers 4.33.3
- TensorFlow 2.14.0
- Datasets 2.14.5
- Tokenizers 0.13.3