# mt5-small-sum-fine-tuned

This model is a fine-tuned version of [google/mt5-small](https://huggingface.co/google/mt5-small) on an unknown dataset. It achieves the following results on the evaluation set:
- Train Loss: 2.4015
- Validation Loss: 1.8725
- Epoch: 74
## Model description

More information needed
## Intended uses & limitations

More information needed
## Training and evaluation data

More information needed
## Training procedure

### Training hyperparameters
The following hyperparameters were used during training:
- optimizer: {'name': 'AdamWeightDecay', 'learning_rate': 2e-05, 'decay': 0.0, 'beta_1': 0.9, 'beta_2': 0.999, 'epsilon': 1e-07, 'amsgrad': False, 'weight_decay_rate': 0.01}
- training_precision: float32
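The optimizer dictionary above maps onto the `AdamWeightDecay` class shipped with the `transformers` TensorFlow utilities. A minimal sketch of reconstructing it (assuming TensorFlow and `transformers` are installed; this is not taken from the original training script):

```python
from transformers import AdamWeightDecay

# Rebuild the optimizer from the hyperparameters reported in this card.
# Values mirror the config dict above; defaults cover `decay` and `amsgrad`.
optimizer = AdamWeightDecay(
    learning_rate=2e-05,
    beta_1=0.9,
    beta_2=0.999,
    epsilon=1e-07,
    weight_decay_rate=0.01,
)
```

This optimizer instance can then be passed to `model.compile(optimizer=optimizer)` when fine-tuning the TF mT5 model with Keras.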
### Training results

| Train Loss | Validation Loss | Epoch |
|:----------:|:---------------:|:-----:|
| 52.1786 | 49.3355 | 0 |
| 47.3638 | 45.1305 | 1 |
| 43.6563 | 42.4522 | 2 |
| 41.1214 | 39.5774 | 3 |
| 38.3601 | 37.3437 | 4 |
| 35.8017 | 34.8478 | 5 |
| 32.6174 | 32.5370 | 6 |
| 30.4399 | 30.7220 | 7 |
| 28.8299 | 29.1744 | 8 |
| 27.1342 | 26.7656 | 9 |
| 25.2765 | 24.9835 | 10 |
| 23.8467 | 23.1296 | 11 |
| 22.4239 | 21.5926 | 12 |
| 21.1438 | 20.8646 | 13 |
| 20.5646 | 21.1405 | 14 |
| 18.9753 | 20.3101 | 15 |
| 18.8306 | 19.6189 | 16 |
| 17.6935 | 18.5195 | 17 |
| 17.0993 | 17.4238 | 18 |
| 16.1595 | 16.1143 | 19 |
| 15.4946 | 15.2814 | 20 |
| 15.0521 | 14.1193 | 21 |
| 14.1677 | 13.0559 | 22 |
| 13.7239 | 12.5135 | 23 |
| 12.8212 | 11.2606 | 24 |
| 12.3333 | 10.5911 | 25 |
| 11.5663 | 9.7681 | 26 |
| 11.2357 | 9.7545 | 27 |
| 10.3757 | 8.6039 | 28 |
| 10.2910 | 8.3155 | 29 |
| 9.5480 | 7.9911 | 30 |
| 9.1881 | 7.5866 | 31 |
| 8.7798 | 7.2611 | 32 |
| 8.1529 | 6.9730 | 33 |
| 7.7057 | 6.6302 | 34 |
| 7.6724 | 6.2149 | 35 |
| 7.1820 | 5.9264 | 36 |
| 6.8348 | 5.9113 | 37 |
| 6.6185 | 5.7169 | 38 |
| 6.3897 | 5.2028 | 39 |
| 6.0808 | 4.8902 | 40 |
| 6.0517 | 4.5248 | 41 |
| 5.4217 | 4.1892 | 42 |
| 5.2464 | 4.1719 | 43 |
| 5.0986 | 4.1922 | 44 |
| 4.6939 | 3.9863 | 45 |
| 4.7763 | 3.7674 | 46 |
| 4.5684 | 3.4746 | 47 |
| 4.2996 | 3.1692 | 48 |
| 4.3434 | 3.0116 | 49 |
| 4.1290 | 2.9261 | 50 |
| 3.8491 | 2.8621 | 51 |
| 4.0837 | 2.7301 | 52 |
| 3.7118 | 2.6694 | 53 |
| 3.6294 | 2.6649 | 54 |
| 3.5421 | 2.6036 | 55 |
| 3.3884 | 2.8563 | 56 |
| 3.3752 | 2.4984 | 57 |
| 3.4596 | 2.4091 | 58 |
| 3.2075 | 2.4850 | 59 |
| 3.2646 | 2.3415 | 60 |
| 2.9473 | 2.3363 | 61 |
| 2.9364 | 2.2778 | 62 |
| 2.9130 | 2.2466 | 63 |
| 2.8123 | 2.1061 | 64 |
| 2.9697 | 2.1859 | 65 |
| 2.9565 | 2.0596 | 66 |
| 2.7610 | 2.2746 | 67 |
| 2.7636 | 2.2090 | 68 |
| 2.5776 | 2.0910 | 69 |
| 2.5245 | 1.9330 | 70 |
| 2.5848 | 1.9169 | 71 |
| 2.4724 | 1.8993 | 72 |
| 2.6195 | 1.8815 | 73 |
| 2.4015 | 1.8725 | 74 |
### Framework versions

- Transformers 4.30.2
- TensorFlow 2.11.0
- Datasets 2.13.1
- Tokenizers 0.12.1
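To reproduce this environment, the versions listed above can be pinned with pip (a sketch; assumes a standard PyPI install, and the exact CUDA/TensorFlow pairing may need adjustment for your hardware):

```shell
pip install "transformers==4.30.2" "tensorflow==2.11.0" "datasets==2.13.1" "tokenizers==0.12.1"
```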