# mt5-large-gramatika161k-b16-5000
This model is a fine-tuned version of [google/mt5-large](https://huggingface.co/google/mt5-large). It achieves the following results on the evaluation set:
- Loss: 0.0949
- ROUGE-1: 72.227
- ROUGE-2: 67.1468
- ROUGE-L: 72.1408
- ROUGE-Lsum: 72.1494
- Gen Len: 18.3283
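
Since this is a standard mT5 sequence-to-sequence checkpoint, it can be loaded with the usual `transformers` API. The snippet below is a minimal inference sketch, not the author's evaluation script; the repository ID, the example sentence, and the generation settings are assumptions.

```python
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

# Assumption: replace with the actual Hub repository ID of this checkpoint.
model_id = "mt5-large-gramatika161k-b16-5000"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

# Hypothetical input sentence; the prompt format used during fine-tuning is not documented here.
text = "dia pergi ke pasar kemarin sore"
inputs = tokenizer(text, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```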
## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 5e-05
- train_batch_size: 16
- eval_batch_size: 16
- seed: 42
- optimizer: Adafactor
- lr_scheduler_type: linear
- num_epochs: 5
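
For reference, the list above maps roughly onto a `Seq2SeqTrainingArguments` configuration such as the sketch below. This is a reconstruction under the `transformers` Trainer API, not the original training script, and the output directory name is an assumption.

```python
from transformers import Seq2SeqTrainingArguments

# Reconstruction of the reported hyperparameters as Trainer arguments;
# the original training script is not part of this card.
training_args = Seq2SeqTrainingArguments(
    output_dir="mt5-large-gramatika161k-b16-5000",  # assumed output directory
    learning_rate=5e-5,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    seed=42,
    optim="adafactor",            # Adafactor optimizer
    lr_scheduler_type="linear",
    num_train_epochs=5,
    predict_with_generate=True,   # needed to compute ROUGE at evaluation time
)
```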
### Training results
| Training Loss | Epoch | Step  | Validation Loss | ROUGE-1 | ROUGE-2 | ROUGE-L | ROUGE-Lsum | Gen Len |
|:-------------:|:-----:|:-----:|:---------------:|:-------:|:-------:|:-------:|:----------:|:-------:|
| 0.684         | 0.63  | 5000  | 0.1422          | 70.2446 | 63.7161 | 70.115  | 70.1185    | 18.3370 |
| 0.1704        | 1.27  | 10000 | 0.1185          | 71.1601 | 65.3066 | 71.0354 | 71.041     | 18.3348 |
| 0.1383        | 1.9   | 15000 | 0.1079          | 71.5399 | 65.9422 | 71.4296 | 71.4371    | 18.3289 |
| 0.1166        | 2.54  | 20000 | 0.1032          | 71.8281 | 66.4753 | 71.7248 | 71.7321    | 18.3303 |
| 0.106         | 3.17  | 25000 | 0.0983          | 72.0264 | 66.8201 | 71.9367 | 71.9427    | 18.3291 |
| 0.0952        | 3.81  | 30000 | 0.0962          | 72.1134 | 66.9793 | 72.0288 | 72.0362    | 18.3297 |
| 0.0891        | 4.44  | 35000 | 0.0949          | 72.227  | 67.1468 | 72.1408 | 72.1494    | 18.3283 |
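
The ROUGE values above appear to be on a 0–100 scale (the `evaluate` library's `rouge` metric returns fractions, which are conventionally multiplied by 100). A minimal sketch of that kind of computation is shown below; the predictions and references are placeholders, and the exact decoding and post-processing used for this model are not documented here.

```python
import evaluate

rouge = evaluate.load("rouge")

# Placeholder predictions/references; the actual evaluation set is not documented in this card.
predictions = ["Dia pergi ke pasar kemarin sore."]
references = ["Dia pergi ke pasar kemarin sore."]

scores = rouge.compute(predictions=predictions, references=references, use_stemmer=True)
print({k: round(v * 100, 4) for k, v in scores.items()})  # rouge1, rouge2, rougeL, rougeLsum
```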
### Framework versions
- Transformers 4.30.1
- Pytorch 1.11.0a0+b6df043
- Datasets 2.12.0
- Tokenizers 0.13.3