# speller-t5-big-2
This model is a fine-tuned version of [sberbank-ai/ruT5-base](https://huggingface.co/sberbank-ai/ruT5-base) on an unspecified dataset.
It achieves the following results on the evaluation set:
- Loss: 0.1711
- Rouge1: 22.619
- Rouge2: 10.523
- RougeL: 22.619
- RougeLsum: 22.619
- Gen Len: 42.9107
## Model description
More information needed
## Intended uses & limitations
More information needed
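No usage example is provided in the card. A minimal inference sketch with the `transformers` library is shown below; the Hub id `your-username/speller-t5-big-2` is a hypothetical placeholder, and the input sentence is only an illustration of a misspelled Russian phrase:

```python
from transformers import AutoTokenizer, T5ForConditionalGeneration

# Hypothetical Hub id -- replace with the actual repository path.
model_name = "your-username/speller-t5-big-2"

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = T5ForConditionalGeneration.from_pretrained(model_name)

text = "Я пшел домой"  # misspelled Russian input
inputs = tokenizer(text, return_tensors="pt")
outputs = model.generate(**inputs, max_length=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Generation length should be chosen with the evaluation `Gen Len` (~43 tokens) in mind.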
## Training and evaluation data
More information needed
## Training procedure

### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 5e-05
- train_batch_size: 8
- eval_batch_size: 8
- seed: 42
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- num_epochs: 1
- mixed_precision_training: Native AMP
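The `linear` scheduler decays the learning rate from its initial value to zero over the total number of training steps, after an optional warmup ramp. A minimal sketch of that schedule (the step counts below are illustrative, not the actual training length):

```python
def linear_lr(step: int, total_steps: int,
              base_lr: float = 5e-5, warmup_steps: int = 0) -> float:
    """Linear LR schedule: ramp up during warmup, then decay linearly to 0."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = max(0.0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)

# At step 0 the rate equals learning_rate; halfway through it is half that;
# at the final step it reaches 0.
```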
### Training results
| Training Loss | Epoch | Step  | Validation Loss | Rouge1  | Rouge2  | RougeL  | RougeLsum | Gen Len |
|:-------------:|:-----:|:-----:|:---------------:|:-------:|:-------:|:-------:|:---------:|:-------:|
| 1.244         | 0.04  | 500   | 0.5814          | 18.4902 | 6.4123  | 18.3883 | 18.5119   | 48.8214 |
| 0.6967        | 0.07  | 1000  | 0.4315          | 20.0    | 7.2173  | 20.0744 | 19.9702   | 47.0357 |
| 0.6362        | 0.11  | 1500  | 0.3721          | 21.1905 | 8.514   | 21.131  | 21.1607   | 47.3929 |
| 0.5561        | 0.14  | 2000  | 0.3265          | 22.0238 | 9.29    | 21.9643 | 21.994    | 45.6696 |
| 0.5094        | 0.18  | 2500  | 0.3049          | 22.0238 | 9.29    | 21.9643 | 21.994    | 46.0    |
| 0.429         | 0.21  | 3000  | 0.2858          | 22.0238 | 9.29    | 21.9643 | 21.994    | 44.9464 |
| 0.4557        | 0.25  | 3500  | 0.2696          | 22.1726 | 9.4388  | 22.0238 | 22.0982   | 45.2054 |
| 0.4268        | 0.29  | 4000  | 0.2565          | 22.1726 | 9.4388  | 22.0238 | 22.0982   | 44.5268 |
| 0.3955        | 0.32  | 4500  | 0.2480          | 22.1726 | 9.4388  | 22.0238 | 22.0982   | 44.2589 |
| 0.3672        | 0.36  | 5000  | 0.2387          | 22.619  | 10.523  | 22.619  | 22.619    | 44.2946 |
| 0.4059        | 0.39  | 5500  | 0.2268          | 22.619  | 10.523  | 22.619  | 22.619    | 44.1429 |
| 0.4005        | 0.43  | 6000  | 0.2216          | 22.619  | 10.523  | 22.619  | 22.619    | 44.4911 |
| 0.4176        | 0.47  | 6500  | 0.2187          | 22.619  | 10.523  | 22.619  | 22.619    | 44.1339 |
| 0.3413        | 0.5   | 7000  | 0.2115          | 22.619  | 10.523  | 22.619  | 22.619    | 43.9732 |
| 0.3618        | 0.54  | 7500  | 0.2068          | 22.619  | 10.523  | 22.619  | 22.619    | 43.9821 |
| 0.3157        | 0.57  | 8000  | 0.2037          | 22.619  | 10.523  | 22.619  | 22.619    | 43.0714 |
| 0.3502        | 0.61  | 8500  | 0.1956          | 22.619  | 10.523  | 22.619  | 22.619    | 42.8214 |
| 0.353         | 0.64  | 9000  | 0.1932          | 22.619  | 10.523  | 22.619  | 22.619    | 42.8393 |
| 0.3516        | 0.68  | 9500  | 0.1891          | 22.619  | 10.523  | 22.619  | 22.619    | 42.2589 |
| 0.3225        | 0.72  | 10000 | 0.1836          | 22.619  | 10.523  | 22.619  | 22.619    | 42.1964 |
| 0.2993        | 0.75  | 10500 | 0.1818          | 22.619  | 10.523  | 22.619  | 22.619    | 43.6607 |
| 0.3353        | 0.79  | 11000 | 0.1814          | 22.619  | 10.523  | 22.619  | 22.619    | 42.4018 |
| 0.3325        | 0.82  | 11500 | 0.1807          | 22.619  | 10.523  | 22.619  | 22.619    | 43.1786 |
| 0.3181        | 0.86  | 12000 | 0.1752          | 22.619  | 10.523  | 22.619  | 22.619    | 43.25   |
| 0.3337        | 0.9   | 12500 | 0.1729          | 22.619  | 10.523  | 22.619  | 22.619    | 42.3929 |
| 0.281         | 0.93  | 13000 | 0.1737          | 22.619  | 10.523  | 22.619  | 22.619    | 43.8214 |
| 0.45          | 0.97  | 13500 | 0.1711          | 22.619  | 10.523  | 22.619  | 22.619    | 42.9107 |
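The Rouge1 and Rouge2 columns report n-gram overlap F1 between generated and reference text (the actual evaluation most likely used the `rouge_score` or `evaluate` packages). For intuition, a self-contained sketch of ROUGE-N F1 on whitespace-tokenized strings:

```python
from collections import Counter

def ngrams(tokens: list[str], n: int) -> Counter:
    """Multiset of n-grams in a token sequence."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def rouge_n(candidate: str, reference: str, n: int) -> float:
    """ROUGE-N F1: harmonic mean of n-gram precision and recall."""
    cand = ngrams(candidate.split(), n)
    ref = ngrams(reference.split(), n)
    if not cand or not ref:
        return 0.0
    overlap = sum((cand & ref).values())  # clipped n-gram matches
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```

Note this toy version omits the stemming and tokenization details of the reference implementation, so its scores will not exactly match the table above.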
### Framework versions
- Transformers 4.26.0
- Pytorch 1.13.1+cu116
- Datasets 2.9.0
- Tokenizers 0.13.2