speller-t5-4
This model is a fine-tuned version of sberbank-ai/ruT5-base. The training dataset was not recorded (the Trainer reported it as None). It achieves the following results on the evaluation set:
- Loss: 0.1871
- Rouge1: 17.2619
- Rouge2: 7.5893
- Rougel: 17.5595
- Rougelsum: 17.5595
- Gen Len: 42.25
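For readers unfamiliar with the ROUGE figures above, here is a minimal sketch of how a ROUGE-1 F-score can be computed as unigram overlap on a 0-100 scale. The card does not state which ROUGE implementation or tokenizer produced its numbers, so the lowercase whitespace tokenization, the function name `rouge1_f`, and the example sentences are all assumptions made for illustration.

```python
from collections import Counter


def rouge1_f(prediction: str, reference: str) -> float:
    """Unigram-overlap F1 between two strings, on a 0-100 scale."""
    # Assumption: lowercase + whitespace tokenization. Real ROUGE tooling
    # applies its own tokenizer and may apply stemming.
    pred_tokens = Counter(prediction.lower().split())
    ref_tokens = Counter(reference.lower().split())
    # Clipped overlap: each token counts at most as often as it appears in both.
    overlap = sum((pred_tokens & ref_tokens).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(pred_tokens.values())
    recall = overlap / sum(ref_tokens.values())
    return 100 * 2 * precision * recall / (precision + recall)


# Hypothetical prediction/reference pair (not taken from the evaluation set).
score = rouge1_f("я иду домой сегодня", "я иду домой")
print(round(score, 2))
```

Scores from a different tokenizer can differ substantially, so this sketch should not be expected to reproduce the values reported in this card.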
Model description
More information needed
Intended uses & limitations
More information needed
Training and evaluation data
More information needed
Training procedure
Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 5e-05
- train_batch_size: 8
- eval_batch_size: 8
- seed: 42
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- num_epochs: 1
- mixed_precision_training: Native AMP
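The combination of `learning_rate: 5e-05` and `lr_scheduler_type: linear` listed above implies a learning rate that decays linearly toward zero over training. The sketch below illustrates that shape in plain Python. The card lists no warmup, so `warmup_steps=0` is an assumption, and `TOTAL_STEPS = 14_000` is only a rough figure inferred from the training log (step 13500 at epoch 0.97), not a stated value. The helper name `linear_lr` is hypothetical.

```python
def linear_lr(step: int, total_steps: int, base_lr: float = 5e-5,
              warmup_steps: int = 0) -> float:
    """Learning rate at `step`: linear warmup, then linear decay to zero."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


# Assumption: about 14,000 optimizer steps in the single epoch.
TOTAL_STEPS = 14_000
for step in (0, 7_000, 13_500):
    print(step, linear_lr(step, TOTAL_STEPS))
```

Under these assumptions the rate starts at 5e-05, is about half that midway through the epoch, and is close to zero by the last logged step.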
Training results
Training Loss | Epoch | Step | Validation Loss | Rouge1 | Rouge2 | Rougel | Rougelsum | Gen Len |
---|---|---|---|---|---|---|---|---|
0.9773 | 0.04 | 500 | 0.5651 | 14.7321 | 5.2264 | 14.7863 | 14.8471 | 47.2321 |
0.8463 | 0.07 | 1000 | 0.4230 | 16.3628 | 5.6052 | 16.3158 | 16.4325 | 47.9018 |
0.6458 | 0.11 | 1500 | 0.3528 | 16.2099 | 5.5195 | 16.2034 | 16.3225 | 47.5179 |
0.6147 | 0.14 | 2000 | 0.3269 | 16.313 | 5.7216 | 16.313 | 16.4242 | 47.2232 |
0.5102 | 0.18 | 2500 | 0.3012 | 16.6071 | 6.0119 | 16.6239 | 16.5792 | 43.1696 |
0.4585 | 0.21 | 3000 | 0.2823 | 16.6295 | 6.0714 | 16.6741 | 16.6071 | 47.25 |
0.4801 | 0.25 | 3500 | 0.2748 | 16.8779 | 6.3885 | 16.8779 | 16.8779 | 44.5268 |
0.4721 | 0.29 | 4000 | 0.2605 | 17.1947 | 7.4353 | 17.3867 | 17.3867 | 42.7054 |
0.4132 | 0.32 | 4500 | 0.2530 | 17.2619 | 7.5605 | 17.5054 | 17.5054 | 42.9286 |
0.4255 | 0.36 | 5000 | 0.2495 | 17.1503 | 7.4107 | 17.3363 | 17.3363 | 42.5625 |
0.3952 | 0.39 | 5500 | 0.2424 | 17.2619 | 7.4702 | 17.4479 | 17.4479 | 42.5089 |
0.3229 | 0.43 | 6000 | 0.2354 | 17.2619 | 7.5605 | 17.5054 | 17.5054 | 44.0268 |
0.4474 | 0.47 | 6500 | 0.2310 | 17.2619 | 7.5335 | 17.4545 | 17.4545 | 42.5625 |
0.3736 | 0.5 | 7000 | 0.2300 | 17.2619 | 7.5335 | 17.4545 | 17.4545 | 42.4286 |
0.332 | 0.54 | 7500 | 0.2133 | 17.2619 | 7.5622 | 17.5085 | 17.5085 | 42.4732 |
0.3347 | 0.57 | 8000 | 0.2148 | 17.2619 | 7.5605 | 17.5054 | 17.5054 | 42.5 |
0.4257 | 0.61 | 8500 | 0.2093 | 17.2619 | 7.5605 | 17.5054 | 17.5054 | 42.3482 |
0.3072 | 0.64 | 9000 | 0.2009 | 17.2619 | 7.5893 | 17.5595 | 17.5595 | 42.3661 |
0.3184 | 0.68 | 9500 | 0.2028 | 17.2619 | 7.5893 | 17.5595 | 17.5595 | 42.4464 |
0.3013 | 0.72 | 10000 | 0.2083 | 17.2619 | 7.5893 | 17.5595 | 17.5595 | 42.2589 |
0.3202 | 0.75 | 10500 | 0.2056 | 17.2619 | 7.5893 | 17.5595 | 17.5595 | 42.4911 |
0.2689 | 0.79 | 11000 | 0.2020 | 17.2619 | 7.5893 | 17.5595 | 17.5595 | 42.8304 |
0.4168 | 0.82 | 11500 | 0.1962 | 17.2619 | 7.5893 | 17.5595 | 17.5595 | 42.2054 |
0.287 | 0.86 | 12000 | 0.1930 | 17.2619 | 7.5893 | 17.5595 | 17.5595 | 42.1875 |
0.3515 | 0.9 | 12500 | 0.1899 | 17.2619 | 7.5893 | 17.5595 | 17.5595 | 42.1875 |
0.2713 | 0.93 | 13000 | 0.1868 | 17.2619 | 7.5893 | 17.5595 | 17.5595 | 42.3304 |
0.2914 | 0.97 | 13500 | 0.1871 | 17.2619 | 7.5893 | 17.5595 | 17.5595 | 42.25 |
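As a quick way to inspect the log above, the following sketch parses rows in the table's pipe-separated layout and picks the row with the lowest validation loss. Only the last three rows are copied here to keep the example short. The column keys in `COLUMNS` and the helper `parse_rows` are names chosen for this illustration, not part of any library.

```python
# The last three rows of the training-results table, in the same column order.
LOG_ROWS = """\
0.3515 | 0.9 | 12500 | 0.1899 | 17.2619 | 7.5893 | 17.5595 | 17.5595 | 42.1875 |
0.2713 | 0.93 | 13000 | 0.1868 | 17.2619 | 7.5893 | 17.5595 | 17.5595 | 42.3304 |
0.2914 | 0.97 | 13500 | 0.1871 | 17.2619 | 7.5893 | 17.5595 | 17.5595 | 42.25 |
"""

# Hypothetical short keys for the nine table columns.
COLUMNS = ["train_loss", "epoch", "step", "val_loss",
           "rouge1", "rouge2", "rougel", "rougelsum", "gen_len"]


def parse_rows(text: str) -> list:
    """Turn pipe-separated log lines into a list of {column: float} dicts."""
    rows = []
    for line in text.strip().splitlines():
        cells = [cell.strip() for cell in line.split("|") if cell.strip()]
        rows.append(dict(zip(COLUMNS, map(float, cells))))
    return rows


rows = parse_rows(LOG_ROWS)
best = min(rows, key=lambda row: row["val_loss"])
print(int(best["step"]), best["val_loss"])  # prints: 13000 0.1868
```

The lowest validation loss in the table (0.1868) occurs at step 13000, while the headline evaluation loss of 0.1871 corresponds to the final logged step, 13500.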
Framework versions
- Transformers 4.26.0
- PyTorch 1.13.1+cu116
- Datasets 2.9.0
- Tokenizers 0.13.2