mt5-large-gramatika161k-b16-e10-lr0.001

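This model is a fine-tuned version of google/mt5-large on a dataset that this card does not name. The evaluation-set results below match the step-15000 checkpoint (epoch 1.9), which has the lowest validation loss in the training log. ROUGE scores are on a 0 to 100 scale: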

Loss: 0.1537
Rouge1: 70.8264
Rouge2: 64.518
RougeL: 70.6934
RougeLsum: 70.6881
Gen Len (average generated sequence length): 18.3298
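The metric names can be made concrete with a small example. ROUGE-1 is an F-measure over overlapping unigrams, and ROUGE-L is an F-measure over the longest common subsequence (LCS) of the prediction and the reference. The sketch below is a simplified, whitespace-tokenized version for illustration only. It is not the `rouge_score` implementation that most likely produced the numbers above: that package uses its own tokenizer, and RougeLsum additionally splits text into sentences. Its scores will therefore not match this card exactly. The helper names `rouge1` and `rouge_l` are hypothetical.

```python
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Simplified tokenizer: lowercase and split on whitespace
    # (an assumption; rouge_score uses its own tokenization rules)
    return text.lower().split()


def f1(overlap: int, pred_len: int, ref_len: int) -> float:
    # Harmonic mean of precision (overlap / pred_len) and recall (overlap / ref_len)
    if overlap == 0:
        return 0.0
    precision = overlap / pred_len
    recall = overlap / ref_len
    return 2 * precision * recall / (precision + recall)


def rouge1(prediction: str, reference: str) -> float:
    pred, ref = tokenize(prediction), tokenize(reference)
    # Clipped unigram overlap: each token counts at most min(count in pred, count in ref) times
    overlap = sum((Counter(pred) & Counter(ref)).values())
    return 100 * f1(overlap, len(pred), len(ref))


def lcs_length(a: list[str], b: list[str]) -> int:
    # Standard O(len(a) * len(b)) dynamic program, keeping only the previous row
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l(prediction: str, reference: str) -> float:
    pred, ref = tokenize(prediction), tokenize(reference)
    return 100 * f1(lcs_length(pred, ref), len(pred), len(ref))


# Same words in a different order: full unigram overlap, but an LCS of only 2 tokens
print(rouge1("mat on the cat", "the cat on mat"))   # 100.0
print(rouge_l("mat on the cat", "the cat on mat"))  # 50.0
```

The example shows why RougeL can sit below Rouge1. ROUGE-L rewards tokens that appear in the same order, whereas ROUGE-1 ignores word order entirely.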

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 0.001
train_batch_size: 16
eval_batch_size: 16
seed: 42
optimizer: Adafactor
lr_scheduler_type: linear
num_epochs: 10
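As a rough illustration of `lr_scheduler_type: linear` with `learning_rate: 0.001`, the sketch below computes the learning rate after a given number of optimizer steps. Two of its inputs are assumptions. The card lists no warmup, so `warmup_steps` defaults to 0. `total_steps = 78_800` is an estimate: roughly 7,880 steps per epoch, inferred from the Step and Epoch columns of the training results, multiplied by 10 epochs. The function follows the usual linear-warmup-then-linear-decay shape, and the name `linear_lr` is hypothetical.

```python
def linear_lr(
    step: int,
    base_lr: float = 1e-3,      # learning_rate from the card
    total_steps: int = 78_800,  # ASSUMPTION: ~7,880 steps/epoch x 10 epochs
    warmup_steps: int = 0,      # ASSUMPTION: the card lists no warmup
) -> float:
    """Learning rate after `step` optimizer updates under a linear schedule."""
    if step < warmup_steps:
        # Linear ramp from 0 up to base_lr during warmup (skipped when warmup_steps == 0)
        return base_lr * step / max(1, warmup_steps)
    # Linear decay from base_lr at the end of warmup to 0 at total_steps, clamped at 0
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


for step in (0, 15_000, 40_000, 75_000):
    print(step, f"{linear_lr(step):.6f}")
```

Under these assumptions, the rate at the step-15000 checkpoint would be about 0.001 × (78,800 − 15,000) / 78,800 ≈ 0.00081, and it approaches zero near the final logged step.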

Training results

Training Loss  Epoch  Step   Validation Loss  Rouge1   Rouge2   RougeL   RougeLsum  Gen Len
0.3641         0.63   5000   0.1944           69.4204  61.9635  69.2556  69.2477    18.3389
0.1843         1.27   10000  0.1655           70.3343  63.6924  70.1851  70.1815    18.3377
0.1359         1.9    15000  0.1537           70.8264  64.518   70.6934  70.6881    18.3298
0.0912         2.54   20000  0.1643           71.037   64.8861  70.9075  70.9027    18.3295
0.0759         3.17   25000  0.1694           71.288   65.3505  71.1746  71.1675    18.3314
0.054          3.81   30000  0.1672           71.4356  65.5825  71.3263  71.3199    18.3294
0.0398         4.44   35000  0.1779           71.4473  65.6798  71.343   71.3354    18.3341
0.0331         5.08   40000  0.1908           71.615   65.9285  71.5126  71.4982    18.3344
0.021          5.71   45000  0.2025           71.6252  65.9628  71.5172  71.513     18.3317
0.0167         6.35   50000  0.2107           71.6508  66.0666  71.5547  71.542     18.3366
0.0126         6.98   55000  0.2084           71.8403  66.3396  71.7392  71.735     18.3337
0.0072         7.62   60000  0.2256           71.8659  66.388   71.7699  71.7644    18.3330
0.0057         8.25   65000  0.2578           71.9226  66.4948  71.8279  71.8162    18.3313
0.0036         8.88   70000  0.2784           71.9279  66.5248  71.8258  71.8149    18.3324
0.0021         9.52   75000  0.3040           71.9913  66.634   71.893   71.8844    18.3317
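The evaluation results at the top of the card can be cross-checked against this log. The row with the lowest validation loss is step 15000, and its metrics match the reported evaluation results. ROUGE-1, by contrast, keeps rising until step 75000, even as validation loss climbs from 0.1537 to 0.3040 and training loss falls toward zero. The sketch below encodes a subset of the table's columns, with values copied from the rows above, and picks the best checkpoint under each criterion. The card does not state whether training actually used loss-based checkpoint selection. `LogRow` and `LOG` are hypothetical names.

```python
from typing import NamedTuple


class LogRow(NamedTuple):
    step: int
    epoch: float
    train_loss: float
    val_loss: float
    rouge1: float


# Values copied from the training results table (Rouge2, RougeL, RougeLsum, Gen Len omitted)
LOG = [
    LogRow(5000, 0.63, 0.3641, 0.1944, 69.4204),
    LogRow(10000, 1.27, 0.1843, 0.1655, 70.3343),
    LogRow(15000, 1.90, 0.1359, 0.1537, 70.8264),
    LogRow(20000, 2.54, 0.0912, 0.1643, 71.037),
    LogRow(25000, 3.17, 0.0759, 0.1694, 71.288),
    LogRow(30000, 3.81, 0.054, 0.1672, 71.4356),
    LogRow(35000, 4.44, 0.0398, 0.1779, 71.4473),
    LogRow(40000, 5.08, 0.0331, 0.1908, 71.615),
    LogRow(45000, 5.71, 0.021, 0.2025, 71.6252),
    LogRow(50000, 6.35, 0.0167, 0.2107, 71.6508),
    LogRow(55000, 6.98, 0.0126, 0.2084, 71.8403),
    LogRow(60000, 7.62, 0.0072, 0.2256, 71.8659),
    LogRow(65000, 8.25, 0.0057, 0.2578, 71.9226),
    LogRow(70000, 8.88, 0.0036, 0.2784, 71.9279),
    LogRow(75000, 9.52, 0.0021, 0.3040, 71.9913),
]

# Checkpoint with the lowest validation loss
best_by_loss = min(LOG, key=lambda row: row.val_loss)
# Checkpoint with the highest ROUGE-1
best_by_rouge1 = max(LOG, key=lambda row: row.rouge1)

print(f"lowest validation loss: step {best_by_loss.step} ({best_by_loss.val_loss})")
# lowest validation loss: step 15000 (0.1537)
print(f"highest ROUGE-1: step {best_by_rouge1.step} ({best_by_rouge1.rouge1})")
# highest ROUGE-1: step 75000 (71.9913)
```

Which checkpoint is "best" depends on the criterion. Validation loss favors the early step-15000 checkpoint, while ROUGE favors the last one by about 1.2 ROUGE-1 points.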

Framework versions

Transformers 4.30.1
PyTorch 1.11.0a0+b6df043
Datasets 2.12.0
Tokenizers 0.13.3