# mt5-large-fce-e8-b16
This model is a fine-tuned version of [google/mt5-large](https://huggingface.co/google/mt5-large) on an unspecified dataset. It achieves the following results on the evaluation set:
- Loss: 0.3526
- Rouge1: 84.5329
- Rouge2: 76.3656
- Rougel: 83.9027
- Rougelsum: 83.9238
- Gen Len: 15.4614
## Model description
More information needed
## Intended uses & limitations
More information needed
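No usage example was provided with this card. As a placeholder, here is a minimal inference sketch, assuming the checkpoint is used for sentence-level sequence-to-sequence generation (for example error correction, which the "fce" in the name suggests) and is available under a local path or Hub id such as the hypothetical `mt5-large-fce-e8-b16`:

```python
# Minimal inference sketch (hypothetical model id; adjust to wherever the weights actually live).
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

model_id = "mt5-large-fce-e8-b16"  # placeholder, not a confirmed Hub repository
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

inputs = tokenizer("She go to school yesterday .", return_tensors="pt")
# The evaluation "Gen Len" averages ~15 tokens, so a short generation budget is sufficient here.
outputs = model.generate(**inputs, max_new_tokens=32, num_beams=4)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```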
## Training and evaluation data
More information needed
## Training procedure

### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 0.001
- train_batch_size: 16
- eval_batch_size: 16
- seed: 42
- optimizer: Adafactor
- lr_scheduler_type: linear
- num_epochs: 8
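The original training script is not part of this card. The sketch below only illustrates how the hyperparameters listed above would typically be expressed as `Seq2SeqTrainingArguments`; the output directory and evaluation cadence are assumptions inferred from the results table, not documented settings.

```python
# Sketch only: maps the hyperparameters above onto the Trainer API. The original script,
# dataset, and preprocessing are not documented in this card.
from transformers import Seq2SeqTrainingArguments

args = Seq2SeqTrainingArguments(
    output_dir="mt5-large-fce-e8-b16",  # assumed output directory
    learning_rate=1e-3,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    seed=42,
    optim="adafactor",                  # Adafactor optimizer, as listed above
    lr_scheduler_type="linear",
    num_train_epochs=8,
    evaluation_strategy="steps",
    eval_steps=400,                     # assumed from the 400-step intervals in the table below
    predict_with_generate=True,         # generate text at eval time so ROUGE can be computed
)
# These arguments would then be passed to a Seq2SeqTrainer together with the model, tokenizer,
# data collator, and the (undocumented) train/eval datasets.
```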
### Training results
Training Loss | Epoch | Step | Validation Loss | Rouge1 | Rouge2 | Rougel | Rougelsum | Gen Len |
---|---|---|---|---|---|---|---|---|
1.2105 | 0.23 | 400 | 0.4344 | 84.6268 | 76.3447 | 84.0402 | 84.0182 | 15.4564 |
0.4664 | 0.45 | 800 | 0.4256 | 84.3821 | 75.6104 | 83.8113 | 83.8303 | 15.4404 |
0.434 | 0.68 | 1200 | 0.3839 | 84.0212 | 75.7319 | 83.4232 | 83.431 | 15.4952 |
0.406 | 0.9 | 1600 | 0.3713 | 84.7743 | 76.7805 | 84.2379 | 84.2352 | 15.4514 |
0.3193 | 1.13 | 2000 | 0.3665 | 84.634 | 76.5132 | 84.0604 | 84.0755 | 15.4774 |
0.2693 | 1.35 | 2400 | 0.3718 | 84.6587 | 76.7057 | 84.099 | 84.1045 | 15.4619 |
0.2815 | 1.58 | 2800 | 0.3617 | 84.5181 | 76.6792 | 83.9922 | 83.9976 | 15.4820 |
0.2776 | 1.81 | 3200 | 0.3526 | 84.5329 | 76.3656 | 83.9027 | 83.9238 | 15.4614 |
0.2551 | 2.03 | 3600 | 0.3720 | 84.504 | 76.6676 | 83.9957 | 84.0108 | 15.4801 |
0.1617 | 2.26 | 4000 | 0.3648 | 84.4385 | 76.3684 | 83.8585 | 83.8657 | 15.4897 |
0.1711 | 2.48 | 4400 | 0.3671 | 84.5241 | 76.6518 | 83.9862 | 83.9987 | 15.4902 |
0.1771 | 2.71 | 4800 | 0.3607 | 84.6437 | 76.6682 | 84.103 | 84.1174 | 15.4683 |
0.1803 | 2.93 | 5200 | 0.3582 | 84.479 | 76.6205 | 83.9509 | 83.9504 | 15.4715 |
0.1199 | 3.16 | 5600 | 0.3971 | 84.6367 | 76.7872 | 84.0191 | 84.0534 | 15.4715 |
0.1005 | 3.39 | 6000 | 0.4085 | 84.5153 | 76.6564 | 83.9365 | 83.9506 | 15.4820 |
0.1033 | 3.61 | 6400 | 0.4007 | 84.3191 | 76.399 | 83.8183 | 83.8142 | 15.4728 |
0.1067 | 3.84 | 6800 | 0.4014 | 84.5289 | 76.5335 | 83.9706 | 83.9967 | 15.4674 |
0.09 | 4.06 | 7200 | 0.4328 | 84.3978 | 76.6231 | 83.8654 | 83.8728 | 15.4783 |
0.0574 | 4.29 | 7600 | 0.4305 | 84.4476 | 76.7198 | 83.8943 | 83.9 | 15.4820 |
0.0579 | 4.51 | 8000 | 0.4510 | 84.5536 | 76.7635 | 83.977 | 83.9745 | 15.4719 |
0.061 | 4.74 | 8400 | 0.4447 | 84.5632 | 76.9892 | 84.0419 | 84.0501 | 15.4815 |
0.0608 | 4.97 | 8800 | 0.4353 | 84.6004 | 76.8883 | 84.0518 | 84.0596 | 15.4788 |
0.0362 | 5.19 | 9200 | 0.4853 | 84.7169 | 77.1321 | 84.1485 | 84.1486 | 15.4760 |
0.0333 | 5.42 | 9600 | 0.5053 | 84.851 | 77.4661 | 84.307 | 84.3106 | 15.4829 |
0.0325 | 5.64 | 10000 | 0.5066 | 84.7412 | 77.3031 | 84.2107 | 84.2006 | 15.4948 |
0.0335 | 5.87 | 10400 | 0.4947 | 84.7596 | 77.2636 | 84.2156 | 84.224 | 15.4906 |
0.0269 | 6.09 | 10800 | 0.5306 | 84.7484 | 77.2693 | 84.1824 | 84.1962 | 15.4811 |
0.0184 | 6.32 | 11200 | 0.5535 | 84.8066 | 77.3749 | 84.2765 | 84.2989 | 15.4756 |
0.0177 | 6.55 | 11600 | 0.5555 | 84.7335 | 77.2108 | 84.1917 | 84.2084 | 15.4865 |
0.0168 | 6.77 | 12000 | 0.5538 | 84.7053 | 77.2902 | 84.184 | 84.1929 | 15.4792 |
0.0165 | 7.0 | 12400 | 0.5614 | 84.7332 | 77.3098 | 84.2055 | 84.2055 | 15.4879 |
0.0092 | 7.22 | 12800 | 0.6222 | 84.7668 | 77.3059 | 84.2235 | 84.2397 | 15.4724 |
0.0086 | 7.45 | 13200 | 0.6485 | 84.8211 | 77.4247 | 84.2857 | 84.2996 | 15.4751 |
0.0098 | 7.67 | 13600 | 0.6417 | 84.7854 | 77.4226 | 84.2457 | 84.2652 | 15.4865 |
0.0088 | 7.9 | 14000 | 0.6445 | 84.7809 | 77.4171 | 84.2396 | 84.2591 | 15.4852 |
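The ROUGE columns above are on a 0-100 scale, and "Gen Len" is the average length (in tokens) of the generated predictions. The metric code used for this run is not included in the card; a typical `compute_metrics` hook that produces these columns looks roughly like the sketch below. The `evaluate` package and the `tokenizer` argument are assumptions, not confirmed parts of the original setup; in practice the tokenizer would be bound with `functools.partial` or a closure before handing the function to the Trainer.

```python
# Illustrative only: a typical compute_metrics hook for a Seq2SeqTrainer run like this one.
import numpy as np
import evaluate  # assumed; older scripts used datasets.load_metric("rouge") instead

rouge = evaluate.load("rouge")

def compute_metrics(eval_preds, tokenizer):
    preds, labels = eval_preds
    # Label padding is stored as -100; swap it back so the labels can be decoded.
    labels = np.where(labels != -100, labels, tokenizer.pad_token_id)
    decoded_preds = tokenizer.batch_decode(preds, skip_special_tokens=True)
    decoded_labels = tokenizer.batch_decode(labels, skip_special_tokens=True)
    result = rouge.compute(predictions=decoded_preds, references=decoded_labels, use_stemmer=True)
    result = {key: round(value * 100, 4) for key, value in result.items()}  # 0-100 scale
    # "Gen Len": mean number of non-padding tokens in the generated sequences.
    result["gen_len"] = float(np.mean([np.count_nonzero(p != tokenizer.pad_token_id) for p in preds]))
    return result
```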
### Framework versions
- Transformers 4.28.1
- Pytorch 1.11.0a0+b6df043
- Datasets 2.12.0
- Tokenizers 0.13.3