SpeechT5-it

This model is a fine-tuned version of microsoft/speecht5_tts on the VOXPOPULI dataset. It achieves the following results on the evaluation set:

Loss: 0.4600 (validation loss after the final epoch, step 28480)
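Since the card does not yet document usage, the following is a minimal inference sketch rather than the author's own recipe. It assumes the fine-tuned checkpoint keeps the standard SpeechT5 layout, so it can be loaded with the usual `transformers` SpeechT5 classes. The `checkpoint` argument (a Hub ID or local path for this model), the choice of `microsoft/speecht5_hifigan` as vocoder, and the `speaker_embeddings` tensor (a 512-dimensional x-vector of shape `(1, 512)`) are all assumptions that are not stated in this card.

```python
def synthesize(text, checkpoint, speaker_embeddings,
               vocoder_id="microsoft/speecht5_hifigan"):
    """Generate a waveform for `text` with a SpeechT5 TTS checkpoint.

    checkpoint:          Hub ID or local path of the fine-tuned model
                         (hypothetical here; the card does not give one).
    speaker_embeddings:  torch tensor of shape (1, 512) holding an x-vector.
    vocoder_id:          assumed vocoder; swap in another if needed.
    """
    # Imported lazily so that defining the function does not require
    # transformers to be installed.
    from transformers import (
        SpeechT5ForTextToSpeech,
        SpeechT5HifiGan,
        SpeechT5Processor,
    )

    # Assumption: the processor files were saved alongside the model weights.
    processor = SpeechT5Processor.from_pretrained(checkpoint)
    model = SpeechT5ForTextToSpeech.from_pretrained(checkpoint)
    vocoder = SpeechT5HifiGan.from_pretrained(vocoder_id)

    # Tokenize the text and run the acoustic model plus vocoder.
    inputs = processor(text=text, return_tensors="pt")
    return model.generate_speech(
        inputs["input_ids"], speaker_embeddings, vocoder=vocoder
    )
```

The returned value is a one-dimensional tensor of audio samples; SpeechT5 with the HiFi-GAN vocoder produces audio at 16 kHz, so that is the sample rate to use when saving the result.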

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 1e-05
train_batch_size: 4
eval_batch_size: 8
seed: 42
gradient_accumulation_steps: 4
total_train_batch_size: 16
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: linear
lr_scheduler_warmup_steps: 100
num_epochs: 40
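To make the interaction between these settings concrete, here is a small, dependency-free sketch of how the effective batch size and the learning-rate schedule follow from the values above. It mirrors the usual shape of a linear schedule with warmup (ramp from 0 to the peak rate, then decay linearly to 0); it is not the training script that was used. The device count of 1 is an assumption inferred from 16 = 4 × 4, and the total step count is read from the last row of the results table.

```python
# Values taken from the hyperparameter list and the results table.
BASE_LR = 1e-05
WARMUP_STEPS = 100
PER_DEVICE_BATCH = 4
GRAD_ACCUM_STEPS = 4
NUM_DEVICES = 1       # assumption: inferred from total_train_batch_size = 16
TOTAL_STEPS = 28480   # final optimizer step reported in the results table


def effective_batch_size(per_device=PER_DEVICE_BATCH,
                         accum=GRAD_ACCUM_STEPS,
                         devices=NUM_DEVICES):
    """Number of examples contributing to each optimizer update."""
    return per_device * accum * devices


def linear_lr(step, base_lr=BASE_LR, warmup=WARMUP_STEPS, total=TOTAL_STEPS):
    """Learning rate at `step`: linear warmup, then linear decay to zero."""
    if step < warmup:
        return base_lr * step / max(1, warmup)
    return base_lr * max(0.0, (total - step) / max(1, total - warmup))


if __name__ == "__main__":
    print("effective batch size:", effective_batch_size())
    for s in (0, 50, 100, 14240, 28480):
        print(f"step {s:>5}: lr = {linear_lr(s):.3e}")
```

With only 100 warmup steps out of 28480, the schedule spends almost the entire run in the decay phase, so the learning rate at the midpoint of training is roughly half the peak value.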

Training results

Training Loss	Epoch	Step	Validation Loss
0.5641	1.0	712	0.5090
0.5394	2.0	1424	0.4915
0.5277	3.0	2136	0.4819
0.5136	4.0	2848	0.4798
0.5109	5.0	3560	0.4733
0.5078	6.0	4272	0.4731
0.5033	7.0	4984	0.4692
0.5021	8.0	5696	0.4691
0.4984	9.0	6408	0.4670
0.4880	10.0	7120	0.4641
0.4910	11.0	7832	0.4641
0.4918	12.0	8544	0.4647
0.4933	13.0	9256	0.4622
0.4990	14.0	9968	0.4619
0.4906	15.0	10680	0.4608
0.4884	16.0	11392	0.4622
0.4847	17.0	12104	0.4616
0.4916	18.0	12816	0.4592
0.4845	19.0	13528	0.4600
0.4788	20.0	14240	0.4594
0.4746	21.0	14952	0.4607
0.4875	22.0	15664	0.4615
0.4831	23.0	16376	0.4597
0.4798	24.0	17088	0.4595
0.4727	25.0	17800	0.4592
0.4736	26.0	18512	0.4598
0.4746	27.0	19224	0.4608
0.4728	28.0	19936	0.4589
0.4771	29.0	20648	0.4593
0.4743	30.0	21360	0.4588
0.4785	31.0	22072	0.4601
0.4757	32.0	22784	0.4597
0.4731	33.0	23496	0.4598
0.4746	34.0	24208	0.4593
0.4715	35.0	24920	0.4599
0.4769	36.0	25632	0.4622
0.4778	37.0	26344	0.4605
0.4798	38.0	27056	0.4594
0.4694	39.0	27768	0.4607
0.4680	40.0	28480	0.4600
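The table can also be read programmatically. The sketch below copies the validation-loss column for epochs 1 to 40 and locates the epoch with the lowest value, which is useful when deciding which checkpoint to keep. The 712 steps per epoch are taken from the Step column; whether intermediate checkpoints were actually saved is not stated in this card.

```python
# Validation loss per epoch, copied from the results table (epochs 1..40).
VAL_LOSS = [
    0.5090, 0.4915, 0.4819, 0.4798, 0.4733, 0.4731, 0.4692, 0.4691,
    0.4670, 0.4641, 0.4641, 0.4647, 0.4622, 0.4619, 0.4608, 0.4622,
    0.4616, 0.4592, 0.4600, 0.4594, 0.4607, 0.4615, 0.4597, 0.4595,
    0.4592, 0.4598, 0.4608, 0.4589, 0.4593, 0.4588, 0.4601, 0.4597,
    0.4598, 0.4593, 0.4599, 0.4622, 0.4605, 0.4594, 0.4607, 0.4600,
]
STEPS_PER_EPOCH = 712  # step count after epoch 1 in the table

# Epoch (1-based) with the lowest validation loss; ties go to the earliest.
best_epoch = min(range(1, len(VAL_LOSS) + 1), key=lambda e: VAL_LOSS[e - 1])
best_loss = VAL_LOSS[best_epoch - 1]
best_step = best_epoch * STEPS_PER_EPOCH
final_loss = VAL_LOSS[-1]

if __name__ == "__main__":
    print(f"best epoch: {best_epoch} (step {best_step}), loss {best_loss:.4f}")
    print(f"final epoch loss: {final_loss:.4f}")
```

On these numbers the validation loss is essentially flat from about epoch 18 onward, and the lowest value occurs at epoch 30 rather than at the final epoch, although the difference from the final loss is small.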

Framework versions

Transformers 4.30.0.dev0
PyTorch 2.0.1+cu117
Datasets 2.13.1
Tokenizers 0.13.3