# wav2vec2-large-xls-r-300m-j-phoneme-colab-3
This model is a fine-tuned version of [facebook/wav2vec2-xls-r-300m](https://huggingface.co/facebook/wav2vec2-xls-r-300m) on the common_voice_10_0 dataset.
It achieves the following results on the evaluation set:
- Loss: 0.6478
- Wer: 0.3336
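
For a quick check of the model, a minimal inference sketch with the 🤗 Transformers `pipeline` is shown below. The repository id is a hypothetical placeholder, since this card does not state where the checkpoint is hosted:

```python
# Minimal inference sketch; "your-username/..." is a hypothetical placeholder
# for wherever this checkpoint is actually hosted.
from transformers import pipeline

asr = pipeline(
    "automatic-speech-recognition",
    model="your-username/wav2vec2-large-xls-r-300m-j-phoneme-colab-3",
)

# The pipeline decodes and resamples the audio file via ffmpeg, then returns
# the CTC-decoded phoneme transcription.
print(asr("sample.wav")["text"])
```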
## Model description

More information needed
## Intended uses & limitations

More information needed
## Training and evaluation data

More information needed
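
The card does not document the exact training and evaluation splits. Assuming the Japanese (`"ja"`) configuration of Common Voice 10.0, inferred from the `-j-` in the model name, loading the data might look like this sketch:

```python
# Sketch of loading the data; the "ja" config is an assumption inferred from
# the model name. The dataset is gated on the Hub, so an authenticated login
# (`huggingface-cli login`) is required.
from datasets import load_dataset, Audio

train = load_dataset(
    "mozilla-foundation/common_voice_10_0", "ja", split="train", use_auth_token=True
)
test = load_dataset(
    "mozilla-foundation/common_voice_10_0", "ja", split="test", use_auth_token=True
)

# wav2vec 2.0 expects 16 kHz input; Common Voice ships 48 kHz MP3s.
train = train.cast_column("audio", Audio(sampling_rate=16_000))
test = test.cast_column("audio", Audio(sampling_rate=16_000))
```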
## Training procedure

### Training hyperparameters
The following hyperparameters were used during training (a `TrainingArguments` sketch follows the list):
- learning_rate: 0.0003
- train_batch_size: 2
- eval_batch_size: 8
- seed: 42
- gradient_accumulation_steps: 8
- total_train_batch_size: 16
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_steps: 500
- num_epochs: 40
- mixed_precision_training: Native AMP
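
These settings map onto 🤗 `TrainingArguments` roughly as sketched below; the `output_dir` is a placeholder, and the Adam betas/epsilon listed above are the `Trainer` defaults, so they need no explicit arguments:

```python
# Sketch of the TrainingArguments implied by the hyperparameter list above.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="wav2vec2-large-xls-r-300m-j-phoneme-colab-3",  # placeholder
    learning_rate=3e-4,
    per_device_train_batch_size=2,
    per_device_eval_batch_size=8,
    gradient_accumulation_steps=8,   # effective train batch size: 2 * 8 = 16
    seed=42,
    lr_scheduler_type="linear",
    warmup_steps=500,
    num_train_epochs=40,
    fp16=True,                       # "Native AMP" mixed-precision training
)
```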
### Training results
| Training Loss | Epoch | Step  | Validation Loss | Wer    |
|:-------------:|:-----:|:-----:|:---------------:|:------:|
| No log        | 1.0   | 397   | 1.0586          | 0.9425 |
| No log        | 2.0   | 794   | 0.5773          | 0.5847 |
| 1.9827        | 3.0   | 1191  | 0.5243          | 0.4882 |
| 1.9827        | 4.0   | 1588  | 0.4735          | 0.4624 |
| 1.9827        | 5.0   | 1985  | 0.4967          | 0.4789 |
| 0.6004        | 6.0   | 2382  | 0.4703          | 0.4246 |
| 0.6004        | 7.0   | 2779  | 0.4555          | 0.4194 |
| 0.4911        | 8.0   | 3176  | 0.4692          | 0.4284 |
| 0.4911        | 9.0   | 3573  | 0.4589          | 0.3997 |
| 0.4911        | 10.0  | 3970  | 0.4988          | 0.4286 |
| 0.4275        | 11.0  | 4367  | 0.4851          | 0.4153 |
| 0.4275        | 12.0  | 4764  | 0.5020          | 0.4039 |
| 0.3784        | 13.0  | 5161  | 0.5491          | 0.4169 |
| 0.3784        | 14.0  | 5558  | 0.5211          | 0.4080 |
| 0.3784        | 15.0  | 5955  | 0.5124          | 0.3950 |
| 0.3362        | 16.0  | 6352  | 0.5121          | 0.3909 |
| 0.3362        | 17.0  | 6749  | 0.5503          | 0.3728 |
| 0.3046        | 18.0  | 7146  | 0.5363          | 0.3915 |
| 0.3046        | 19.0  | 7543  | 0.6112          | 0.4076 |
| 0.3046        | 20.0  | 7940  | 0.5884          | 0.3755 |
| 0.2785        | 21.0  | 8337  | 0.5639          | 0.3793 |
| 0.2785        | 22.0  | 8734  | 0.6246          | 0.3742 |
| 0.2513        | 23.0  | 9131  | 0.6014          | 0.3714 |
| 0.2513        | 24.0  | 9528  | 0.6195          | 0.3697 |
| 0.2513        | 25.0  | 9925  | 0.6004          | 0.3729 |
| 0.2296        | 26.0  | 10322 | 0.5793          | 0.3585 |
| 0.2296        | 27.0  | 10719 | 0.6178          | 0.3628 |
| 0.2114        | 28.0  | 11116 | 0.5974          | 0.3507 |
| 0.2114        | 29.0  | 11513 | 0.6056          | 0.3432 |
| 0.2114        | 30.0  | 11910 | 0.6190          | 0.3536 |
| 0.1944        | 31.0  | 12307 | 0.6293          | 0.3550 |
| 0.1944        | 32.0  | 12704 | 0.6236          | 0.3535 |
| 0.1777        | 33.0  | 13101 | 0.6456          | 0.3503 |
| 0.1777        | 34.0  | 13498 | 0.6629          | 0.3444 |
| 0.1777        | 35.0  | 13895 | 0.6585          | 0.3432 |
| 0.1644        | 36.0  | 14292 | 0.6528          | 0.3455 |
| 0.1644        | 37.0  | 14689 | 0.6460          | 0.3437 |
| 0.1521        | 38.0  | 15086 | 0.6441          | 0.3360 |
| 0.1521        | 39.0  | 15483 | 0.6531          | 0.3350 |
| 0.1521        | 40.0  | 15880 | 0.6478          | 0.3336 |
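
Since the targets are phoneme sequences, the Wer column is likely computed over space-separated phoneme tokens, making it effectively a phoneme error rate. A sketch of the same metric with the `evaluate` library (the example strings are hypothetical):

```python
# WER sketch using the `evaluate` library (requires `jiwer`); the values in
# the table above come from the Trainer's own evaluation loop.
import evaluate

wer_metric = evaluate.load("wer")

predictions = ["k o n n i ch i w a"]  # hypothetical model output
references = ["k o N n i ch i w a"]   # hypothetical reference phonemes
print(wer_metric.compute(predictions=predictions, references=references))
```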
### Framework versions
- Transformers 4.21.3
- PyTorch 1.10.0+cu113
- Datasets 2.4.0
- Tokenizers 0.12.1
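
To approximate the original environment, the listed versions can be pinned; a sketch (the CUDA 11.3 wheel index matches the `+cu113` build tag):

```bash
pip install transformers==4.21.3 datasets==2.4.0 tokenizers==0.12.1
pip install torch==1.10.0+cu113 -f https://download.pytorch.org/whl/cu113/torch_stable.html
```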