whisper-large-v2-et-children

This model is a fine-tuned version of agnesluhtaru/whisper-large-et-ERR2020-v2 on an Estonian children's speech dataset.

For details on the model's performance and on the data used for training and evaluation, see:

Luhtaru, Agnes; Jaaska, Rauno; Kruusamäe, Karl; Fishel, Mark (2023). Automatic Transcription for Estonian Children’s Speech. In: Proceedings of the 24th Nordic Conference on Computational Linguistics. https://openreview.net/forum?id=xbPTfBIUby

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 1e-05
train_batch_size: 2
eval_batch_size: 1
seed: 42
gradient_accumulation_steps: 16
total_train_batch_size: 32
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: linear
lr_scheduler_warmup_steps: 200
training_steps: 2000
mixed_precision_training: Native AMP
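
As a rough illustration of how these values fit together, the sketch below recomputes the total train batch size and traces a linear schedule with warmup. It assumes a single training device (the card does not state the device count) and the usual linear-warmup, linear-decay shape; the function names are hypothetical and are not taken from the training script.

```python
# Values copied from the hyperparameter list above.
LEARNING_RATE = 1e-05
TRAIN_BATCH_SIZE = 2
GRADIENT_ACCUMULATION_STEPS = 16
WARMUP_STEPS = 200
TRAINING_STEPS = 2000
NUM_DEVICES = 1  # assumption: single device; not stated in the card


def effective_batch_size(per_device, accumulation, devices=NUM_DEVICES):
    """Number of samples contributing to one optimizer update."""
    return per_device * accumulation * devices


def linear_lr(step, base_lr=LEARNING_RATE, warmup=WARMUP_STEPS, total=TRAINING_STEPS):
    """Learning rate at a given optimizer step.

    Rises linearly from 0 to base_lr over the warmup steps,
    then decays linearly to 0 at the final training step.
    """
    if step < warmup:
        return base_lr * step / warmup
    return base_lr * max(0.0, (total - step) / (total - warmup))


if __name__ == "__main__":
    print("total_train_batch_size:",
          effective_batch_size(TRAIN_BATCH_SIZE, GRADIENT_ACCUMULATION_STEPS))
    for step in (0, 100, 200, 1100, 2000):
        print(step, linear_lr(step))
```

Under these assumptions the product 2 × 16 matches the reported total_train_batch_size of 32, and the learning rate peaks at 1e-05 at step 200 before decaying to zero at step 2000.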

Training results

Training Loss	Epoch	Step	Validation Loss	WER
0.0302	4.03	500	0.2971	16.2892
0.0042	8.06	1000	0.3406	15.8551
0.0017	12.1	1500	0.3714	15.5585
0.0009	16.13	2000	0.3934	15.6445
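
The table shows validation loss rising after step 500 while the word error rate keeps falling until step 1500. The short sketch below makes that comparison explicit by picking the best row under each criterion. The rows are copied from the table; the helper names are hypothetical, and the card does not say which checkpoint was actually published.

```python
# Rows copied from the training results table:
# (training_loss, epoch, step, validation_loss, wer)
ROWS = [
    (0.0302, 4.03, 500, 0.2971, 16.2892),
    (0.0042, 8.06, 1000, 0.3406, 15.8551),
    (0.0017, 12.1, 1500, 0.3714, 15.5585),
    (0.0009, 16.13, 2000, 0.3934, 15.6445),
]

VALIDATION_LOSS = 3  # column index of validation loss
WER = 4              # column index of word error rate


def best_row(rows, column):
    """Return the row with the smallest value in the given column."""
    return min(rows, key=lambda row: row[column])


if __name__ == "__main__":
    by_wer = best_row(ROWS, WER)
    by_loss = best_row(ROWS, VALIDATION_LOSS)
    print("lowest WER at step", by_wer[2], "with WER", by_wer[WER])
    print("lowest validation loss at step", by_loss[2])
```

The two criteria disagree here, which is one reason to select checkpoints on the task metric rather than on loss when the two diverge.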

Framework versions

Transformers 4.26.0.dev0
Pytorch 1.12.1+rocm5.1.1
Datasets 2.7.1.dev0
Tokenizers 0.13.2
