Whisper Large V2 Portuguese 🇧🇷🇵🇹
Welcome to whisper large-v2 for Portuguese transcription 👋🏻
Transcribe Portuguese audio to text with high accuracy: this checkpoint attains the lowest WER among the comparable models listed below. On the Common Voice evaluation set it achieves:
- Loss: 0.282
- WER: 5.590
This model is a fine-tuned version of openai/whisper-large-v2 on the mozilla-foundation/common_voice_11 dataset. If you want a lighter model, you may be interested in jlondonobo/whisper-medium-pt, which offers faster inference with almost no difference in WER.
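The snippet below is a minimal sketch of running transcription with this checkpoint through the 🤗 Transformers pipeline; the file name `audio.mp3` is only a placeholder for your own Portuguese audio.

```python
from transformers import pipeline

# Load the fine-tuned Portuguese checkpoint into an ASR pipeline.
transcriber = pipeline(
    "automatic-speech-recognition",
    model="jlondonobo/whisper-large-v2-pt",
)

# `chunk_length_s` lets the pipeline handle recordings longer than 30 seconds.
result = transcriber("audio.mp3", chunk_length_s=30)
print(result["text"])
```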
Comparable models
Reported WER is based on the evaluation subset of Common Voice.
Model | WER | # Parameters |
---|---|---|
jlondonobo/whisper-large-v2-pt | 5.590 🤗 | 1550M |
openai/whisper-large-v2 | 6.300 | 1550M |
jlondonobo/whisper-medium-pt | 6.579 | 769M |
jonatasgrosman/wav2vec2-large-xlsr-53-portuguese | 11.310 | 317M |
Edresson/wav2vec2-large-xlsr-coraa-portuguese | 20.080 | 317M |
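The WER figures above follow the standard word error rate definition. Below is a hedged sketch of how such a score can be computed with the 🤗 `evaluate` library; the two example sentences are placeholders, not data from the evaluation set.

```python
import evaluate

# Load the word error rate metric.
wer_metric = evaluate.load("wer")

references = ["o gato dorme no sofá"]    # ground-truth transcripts (placeholder)
predictions = ["o gato dorme no sofa"]   # model outputs (placeholder)

# WER is reported as a percentage in the table above.
wer = 100 * wer_metric.compute(references=references, predictions=predictions)
print(f"WER: {wer:.3f}")
```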
Training hyperparameters
We used the following hyperparameters for training:
- learning_rate: 1e-05
- train_batch_size: 16
- eval_batch_size: 8
- seed: 42
- gradient_accumulation_steps: 2
- total_train_batch_size: 32
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_steps: 500
- training_steps: 5000
- mixed_precision_training: Native AMP
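For reference, these settings map onto Seq2SeqTrainingArguments roughly as in the sketch below. This is an assumed reconstruction, not the exact training script; `output_dir` is an arbitrary placeholder, and the listed Adam betas and epsilon match the Trainer defaults, so they are not set explicitly.

```python
from transformers import Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="./whisper-large-v2-pt",   # placeholder path
    learning_rate=1e-5,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=8,
    seed=42,
    gradient_accumulation_steps=2,        # effective train batch size of 32
    warmup_steps=500,
    max_steps=5000,
    lr_scheduler_type="linear",
    fp16=True,                            # native AMP mixed-precision training
)
```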
Training results
Training Loss | Epoch | Step | Validation Loss | WER |
---|---|---|---|---|
0.0828 | 1.09 | 1000 | 0.1868 | 6.778 |
0.0241 | 3.07 | 2000 | 0.2057 | 6.109 |
0.0084 | 5.06 | 3000 | 0.2367 | 6.029 |
0.0015 | 7.04 | 4000 | 0.2469 | 5.709 |
0.0009 | 9.02 | 5000 | 0.2821 | 5.590 🤗 |
Framework versions
- Transformers 4.26.0.dev0
- Pytorch 1.13.0+cu117
- Datasets 2.7.1.dev0
- Tokenizers 0.13.2