Whisper Small Galician
Description
This is a fine-tuned version of the openai/whisper-small pre-trained model for automatic speech recognition (ASR) in Galician.
Dataset
We combined two datasets:
- The OpenSLR Galician dataset, available in the OpenSLR repository.
- The Common Voice 13 Galician dataset, available in the Common Voice repository.
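As a reference, here is a minimal sketch of how both corpora could be loaded and merged with the 🤗 Datasets library. The dataset identifiers (`openslr` with the `SLR77` Galician configuration and `mozilla-foundation/common_voice_13_0` with the `gl` configuration), the column names, and the merging step are assumptions for illustration; adjust them to the copies of the data you actually use.

```python
from datasets import Audio, concatenate_datasets, load_dataset

# Assumed identifiers: OpenSLR SLR77 (Galician) and Common Voice 13 (gl).
# Depending on your datasets version, the script-based OpenSLR loader may
# additionally require trust_remote_code=True.
openslr = load_dataset("openslr", "SLR77", split="train")
common_voice = load_dataset("mozilla-foundation/common_voice_13_0", "gl", split="train")

# Keep only the columns shared by both corpora (audio + transcription).
openslr = openslr.select_columns(["audio", "sentence"])
common_voice = common_voice.select_columns(["audio", "sentence"])

# Whisper expects 16 kHz audio, so cast both to the same sampling rate before merging.
combined = concatenate_datasets([openslr, common_voice])
combined = combined.cast_column("audio", Audio(sampling_rate=16_000))
```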
Example inference script
Use the following example script to run our model in inference mode:
import librosa
import torch
from transformers import AutoProcessor, AutoModelForSpeechSeq2Seq

filename = "demo.wav"  # change this line to the name of your audio file
sample_rate = 16_000   # Whisper expects 16 kHz audio

# Load the processor and the fine-tuned Galician model
processor = AutoProcessor.from_pretrained('ITG/whisper-small-gl')
model = AutoModelForSpeechSeq2Seq.from_pretrained('ITG/whisper-small-gl')

# Move the model to GPU if one is available
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model.to(device)

# Load and resample the audio, then run inference without tracking gradients
speech_array, _ = librosa.load(filename, sr=sample_rate)
with torch.no_grad():
    inputs = processor(speech_array, sampling_rate=sample_rate, return_tensors="pt").to(device)
    input_features = inputs.input_features
    generated_ids = model.generate(inputs=input_features, max_length=225)
    decode_output = processor.batch_decode(generated_ids, skip_special_tokens=True)[0]

print(f"ASR Galician whisper-small output: {decode_output}")
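Depending on your transformers version and on the checkpoint's generation config, you may also want to pin the language and task explicitly at generation time. The variant below of the generate call above uses the Whisper processor's `get_decoder_prompt_ids` helper; treat it as an optional sketch rather than a required step.

```python
# Optional: force Galician transcription explicitly (useful if the generation
# config of the checkpoint does not already set the language/task tokens).
forced_decoder_ids = processor.get_decoder_prompt_ids(language="galician", task="transcribe")
generated_ids = model.generate(
    inputs=input_features,
    max_length=225,
    forced_decoder_ids=forced_decoder_ids,
)
```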
Fine-tuning hyper-parameters
| Hyper-parameter | Value |
|---|---|
| Training batch size | 16 |
| Evaluation batch size | 8 |
| Learning rate | 1e-5 |
| Gradient checkpointing | true |
| Gradient accumulation steps | 1 |
| Max training epochs | 100 |
| Max steps | 4000 |
| Generate max length | 225 |
| Warmup training steps (%) | 12.5% |
| FP16 | true |
| Metric for best model | wer |
| Greater is better | false |
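For reference, the table above roughly corresponds to a `Seq2SeqTrainingArguments` configuration like the sketch below. Only the values stated in the table are reproduced; the output directory, the evaluation strategy, and the 500 warmup steps (12.5% of 4000 max steps) are assumptions or derivations, not an exact copy of our training script.

```python
from transformers import Seq2SeqTrainingArguments

# Sketch of training arguments matching the table above (output_dir is a placeholder)
training_args = Seq2SeqTrainingArguments(
    output_dir="./whisper-small-gl",   # hypothetical output path
    per_device_train_batch_size=16,
    per_device_eval_batch_size=8,
    learning_rate=1e-5,
    gradient_checkpointing=True,
    gradient_accumulation_steps=1,
    num_train_epochs=100,
    max_steps=4000,                    # max_steps takes precedence over epochs
    warmup_steps=500,                  # 12.5% of 4000 max steps
    generation_max_length=225,
    fp16=True,
    evaluation_strategy="steps",       # assumption: periodic evaluation during training
    predict_with_generate=True,
    metric_for_best_model="wer",
    greater_is_better=False,
    load_best_model_at_end=True,
)
```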
Fine-tuning on a different dataset or style
If you're interested in fine-tuning your own Whisper model, we suggest starting from the openai/whisper-small checkpoint. You may also find the Transformers step-by-step guide for fine-tuning Whisper on multilingual ASR datasets to be a valuable resource; it served as a helpful reference during the training of this Galician whisper-small model.
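As a starting point, loading the base checkpoint with a processor configured for Galician typically looks like the snippet below. This mirrors the approach in the guide mentioned above and is a sketch, not our exact training code.

```python
from transformers import WhisperProcessor, WhisperForConditionalGeneration

# Processor configured so transcripts are tokenized with Galician language/task tokens
processor = WhisperProcessor.from_pretrained(
    "openai/whisper-small", language="galician", task="transcribe"
)
model = WhisperForConditionalGeneration.from_pretrained("openai/whisper-small")

# From here, preprocess your dataset into input_features / labels and train with
# Seq2SeqTrainer, e.g. using arguments like those sketched in the previous section.
```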