Whisper Small Galician
Description
This is a fine-tuned version of the openai/whisper-small pre-trained model for automatic speech recognition (ASR) in Galician.
Dataset
We combined two datasets:
- The OpenSLR Galician dataset, available in the OpenSLR repository.
- The Common Voice 13 Galician dataset, available in the Common Voice repository.
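As a reference, here is a minimal sketch of how both corpora could be loaded and merged with the 🤗 Datasets library. The dataset identifiers (`openslr` with the `SLR77` Galician configuration and `mozilla-foundation/common_voice_13_0` with the `gl` configuration), the column names, and the merging step are assumptions for illustration; adjust them to the copies of the data you actually use.

```python
from datasets import Audio, concatenate_datasets, load_dataset

# Assumed identifiers: OpenSLR SLR77 (Galician) and Common Voice 13 (gl).
# Depending on your datasets version, the script-based OpenSLR loader may
# additionally require trust_remote_code=True.
openslr = load_dataset("openslr", "SLR77", split="train")
common_voice = load_dataset("mozilla-foundation/common_voice_13_0", "gl", split="train")

# Keep only the columns shared by both corpora (audio + transcription).
openslr = openslr.select_columns(["audio", "sentence"])
common_voice = common_voice.select_columns(["audio", "sentence"])

# Whisper expects 16 kHz audio, so cast both to the same sampling rate before merging.
combined = concatenate_datasets([openslr, common_voice])
combined = combined.cast_column("audio", Audio(sampling_rate=16_000))
```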
Example inference script
Use the following example script to run our model in inference mode:
import librosa
import torch
from transformers import AutoProcessor, AutoModelForSpeechSeq2Seq

filename = "demo.wav"  # change this line to the name of your audio file
sample_rate = 16_000   # Whisper expects 16 kHz audio

# Load the processor and the fine-tuned Galician model
processor = AutoProcessor.from_pretrained('ITG/whisper-small-gl')
model = AutoModelForSpeechSeq2Seq.from_pretrained('ITG/whisper-small-gl')

# Move the model to GPU if one is available
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model.to(device)

# Load and resample the audio, then run inference without tracking gradients
speech_array, _ = librosa.load(filename, sr=sample_rate)
with torch.no_grad():
    inputs = processor(speech_array, sampling_rate=sample_rate, return_tensors="pt").to(device)
    input_features = inputs.input_features
    generated_ids = model.generate(inputs=input_features, max_length=225)
    decode_output = processor.batch_decode(generated_ids, skip_special_tokens=True)[0]

print(f"ASR Galician whisper-small output: {decode_output}")
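Depending on your transformers version and on the checkpoint's generation config, you may also want to pin the language and task explicitly at generation time. The variant below of the generate call above uses the Whisper processor's `get_decoder_prompt_ids` helper; treat it as an optional sketch rather than a required step.

```python
# Optional: force Galician transcription explicitly (useful if the generation
# config of the checkpoint does not already set the language/task tokens).
forced_decoder_ids = processor.get_decoder_prompt_ids(language="galician", task="transcribe")
generated_ids = model.generate(
    inputs=input_features,
    max_length=225,
    forced_decoder_ids=forced_decoder_ids,
)
```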
Fine-tuning hyper-parameters
| Hyper-parameter | Value |
|---|---|
| Training batch size | 16 |
| Evaluation batch size | 8 |
| Learning rate | 1e-5 |
| Gradient checkpointing | true |
| Gradient accumulation steps | 1 |
| Max training epochs | 100 |
| Max steps | 4000 |
| Generate max length | 225 |
| Warmup training steps (%) | 12.5% |
| FP16 | true |
| Metric for best model | wer |
| Greater is better | false |
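For reference, the table above roughly corresponds to a `Seq2SeqTrainingArguments` configuration like the sketch below. Only the values stated in the table are reproduced; the output directory, the evaluation strategy, and the 500 warmup steps (12.5% of 4000 max steps) are assumptions or derivations, not an exact copy of our training script.

```python
from transformers import Seq2SeqTrainingArguments

# Sketch of training arguments matching the table above (output_dir is a placeholder)
training_args = Seq2SeqTrainingArguments(
    output_dir="./whisper-small-gl",   # hypothetical output path
    per_device_train_batch_size=16,
    per_device_eval_batch_size=8,
    learning_rate=1e-5,
    gradient_checkpointing=True,
    gradient_accumulation_steps=1,
    num_train_epochs=100,
    max_steps=4000,                    # max_steps takes precedence over epochs
    warmup_steps=500,                  # 12.5% of 4000 max steps
    generation_max_length=225,
    fp16=True,
    evaluation_strategy="steps",       # assumption: periodic evaluation during training
    predict_with_generate=True,
    metric_for_best_model="wer",
    greater_is_better=False,
    load_best_model_at_end=True,
)
```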
Fine-tuning on a different dataset or style
If you're interested in fine-tuning your own Whisper model, we suggest starting from the openai/whisper-small checkpoint. You may also find the Transformers step-by-step guide for fine-tuning Whisper on multilingual ASR datasets to be a valuable resource; it served as a helpful reference during the training of this Galician whisper-small model.
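As a starting point, loading the base checkpoint with a processor configured for Galician typically looks like the snippet below. This mirrors the approach in the guide mentioned above and is a sketch, not our exact training code.

```python
from transformers import WhisperProcessor, WhisperForConditionalGeneration

# Processor configured so transcripts are tokenized with Galician language/task tokens
processor = WhisperProcessor.from_pretrained(
    "openai/whisper-small", language="galician", task="transcribe"
)
model = WhisperForConditionalGeneration.from_pretrained("openai/whisper-small")

# From here, preprocess your dataset into input_features / labels and train with
# Seq2SeqTrainer, e.g. using arguments like those sketched in the previous section.
```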