Wav2Vec2 Large XLSR Galician

Description

This model is a fine-tuned version of the pre-trained facebook/wav2vec2-large-xlsr-53 model for automatic speech recognition (ASR) in Galician.

Dataset

The model was fine-tuned on the OpenSLR Galician dataset, available in the OpenSLR repository.

Example inference script

The following example script runs the model in inference mode:

import librosa
import torch
from transformers import AutoModelForCTC, AutoProcessor
filename = "demo.wav"  # change this to the path of your audio file
sample_rate = 16_000   # the model expects 16 kHz audio
processor = AutoProcessor.from_pretrained('ITG/wav2vec2-large-xlsr-gl')
# wav2vec2 is a CTC model, so it is loaded with AutoModelForCTC
model = AutoModelForCTC.from_pretrained('ITG/wav2vec2-large-xlsr-gl')
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model.to(device)
# librosa resamples the audio to sample_rate while loading
speech_array, _ = librosa.load(filename, sr=sample_rate)
# .to(device) moves both input_values and attention_mask to the target device
inputs = processor(speech_array, sampling_rate=sample_rate, return_tensors="pt", padding=True).to(device)
with torch.no_grad():
    logits = model(inputs.input_values, attention_mask=inputs.attention_mask).logits
# Greedy decoding: pick the most likely token per frame, then let the processor turn ids into text
decode_output = processor.batch_decode(torch.argmax(logits, dim=-1))[0]
print(f"ASR Galician wav2vec2-large-xlsr output: {decode_output}")
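The last step of the script (argmax over the logits followed by batch_decode) is greedy CTC decoding. The sketch below illustrates the idea in plain Python under simplifying assumptions: the vocabulary, the blank id and the frame scores are hypothetical toy values, not the model's real tokenizer, and the real processor also handles special tokens and word delimiters.

```python
from itertools import groupby

# Hypothetical toy vocabulary; id 0 plays the role of the CTC blank token.
BLANK_ID = 0
VOCAB = {1: "o", 2: "l", 3: "a"}


def greedy_ctc_decode(frame_scores):
    """Turn per-frame score rows into text with greedy CTC decoding."""
    # 1. Pick the highest-scoring token id for every frame (argmax).
    best_ids = [max(range(len(row)), key=row.__getitem__) for row in frame_scores]
    # 2. Collapse each run of identical consecutive ids into a single id.
    collapsed = [token_id for token_id, _ in groupby(best_ids)]
    # 3. Drop blanks and map the remaining ids to characters.
    return "".join(VOCAB[i] for i in collapsed if i != BLANK_ID)


# Toy scores: one row per audio frame, one column per token id.
frame_scores = [
    [0.1, 0.7, 0.1, 0.1],    # "o"
    [0.1, 0.6, 0.2, 0.1],    # "o" again (repeat, collapsed)
    [0.8, 0.1, 0.05, 0.05],  # blank
    [0.1, 0.1, 0.7, 0.1],    # "l"
    [0.1, 0.1, 0.1, 0.7],    # "a"
]
transcript = greedy_ctc_decode(frame_scores)
print(transcript)
```

Repeats are collapsed before blanks are removed, so a blank between two identical tokens keeps them as two separate characters.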

Fine-tuning hyper-parameters

Hyper-parameter	Value
Training batch size	16
Evaluation batch size	8
Learning rate	3e-4
Gradient accumulation steps	2
Group by length	true
Evaluation strategy	steps
Max training epochs	50
Max steps	4000
Generate max length	225
FP16	true
Metric for best model	wer
Greater is better	false
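As a rough illustration, the table above could be written as keyword arguments in the style of transformers.TrainingArguments. This mapping is an assumption, since the original training script is not shown; the variable names are hypothetical, and "Generate max length" is left out because it is not a plain TrainingArguments option. The sketch also works out two values implied by the table.

```python
# Keyword names follow transformers.TrainingArguments; mapping the table onto
# them is an assumption, not the authors' confirmed configuration.
training_kwargs = {
    "per_device_train_batch_size": 16,
    "per_device_eval_batch_size": 8,
    "learning_rate": 3e-4,
    "gradient_accumulation_steps": 2,
    "group_by_length": True,
    "evaluation_strategy": "steps",  # newer transformers releases name this eval_strategy
    "num_train_epochs": 50,
    "max_steps": 4000,               # a positive max_steps overrides num_train_epochs
    "fp16": True,
    "metric_for_best_model": "wer",
    "greater_is_better": False,      # lower word error rate is better
}

# Gradients are accumulated over 2 batches of 16 before each optimizer step.
effective_batch_size = (
    training_kwargs["per_device_train_batch_size"]
    * training_kwargs["gradient_accumulation_steps"]
)

# Upper bound on training samples processed per device before max_steps is reached.
max_samples_seen = effective_batch_size * training_kwargs["max_steps"]

print(f"Effective batch size per device: {effective_batch_size}")
print(f"Samples processed per device within max_steps: {max_samples_seen}")
```

Because max_steps is positive, training would stop after 4000 optimizer steps even if 50 epochs have not been completed.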

Fine-tuning on a different dataset or style

If you're interested in fine-tuning your own wav2vec2 model, we suggest starting with the facebook/wav2vec2-large-xlsr-53 model. You may also find the notebook on fine-tuning for Galician by Diego Fustes to be a valuable resource; it served as a helpful reference while training this Galician wav2vec2-large-xlsr model.