Wav2Vec2 Large XLSR Galician
Description
This is a fine-tuned version of the facebook/wav2vec2-large-xlsr-53 pre-trained model for automatic speech recognition (ASR) in Galician.
Dataset
The model was fine-tuned on the OpenSLR Galician dataset, available from the OpenSLR repository.
Example inference script
The following example script runs the model in inference mode:
import librosa
import torch
from transformers import AutoProcessor, AutoModelForCTC

filename = "demo.wav"  # change this to the path of your audio file
sample_rate = 16_000  # audio is resampled to 16 kHz on load

# Load the processor (feature extractor + tokenizer) and the CTC model
processor = AutoProcessor.from_pretrained('ITG/wav2vec2-large-xlsr-gl')
model = AutoModelForCTC.from_pretrained('ITG/wav2vec2-large-xlsr-gl')
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model.to(device)

# Read the audio file and convert it to model inputs on the target device
speech_array, _ = librosa.load(filename, sr=sample_rate)
inputs = processor(speech_array, sampling_rate=sample_rate, return_tensors="pt", padding=True).to(device)

# Run the forward pass without tracking gradients
with torch.no_grad():
    logits = model(inputs.input_values, attention_mask=inputs.attention_mask).logits

# Greedy decoding: pick the most likely token for each frame
decode_output = processor.batch_decode(torch.argmax(logits, dim=-1))[0]
print(f"ASR Galician wav2vec2-large-xlsr output: {decode_output}")
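
The last step of the script takes the argmax over the logits and passes the result to `batch_decode`. As a rough illustration of what greedy CTC decoding does with those per-frame ids, here is a minimal pure-Python sketch. The vocabulary, the frame ids, and the choice of `0` as the blank id are assumptions made up for this example; they are not the model's actual vocabulary, and the real tokenizer also handles word delimiters and special tokens.

```python
def ctc_greedy_collapse(frame_ids, blank_id=0):
    """Collapse per-frame argmax ids into a token id sequence.

    Consecutive repeats are merged, then blank ids are dropped.
    blank_id=0 is an assumption for this sketch.
    """
    tokens = []
    previous = None
    for token_id in frame_ids:
        # Keep a token only if it differs from the previous frame and is not blank
        if token_id != previous and token_id != blank_id:
            tokens.append(token_id)
        previous = token_id
    return tokens


# Hypothetical toy vocabulary and per-frame predictions (illustration only)
vocab = {0: "<pad>", 1: "o", 2: "l", 3: "a"}
frame_ids = [0, 1, 1, 0, 2, 2, 2, 0, 3, 3, 0]

decoded = "".join(vocab[i] for i in ctc_greedy_collapse(frame_ids))
print(decoded)  # ola
```

A blank between two identical ids keeps both of them, which is how CTC can emit doubled letters: `[1, 0, 1]` collapses to `[1, 1]`.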
Fine-tuning hyper-parameters
Hyper-parameter | Value |
---|---|
Training batch size | 16 |
Evaluation batch size | 8 |
Learning rate | 3e-4 |
Gradient accumulation steps | 2 |
Group by length | true |
Evaluation strategy | steps |
Max training epochs | 50 |
Max steps | 4000 |
Generate max length | 225 |
FP16 | true |
Metric for best model | wer |
Greater is better | false |
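
To make the table easier to reuse, the sketch below writes the same values as a plain Python dictionary and derives the effective batch size from them. The key names follow the keyword arguments of the Hugging Face `TrainingArguments` class as we understand them; this mapping is an assumption, not the original training script, and some names vary between `transformers` versions (for example, `evaluation_strategy` is called `eval_strategy` in newer releases, and `generation_max_length` belongs to `Seq2SeqTrainingArguments`). The single-device setup is also an assumption.

```python
# Hyper-parameters from the table above, keyed by assumed TrainingArguments names
training_config = {
    "per_device_train_batch_size": 16,
    "per_device_eval_batch_size": 8,
    "learning_rate": 3e-4,
    "gradient_accumulation_steps": 2,
    "group_by_length": True,
    "evaluation_strategy": "steps",  # "eval_strategy" in newer versions
    "num_train_epochs": 50,
    "max_steps": 4000,
    "generation_max_length": 225,  # assumed mapping of "Generate max length"
    "fp16": True,
    "metric_for_best_model": "wer",
    "greater_is_better": False,  # lower WER is better
}

# Samples contributing to each optimizer update, assuming a single device
effective_batch_size = (
    training_config["per_device_train_batch_size"]
    * training_config["gradient_accumulation_steps"]
)

# Upper bound on training samples processed if training runs to max_steps
max_samples_seen = effective_batch_size * training_config["max_steps"]

print(effective_batch_size)  # 32
print(max_samples_seen)      # 128000
```

Note that in the Hugging Face `Trainer`, a positive `max_steps` overrides `num_train_epochs`, so with these values training would stop at 4000 optimizer steps at the latest.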
Fine-tuning on a different dataset or style
If you're interested in fine-tuning your own wav2vec2 model, we suggest starting from the facebook/wav2vec2-large-xlsr-53 model. You may also find this notebook on fine-tuning for Galician by Diego Fustes a valuable resource; it served as a helpful reference while training this Galician wav2vec2-large-xlsr model.