wav2vec2-xls-r-slavic-pomak
To train a Pomak ASR model, we fine-tuned a Slavic model (classla/wav2vec2-large-slavic-parlaspeech-hr) on 11h of recorded Pomak speech.
Recordings
Four native Pomak speakers (2 female and 2 male) agreed to read Pomak texts at the ILSP audio-visual studio in Xanthi, Greece, resulting in a total of 14h of recordings.
Speaker | Gender | Total recorded hours |
---|---|---|
NK9dIF | F | 4h 44m 45s |
xoVY9q | M | 4h 36m 12s |
9G75fk | F | 1h 44m 03s |
n5WzHj | M | 3h 44m 04s |
To fine-tune the model, we split the long recordings into smaller segments of a maximum of 25 seconds each. This removed the majority of pauses and resulted in a total dataset duration of 11h 8m.
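The segmentation tooling is not described here, so the following is only a minimal sketch of one way to get segments of at most 25 seconds while dropping long pauses. It assumes speech intervals have already been detected by some voice-activity detector. The function name `pack_segments` and the `max_gap` threshold are hypothetical and not taken from the original pipeline.

```python
from typing import List, Tuple

Interval = Tuple[float, float]  # (start_sec, end_sec) of a detected speech stretch


def pack_segments(
    speech: List[Interval],
    max_len: float = 25.0,  # maximum segment length, as stated in the text
    max_gap: float = 0.5,   # assumed: pauses longer than this are cut out
) -> List[Interval]:
    """Greedily merge consecutive speech intervals into segments <= max_len."""
    segments: List[Interval] = []
    for start, end in speech:
        # An interval longer than max_len is hard-split into max_len chunks.
        while end - start > max_len:
            segments.append((start, start + max_len))
            start += max_len
        if segments:
            prev_start, prev_end = segments[-1]
            # Merge only across a short pause and only if the result still fits.
            if start - prev_end <= max_gap and end - prev_start <= max_len:
                segments[-1] = (prev_start, end)
                continue
        segments.append((start, end))
    return segments


def total_duration(segments: List[Interval]) -> float:
    """Sum of segment lengths in seconds."""
    return sum(end - start for start, end in segments)


# Made-up intervals for illustration only.
example = [(0.0, 10.0), (10.3, 20.0), (20.2, 30.0), (45.0, 50.0)]
packed = pack_segments(example)
```

In this toy input the 15-second pause before the last interval is excluded from every segment, which mirrors how the dataset duration can end up shorter than the raw recording time.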
Metrics
The test set consists of 10% of the dataset recordings.
Model | CER | WER |
---|---|---|
pre-trained | 87.31% | 31.47% |
fine-tuned | 9.06% | 3.12% |
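For readers unfamiliar with the two metrics, the sketch below shows the usual definitions: edit distance between reference and hypothesis, divided by the reference length, counted in words for WER and in characters for CER. It is an illustration only. The evaluation script used for the table is not given here, and details such as text normalisation or whether spaces count as characters are assumptions.

```python
from typing import Sequence


def edit_distance(ref: Sequence, hyp: Sequence) -> int:
    """Levenshtein distance: minimum substitutions, insertions and deletions."""
    prev = list(range(len(hyp) + 1))  # distances from the empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]  # deleting the first i reference items
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate; the reference must contain at least one word."""
    ref_words = reference.split()
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate; spaces are counted as characters in this sketch."""
    return edit_distance(reference, hypothesis) / len(reference)


# Placeholder strings, not real Pomak transcripts.
example_wer = wer("a b c d", "a x c d")
example_cer = cer("abcd", "abxd")
```

Both functions return a fraction; multiplying by 100 gives percentages in the form reported in the table.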
Training hyperparameters
To fine-tune the wav2vec2-large-slavic-parlaspeech-hr model, we used the following hyperparameters:
arg | value |
---|---|
per_device_train_batch_size | 8 |
gradient_accumulation_steps | 2 |
num_train_epochs | 35 |
learning_rate | 3e-4 |
warmup_steps | 500 |
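The argument names listed above match fields of the Hugging Face `TrainingArguments` class, although the training script itself is not shown here. As a small sketch, the values can be kept in a plain dictionary and used to work out the effective batch size per optimizer step. The helper `effective_batch_size` and its `num_devices` parameter are hypothetical, since the number of devices used is not stated.

```python
# Values copied from the hyperparameters listed above.
hyperparameters = {
    "per_device_train_batch_size": 8,
    "gradient_accumulation_steps": 2,
    "num_train_epochs": 35,
    "learning_rate": 3e-4,
    "warmup_steps": 500,
}


def effective_batch_size(params: dict, num_devices: int = 1) -> int:
    """Examples contributing to one optimizer step.

    num_devices is an assumption; the device count is not given in the source.
    """
    return (
        params["per_device_train_batch_size"]
        * params["gradient_accumulation_steps"]
        * num_devices
    )


single_device_batch = effective_batch_size(hyperparameters)
```

On a single device this gives 8 × 2 = 16 segments per optimizer step; with more devices the figure scales linearly.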