Macedonian Wav2Vec2 Model

This repository contains a Wav2Vec2 model trained and fine-tuned on a custom dataset for the Macedonian language. The model is distributed through the HuggingFace Hub and used via Transformers, a popular open-source library for natural language and speech processing.

Model Details

The Wav2Vec2 model is a state-of-the-art automatic speech recognition (ASR) model that converts spoken language into written text. This particular model has been trained and fine-tuned specifically for the Macedonian language; in our evaluation (see the citation below) it reached a word error rate (WER) of 0.21 and a character error rate (CER) of 0.09, making it suitable for a variety of speech-to-text applications in Macedonian.

How to Use

Installation

You can install the required dependencies with pip, the Python package installer. The usage example below also needs PyTorch and librosa (for audio loading):

pip install transformers torch librosa

Usage Example

Here's a simple example demonstrating how to use the Macedonian Wav2Vec2 model for speech recognition:


import librosa
import torch
from transformers import Wav2Vec2ForCTC, Wav2Vec2Tokenizer

# Load the model and tokenizer
model = Wav2Vec2ForCTC.from_pretrained("Konstantin-Bogdanoski/macedonian-wav2vec2-base")
tokenizer = Wav2Vec2Tokenizer.from_pretrained("Konstantin-Bogdanoski/macedonian-wav2vec2-base")

# Load the audio file as a 16 kHz waveform (the sampling rate Wav2Vec2 expects)
file_path = "path/to/audio.wav"
speech, _ = librosa.load(file_path, sr=16000)

# Perform speech recognition on the waveform
input_values = tokenizer(speech, return_tensors="pt").input_values
with torch.no_grad():
    logits = model(input_values).logits
predicted_ids = logits.argmax(dim=-1)

# Convert token IDs to text
transcription_text = tokenizer.decode(predicted_ids[0])
print("Transcription:", transcription_text)

Make sure to replace "path/to/audio.wav" with the actual path to your audio file. The transcription_text variable will contain the recognized text from the speech.
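As an alternative, the high-level pipeline API in Transformers performs audio loading, inference, and decoding in one call. This is a minimal sketch, assuming the checkpoint ships the preprocessing configuration the pipeline needs (ffmpeg may be required for non-WAV formats):

from transformers import pipeline

# Sketch: the ASR pipeline loads the audio file, runs the model,
# and decodes the output in a single call
asr = pipeline("automatic-speech-recognition", model="Konstantin-Bogdanoski/macedonian-wav2vec2-base")
result = asr("path/to/audio.wav")
print("Transcription:", result["text"])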

Model Fine-Tuning

If you're interested in fine-tuning the Wav2Vec2 model on your own custom dataset for the Macedonian language, you can refer to the HuggingFace documentation.
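For orientation, here is a minimal fine-tuning sketch, not the procedure used to train this model. It assumes you have a list of (16 kHz waveform, transcript) pairs named training_pairs (a placeholder), that the checkpoint loads with Wav2Vec2Processor, and illustrative hyperparameters:

import torch
from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor

# Placeholder: replace with your own (16 kHz waveform, transcript) pairs
training_pairs = [...]

processor = Wav2Vec2Processor.from_pretrained("Konstantin-Bogdanoski/macedonian-wav2vec2-base")
model = Wav2Vec2ForCTC.from_pretrained("Konstantin-Bogdanoski/macedonian-wav2vec2-base")

model.freeze_feature_encoder()  # common practice: keep the convolutional feature encoder frozen
model.train()
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)  # illustrative learning rate

for waveform, transcript in training_pairs:
    inputs = processor(waveform, sampling_rate=16000, return_tensors="pt")
    labels = processor.tokenizer(transcript, return_tensors="pt").input_ids

    # Wav2Vec2ForCTC computes the CTC loss internally when labels are provided
    loss = model(inputs.input_values, labels=labels).loss
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()

In practice you would batch and pad the inputs with a data collator and train with the Trainer API; the HuggingFace documentation covers both.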

Acknowledgments

We would like to acknowledge the creators of the Wav2Vec2 model and the HuggingFace library for their valuable contributions to the field of automatic speech recognition.

If you have any questions or encounter any issues, please feel free to open an issue in this repository. We are here to help!

Citation

To cite this model and the accompanying research paper, use the following BibTeX:

@InProceedings{10.1007/978-3-031-39059-3_17,
author="Bogdanoski, Konstantin
and Mishev, Kostadin
and Simjanoska, Monika
and Trajanov, Dimitar",
editor="Conte, Donatello
and Fred, Ana
and Gusikhin, Oleg
and Sansone, Carlo",
title="Exploring ASR Models in Low-Resource Languages: Use-Case the Macedonian Language",
booktitle="Deep Learning Theory and Applications",
year="2023",
publisher="Springer Nature Switzerland",
address="Cham",
pages="254--268",
abstract="We explore the use of Wav2Vec 2.0, NeMo, and ESPNet models trained on a dataset in Macedonian language for the development of Automatic Speech Recognition (ASR) models for low-resource languages. The study aims to evaluate the performance of recent state-of-the-art models for speech recognition in low-resource languages, such as Macedonian, where there are limited resources available for training or fine-tuning. The paper presents a methodology used for data collection and preprocessing, as well as the details of the three architectures used in the study. The study evaluates the performance of each model using WER and CER metrics and provides a comparative analysis of the results. The findings of the research showed that Wav2Vec 2.0 outperformed the other models for the Macedonian language with a WER of 0.21, and CER of 0.09, however, NeMo and ESPNet models are still good candidates for creating ASR tools for low-resource languages such as Macedonian. The research presented provides insights into the effectiveness of different models for ASR in low-resource languages and highlights the potentials for using these models to develop ASR tools for other languages in the future. These findings have significant implications for the development of ASR tools for other low-resource languages in the future, and can potentially improve accessibility to speech recognition technology for individuals and communities who speak these languages.",
isbn="978-3-031-39059-3"
}