transcript ASR wayuu

Model Background

This model was trained on a dataset built from parsed audio and text. The dataset originates from recordings and transcriptions of the Bible in Wayuunaiki. Due to proprietary restrictions, it cannot be shared publicly.

Wayuunaiki is the native language of the Wayuu people and is spoken predominantly by communities in Colombia and Venezuela. It belongs to the Arawakan language family. With a significant number of speakers in both countries, it is one of the more widely spoken indigenous languages in the region.

This model is an initial step toward transcription models built specifically for indigenous languages. Creating and improving such models has real societal value: they help preserve and promote indigenous languages, and they give scholars and communities a practical resource for linguistic study.

Training Dataset Details

The dataset consists of 1,835 audio recordings, each accompanied by its respective transcription. The lexical corpus encompasses approximately 3,000 unique words.

This collection of data serves as a foundational resource for understanding and processing the Wayuunaiki language.
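The source does not say how the unique-word figure was obtained. As a minimal sketch, assuming words are counted after lowercasing, splitting on whitespace, and stripping surrounding punctuation, a vocabulary count over a list of transcriptions could look like this. The function name `vocabulary` and the punctuation set are illustrative assumptions. The two sample strings are the example transcriptions given later in this card.

```python
from collections import Counter

# Punctuation stripped from token edges (an assumption, not the author's rule).
# The saltillo (ꞌ) is a letter in Wayuunaiki orthography, so it is kept.
EDGE_PUNCTUATION = ".,;:!?¡¿\"()"


def vocabulary(transcriptions):
    """Count word occurrences across an iterable of transcription strings."""
    counts = Counter()
    for text in transcriptions:
        for token in text.lower().split():
            token = token.strip(EDGE_PUNCTUATION)
            if token:
                counts[token] += 1
    return counts


if __name__ == "__main__":
    samples = [
        "iseeichi chi wayuu aneekünakai nütüma Maleiwa süpüla nuꞌutünajachin "
        "aaꞌin süpüla nülaꞌajaainjatüin saainjala wayuu süpüshua sainküin mmakat",
        "maa akaapüꞌü tü anneerü oꞌutünapüꞌükat aaꞌin watüma wayakana "
        "judíokana shiiꞌiree sülaꞌajaanüin waainjala",
    ]
    counts = vocabulary(samples)
    print("unique words:", len(counts))
    print("most common:", counts.most_common(3))
```

A different tokenization rule (for example, keeping case or splitting on hyphens) would give a different count, so the result is only comparable to the figure above if the rules match.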

The test dataset may be used under the principles of copyright 'fair use'.

Model Accuracy Warning

This model has shown promising results, but its transcriptions can contain errors and should not be treated as final.

Recommendation: Any transcription produced by this model should be validated and corrected before use. The model is well suited to producing initial drafts, but it must be used judiciously.

Test it yourself

| Transcription | Audio Link |
| --- | --- |
| iseeichi chi wayuu aneekünakai nütüma Maleiwa süpüla nuꞌutünajachin aaꞌin süpüla nülaꞌajaainjatüin saainjala wayuu süpüshua sainküin mmakat | Listen here |
| maa akaapüꞌü tü anneerü oꞌutünapüꞌükat aaꞌin watüma wayakana judíokana shiiꞌiree sülaꞌajaanüin waainjala | Listen here |

The table pairs sample transcriptions with their audio links, so users can listen to the audio and judge the model's transcription quality firsthand. These samples show where the model does well and where it still needs refinement, especially on nuances specific to Wayuunaiki.

Model Description

This model is a speech recognition system that transcribes Wayuunaiki audio into text. It was trained for 4,000 steps, over which its loss improved substantially.

Training Statistics

Validation Statistics (at the end of training)

Performance Metrics

The training loss decreased consistently, and the model reached a competitive Word Error Rate (WER) on the validation set.
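For readers unfamiliar with the metric, WER is conventionally the word-level edit distance between the reference and the model output (substitutions, deletions, and insertions), divided by the number of words in the reference. The sketch below is a plain-Python illustration of that definition, not the evaluation code used for this model. The function name `word_error_rate` is an assumption. In the example, the reference is a fragment of one of the sample transcriptions, and the hypothesis is invented for illustration rather than taken from model output.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference word count."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] is the edit distance between the reference words processed
    # so far (minus the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words matches an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    reference = "maa akaapüꞌü tü anneerü"
    hypothesis = "maa akaapüꞌü anneerü"  # invented: one word dropped
    print(word_error_rate(reference, hypothesis))  # 1 deletion / 4 words = 0.25
```

Because the denominator is the reference length, WER can exceed 1.0 when the hypothesis contains many inserted words. Results also depend on text normalization (casing, punctuation), which should be applied identically to both strings.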