audio asr automatic-speech-recognition

Whisper for developers


This model is a fine-tuned version of the Whisper-large-v2 model, tuned specifically for software developers. It correctly transcribes terms such as 'ChatGPT' and 'Webhook', which previous Whisper models could not.

Model Details

This model outperforms previous Whisper models in transcription accuracy for software-related words. I developed a new metric, DSWES (Domain-Specific Word Embedding Similarity), to assess transcription accuracy for software-related words. Further information about this metric will be provided in an upcoming paper.

Please refer to the OpenAI Whisper model card for more details about the backbone model.

Model Description


Model Sources


Evaluation


Testing Data, Factors & Metrics

Testing Data

The testing data consists of 1 hour of audio manually recorded at AIWeek 2023 and 2 hours of audio from developer conference videos uploaded to YouTube. Note that the testing data cannot be released publicly due to privacy concerns.

Metrics

WER (word error rate) and CER (character error rate), two of the most popular metrics for assessing automatic speech recognition models, were used. Additionally, DSWES was used to specifically check the transcription accuracy of software-related words. Note that a higher DSWES is better, whereas lower WER and CER are better.
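For readers unfamiliar with WER and CER, the following is a minimal sketch of how the two metrics are conventionally computed: the edit distance between reference and hypothesis, divided by the reference length, at word level for WER and at character level for CER. It is not the evaluation code used for this model. The function names and the example sentences are illustrative assumptions, and no text normalization (casing, punctuation) is applied, although practical evaluations usually apply some.

```python
from typing import Sequence


def edit_distance(ref: Sequence, hyp: Sequence) -> int:
    """Levenshtein distance: minimum substitutions, insertions and deletions."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance / number of reference words."""
    ref_words = reference.split()
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate: character-level edit distance / reference length."""
    return edit_distance(reference, hypothesis) / len(reference)


# Hypothetical example: a transcript that splits developer terms apart.
reference = "send the webhook to ChatGPT"
hypothesis = "send the web hook to chat GPT"
print(f"WER = {wer(reference, hypothesis):.2%}")
print(f"CER = {cer(reference, hypothesis):.2%}")
```

The example illustrates why a domain-specific metric can be useful: splitting two developer terms costs four word-level errors out of five reference words, but only three character-level errors.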

For the assessment, WhisperX was used as the backbone of the fine-tuned model because of its fast inference speed and reduced size. Since WhisperX is itself built on Whisper, I assume that the performance of Whisper would be very similar to that of WhisperX.

Results

| Model | WER | CER | DSWES |
|---|---|---|---|
| WhisperX-large-v2 | 6.89 | 3.66 | 87 |
| WhisperX-for-developers | 6.56 | 2.84 | 91 |
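
To put the differences in the table in perspective, the short sketch below computes the relative change of each metric from the baseline to the fine-tuned model. The scores are copied from the table above; the helper name `relative_change` is an illustrative assumption. Negative values mean the score decreased, which is an improvement for WER and CER, while a positive value is an improvement for DSWES.

```python
# Scores copied from the results table.
baseline = {"WER": 6.89, "CER": 3.66, "DSWES": 87}
finetuned = {"WER": 6.56, "CER": 2.84, "DSWES": 91}


def relative_change(before: float, after: float) -> float:
    """Relative change from `before` to `after`, in percent."""
    return (after - before) / before * 100


changes = {
    metric: round(relative_change(baseline[metric], finetuned[metric]), 2)
    for metric in baseline
}

for metric, change in changes.items():
    print(f"{metric}: {change:+.2f}%")
```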