Whisper for developers

This model is a fine-tuned version of Whisper-large-v2, tuned specifically for software developers. It correctly transcribes terms such as 'ChatGPT' and 'Webhook', which previous Whisper models failed to transcribe correctly.

Model Details

This model outperforms previous Whisper models at transcribing software-related words. I developed a new metric, DSWES (Domain-Specific Word Embedding Similarity), to assess transcription accuracy on such words. Further information about this metric will be provided in an upcoming paper.

Please refer to the OpenAI Whisper model card for more details about the backbone model.
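
As a rough usage sketch, a Whisper-style checkpoint can usually be loaded through the Hugging Face `transformers` ASR pipeline. This assumes the fine-tuned weights are published in the standard `transformers` format. The model ID and audio file name in the comments are placeholders, not confirmed names, and the helper functions `build_transcriber` and `transcribe` are hypothetical names introduced only for this example.

```python
def build_transcriber(model_id: str, chunk_length_s: int = 30):
    """Return a Hugging Face ASR pipeline for a Whisper-style checkpoint."""
    # Imported lazily so the helpers can be defined without loading the library.
    from transformers import pipeline

    return pipeline(
        "automatic-speech-recognition",
        model=model_id,
        chunk_length_s=chunk_length_s,  # process long recordings in 30 s windows
    )


def transcribe(audio_path: str, model_id: str) -> str:
    """Transcribe one audio file and return the recognized text."""
    asr = build_transcriber(model_id)
    return asr(audio_path)["text"]


# Example call (placeholder values, replace with the real checkpoint ID and file):
#   text = transcribe("conference_talk.wav", "<namespace>/<model-name>")
#   print(text)
```

Chunking is used here because conference recordings are typically much longer than the 30-second window Whisper processes at once.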

Model Description

Developed by: yongchanskii
Shared by: yongchanskii
Model type: Whisper
Language(s): Korean, English
License: Creative Commons Attribution-NonCommercial 3.0 Unported (CC BY-NC 3.0)
Finetuned from model: openai/whisper-large-v2

Model Sources

Repository: cyc9805
Paper: Coming soon

Evaluation

Testing Data, Factors & Metrics

Testing Data

The testing data consists of 1 hour of audio recorded manually at AIWeek 2023 and 2 hours of audio from developer conference videos uploaded to YouTube. Note that the testing data cannot be released publicly due to privacy concerns.

Metrics

Two of the most widely used metrics for evaluating automatic speech recognition models, WER (word error rate) and CER (character error rate), were used. Additionally, DSWES was used to specifically measure the transcription accuracy of software-related words. Lower is better for WER and CER; higher is better for DSWES.
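
For readers unfamiliar with WER and CER, the following is a minimal sketch of how both are commonly computed: the Levenshtein edit distance between reference and hypothesis, divided by the reference length, over words for WER and over characters for CER. This is not necessarily the exact evaluation script used for this model, and text normalization (casing, punctuation) is left out. The function names are illustrative. DSWES is not shown because its definition has not been published yet.

```python
from typing import Sequence


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance between two token sequences."""
    prev = list(range(len(hyp) + 1))  # distances from an empty reference
    for i, r in enumerate(ref, start=1):
        curr = [i]  # distance from ref[:i] to an empty hypothesis
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # substitution or match
            ))
        prev = curr
    return prev[-1]


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate as a fraction (multiply by 100 for a percentage)."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate as a fraction, counting spaces as characters."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(list(reference), list(hypothesis)) / len(reference)


reference = "deploy the webhook handler"
hypothesis = "deploy the web hook handler"
print(f"WER: {wer(reference, hypothesis):.3f}")
print(f"CER: {cer(reference, hypothesis):.3f}")
```

In this example, splitting "webhook" into two words costs one substitution and one insertion at the word level (2 errors over 4 reference words) but only a single inserted space at the character level, which is why the two metrics can diverge on software terms.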

For assessment, WhisperX was used as the backbone of the fine-tuned model because of its fast inference speed and reduced size. Since WhisperX is itself built on Whisper, I assume that the performance of Whisper would be very similar to that of WhisperX.

Results

Model	WER	CER	DSWES
WhisperX-large-v2	6.89	3.66	87
WhisperX-for-developers	6.56	2.84	91
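
To put the table in perspective, the short calculation below derives the relative error reduction from the reported figures. It only restates the numbers above; the helper name `relative_reduction` is introduced for illustration.

```python
def relative_reduction(baseline: float, improved: float) -> float:
    """Relative reduction of an error metric, in percent of the baseline."""
    return (baseline - improved) / baseline * 100


# Values copied from the results table.
baseline = {"WER": 6.89, "CER": 3.66, "DSWES": 87}
finetuned = {"WER": 6.56, "CER": 2.84, "DSWES": 91}

for metric in ("WER", "CER"):
    change = relative_reduction(baseline[metric], finetuned[metric])
    print(f"{metric}: {change:.2f}% relative reduction")

# DSWES is a higher-is-better score, so report the absolute gain instead.
print(f"DSWES: +{finetuned['DSWES'] - baseline['DSWES']} points")
```

Under these figures, the WER reduction is about 4.8% relative and the CER reduction about 22.4% relative, while DSWES rises by 4 points.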