

Whisper Base for Korean Low Quality Call Voices

This model is a fine-tuned version of openai/whisper-base on the Korean Low Quality Call Voices dataset. At the last logged evaluation in the training results (step 8000), it reached a validation loss of 0.4941 and a CER of 30.7538 on the evaluation set.

Model description

This model was fine-tuned for project use. It starts from OpenAI's Whisper-Base model and was fine-tuned to improve recognition accuracy on low-quality Korean phone-call audio. The training data is a subset of AI-HUB's "low-quality telephone network speech recognition data" (저음질 전화망 음성인식 데이터). The subset contains 240,771.06 seconds of audio, with an average length of about 5.296 seconds per file, and 1,696,414 characters of transcript text.
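To put the dataset figures in more familiar units, the short sketch below converts them into hours of audio, an approximate file count, and a rough transcript density. Only the three constants come from this card; the derived values are estimates, and the file count in particular is inferred from a rounded average rather than reported by the author.

```python
# Figures quoted in the model description above.
TOTAL_AUDIO_SECONDS = 240_771.06
MEAN_SECONDS_PER_FILE = 5.296  # rounded average, so the file count is approximate
TOTAL_CHARACTERS = 1_696_414

# Derived estimates (not stated in the card itself).
total_hours = TOTAL_AUDIO_SECONDS / 3600
approx_file_count = round(TOTAL_AUDIO_SECONDS / MEAN_SECONDS_PER_FILE)
chars_per_second = TOTAL_CHARACTERS / TOTAL_AUDIO_SECONDS

print(f"{total_hours:.1f} hours of audio")
print(f"about {approx_file_count:,} files")
print(f"{chars_per_second:.2f} transcript characters per second of audio")
```

Under these assumptions the subset is roughly 67 hours of audio, which is small compared with the data Whisper was originally trained on.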

Intended uses & limitations

Both the base model and the dataset used for fine-tuning were used for learning (educational) purposes. This model may therefore also be used for learning purposes only.

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

Training results

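| Training Loss | Epoch | Step | Validation Loss | CER     |
|:-------------:|:-----:|:----:|:---------------:|:-------:|
| 0.6416        | 0.44  | 1000 | 0.6564          | 64.1489 |
| 0.5914        | 0.88  | 2000 | 0.5688          | 37.4957 |
| 0.435         | 1.32  | 3000 | 0.5349          | 32.6734 |
| 0.4056        | 1.76  | 4000 | 0.5124          | 30.9065 |
| 0.3368        | 2.2   | 5000 | 0.5057          | 32.6925 |
| 0.3107        | 2.64  | 6000 | 0.4979          | 32.8315 |
| 0.3016        | 3.08  | 7000 | 0.4947          | 29.3060 |
| 0.2979        | 3.52  | 8000 | 0.4941          | 30.7538 |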

Framework versions