Whisper Base for Korean Low Quality Call Voices

This model is a fine-tuned version of openai/whisper-base on the Korean Low Quality Call Voices dataset. At the final checkpoint (step 8000), it achieves the following results on the evaluation set:

Loss: 0.4941
CER (character error rate): 30.7538
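For readers unfamiliar with the metric, the following is a minimal sketch of how a character error rate is commonly computed: the character-level edit distance between hypothesis and reference, divided by the number of reference characters. It assumes the reported value is a percentage and that spaces count as characters. The card does not say which implementation or text normalization was used for evaluation, so the helper names `edit_distance` and `cer_percent` are illustrative only.

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance between two strings, counted in characters."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # delete a reference character
                curr[j - 1] + 1,     # insert a hypothesis character
                prev[j - 1] + cost,  # substitute, or match at no cost
            ))
        prev = curr
    return prev[-1]


def cer_percent(references, hypotheses):
    """Corpus-level CER: total edits / total reference characters * 100."""
    total_edits = sum(edit_distance(r, h) for r, h in zip(references, hypotheses))
    total_chars = sum(len(r) for r in references)
    return 100.0 * total_edits / total_chars


# One wrong character out of five reference characters.
print(cer_percent(["안녕하세요"], ["안녕하세오"]))
```

Because the denominator is the reference length, a hypothesis with many insertions can push the value above 100.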

Model description

This model was fine-tuned for project use. It starts from OpenAI's Whisper-Base model and was fine-tuned to improve recognition accuracy on Korean low-quality phone-call audio. The training data is a subset of AI-HUB's 'low-quality telephone network voice recognition data' (저음질 전화망 음성인식 데이터): 240,771.06 seconds of audio (about 66.9 hours, averaging about 5.296 seconds per file), with transcripts totaling 1,696,414 characters.
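The dataset figures above can be put into more familiar units with a little arithmetic. The sketch below uses only the three numbers stated in the card. The file count is an estimate, since the average length is given in rounded form.

```python
total_seconds = 240_771.06     # total audio duration, from the card
avg_seconds_per_file = 5.296   # average clip length, from the card (rounded)
total_characters = 1_696_414   # total transcript length, from the card

total_hours = total_seconds / 3600
approx_files = total_seconds / avg_seconds_per_file   # estimate only
chars_per_second = total_characters / total_seconds

print(f"{total_hours:.2f} hours of audio")
print(f"roughly {approx_files:,.0f} audio files")
print(f"{chars_per_second:.2f} transcript characters per second")
```

This works out to about 66.9 hours of audio, on the order of 45,000 short clips, and about 7 transcript characters per second of audio.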

Intended uses & limitations

Both the base model and the dataset used for fine-tuning were used for learning purposes only, so this model may likewise be used for learning purposes only.

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 1e-05
train_batch_size: 16
eval_batch_size: 8
seed: 42
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: linear
lr_scheduler_warmup_steps: 500
training_steps: 8000
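To illustrate what these settings imply for the learning rate over time, here is a small sketch of a linear schedule with warmup. It assumes the usual behavior of a `linear` scheduler: the rate rises linearly from 0 to the peak over the warmup steps, then decays linearly to 0 at the last training step. The function name `linear_lr` is illustrative and is not taken from the training script.

```python
BASE_LR = 1e-05         # learning_rate
WARMUP_STEPS = 500      # lr_scheduler_warmup_steps
TRAINING_STEPS = 8000   # training_steps


def linear_lr(step: int) -> float:
    """Learning rate after `step` updates under linear warmup and linear decay."""
    if step < WARMUP_STEPS:
        # Warmup phase: ramp from 0 up to BASE_LR.
        scale = step / max(1, WARMUP_STEPS)
    else:
        # Decay phase: ramp from BASE_LR down to 0 at TRAINING_STEPS.
        scale = max(0.0, (TRAINING_STEPS - step) / max(1, TRAINING_STEPS - WARMUP_STEPS))
    return BASE_LR * scale


for step in (0, 250, 500, 4000, 8000):
    print(step, linear_lr(step))
```

Under these assumptions the peak rate of 1e-05 is reached at step 500, and the rate at step 4000 is a little over half the peak.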

Training results

Training Loss	Epoch	Step	Validation Loss	CER
0.6416	0.44	1000	0.6564	64.1489
0.5914	0.88	2000	0.5688	37.4957
0.435	1.32	3000	0.5349	32.6734
0.4056	1.76	4000	0.5124	30.9065
0.3368	2.2	5000	0.5057	32.6925
0.3107	2.64	6000	0.4979	32.8315
0.3016	3.08	7000	0.4947	29.3060
0.2979	3.52	8000	0.4941	30.7538
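The table can be read programmatically to compare checkpoints. The sketch below copies the rows as listed and picks out the checkpoints with the lowest CER and the lowest validation loss. It also estimates the training set size from the step and epoch columns, assuming each step processes one batch of 16 examples with no gradient accumulation and a single device. The card does not confirm that assumption, and the epoch values are rounded.

```python
# (step, epoch, training loss, validation loss, CER), copied from the table
rows = [
    (1000, 0.44, 0.6416, 0.6564, 64.1489),
    (2000, 0.88, 0.5914, 0.5688, 37.4957),
    (3000, 1.32, 0.4350, 0.5349, 32.6734),
    (4000, 1.76, 0.4056, 0.5124, 30.9065),
    (5000, 2.20, 0.3368, 0.5057, 32.6925),
    (6000, 2.64, 0.3107, 0.4979, 32.8315),
    (7000, 3.08, 0.3016, 0.4947, 29.3060),
    (8000, 3.52, 0.2979, 0.4941, 30.7538),
]

best_cer = min(rows, key=lambda r: r[4])    # lowest CER
best_loss = min(rows, key=lambda r: r[3])   # lowest validation loss
final = rows[-1]

# Assumed: 16 examples per step (train_batch_size), no accumulation.
approx_examples = final[0] * 16 / final[1]

print("lowest CER:", best_cer[4], "at step", best_cer[0])
print("lowest validation loss:", best_loss[3], "at step", best_loss[0])
print(f"approx. training examples per epoch: {approx_examples:,.0f}")
```

The lowest CER in the table (29.3060) occurs at step 7000, while the headline results correspond to the final step 8000, which has the lowest validation loss.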

Framework versions

Transformers 4.34.0.dev0
PyTorch 2.0.1+cu118
Datasets 2.14.5
Tokenizers 0.13.3