
whisper-medium-ko-normalized-1273h

This model is a fine-tuned version of openai/whisper-medium on a custom dataset, aimed at improving Korean speech recognition. It achieves a validation loss of 0.1254 and a WER of 0.0551 (epoch 3) on the evaluation set; see the training results below.

Model description

This model is a fine-tuned version of openai/whisper-medium that transcribes Korean audio into text. It was trained on a GCP a2-highgpu-1g instance (one A100 40GB) for about 26 hours, at a cost of roughly $90.

Intended uses & limitations

This model was trained to improve on the original Whisper model's performance for Korean transcription tasks. A usage sketch is shown below.
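
As a minimal sketch, the checkpoint can be loaded with the transformers pipeline; the repository id below is a placeholder, not this model's actual Hub path:

```python
# A minimal usage sketch with the transformers pipeline; the repo id below
# is a placeholder for wherever this checkpoint is hosted on the Hub.
from transformers import pipeline

asr = pipeline(
    "automatic-speech-recognition",
    model="your-username/whisper-medium-ko-normalized-1273h",  # hypothetical repo id
)

# Transcribe a local Korean audio file.
print(asr("sample_ko.wav")["text"])
```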

Training and evaluation data

I downloaded all data from AI-HUB (https://aihub.or.kr/). Two datasets in particular caught my attention: "Instruction Audio Set" and "Noisy Conversation Audio Set". The following table lists the number of hours in each dataset's train and validation splits.

| Dataset name                 | Train split (hours) | Validation split (hours) |
|------------------------------|---------------------|--------------------------|
| Instruction Audio Set        | 910                 | 105                      |
| Noisy Conversation Audio Set | 363                 | 76                       |
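
The exact preprocessing script is not part of this card; as a rough sketch, a typical Whisper fine-tuning setup extracts log-Mel features and tokenizes the transcripts along these lines (the "audio" and "text" column names are assumptions):

```python
# A rough preprocessing sketch for Whisper fine-tuning, assuming a
# datasets.Dataset with "audio" and "text" columns (both names are assumptions).
from transformers import WhisperProcessor

processor = WhisperProcessor.from_pretrained(
    "openai/whisper-medium", language="korean", task="transcribe"
)

def prepare(batch):
    audio = batch["audio"]
    # Log-Mel input features expected by Whisper (audio resampled to 16 kHz).
    batch["input_features"] = processor(
        audio["array"], sampling_rate=audio["sampling_rate"]
    ).input_features[0]
    # Tokenized target transcript used as labels.
    batch["labels"] = processor.tokenizer(batch["text"]).input_ids
    return batch

# dataset = dataset.map(prepare, remove_columns=dataset.column_names)
```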

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

Training results

| Training loss | Epoch | Step  | Validation loss | WER    |
|---------------|-------|-------|-----------------|--------|
| 0.0588        | 1.0   | 8775  | 0.1225          | 0.0604 |
| 0.0287        | 2.0   | 17550 | 0.1186          | 0.0567 |
| 0.0148        | 3.0   | 26325 | 0.1254          | 0.0551 |
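
For reference, the WER column can be reproduced with the evaluate library; a minimal sketch (the Korean strings are illustrative only, not taken from the training data):

```python
# How WER numbers like the ones above are computed, using the evaluate
# library (pip install evaluate jiwer); the Korean strings are illustrative only.
import evaluate

wer_metric = evaluate.load("wer")

references = ["안녕하세요 만나서 반갑습니다"]
predictions = ["안녕하세요 반갑습니다"]  # one word dropped

# 1 deletion over 3 reference words -> WER = 1/3
print(wer_metric.compute(predictions=predictions, references=references))
```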

Framework versions

Evaluation results on google/fleurs

The trained model was evaluated on the test split of the ko_kr subset of the google/fleurs dataset. Note that the model was not trained on the train split of this dataset.

| Model          | WER    |
|----------------|--------|
| openai/whisper | 0.2469 |
| this model     | 0.2189 |
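
A sketch of how this evaluation can be reproduced with datasets and evaluate; the repository id is again a placeholder, and the exact text normalization behind the scores above is not specified here:

```python
# A sketch of the google/fleurs evaluation above; the repo id is a
# placeholder and the scoring/normalization details are assumptions.
import evaluate
from datasets import load_dataset
from transformers import pipeline

asr = pipeline(
    "automatic-speech-recognition",
    model="your-username/whisper-medium-ko-normalized-1273h",  # hypothetical repo id
)
fleurs = load_dataset("google/fleurs", "ko_kr", split="test")
wer_metric = evaluate.load("wer")

predictions = [
    asr({"array": s["audio"]["array"], "sampling_rate": s["audio"]["sampling_rate"]})["text"]
    for s in fleurs
]
references = [s["transcription"] for s in fleurs]

print("WER:", wer_metric.compute(predictions=predictions, references=references))
```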