whisper-small-khmer

This model is a fine-tuned version of openai/whisper-small for Khmer speech recognition, trained on the Google FLEURS and OpenSLR (SLR42) datasets. It achieves the following results on the evaluation set (the Google FLEURS test split):

Loss: 0.4657
WER: 0.6464
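WER (word error rate) is the word-level edit distance between the model's transcript and the reference, divided by the number of reference words: (substitutions + deletions + insertions) / N. The minimal sketch below computes it for one pair of strings, assuming words are separated by whitespace. The card does not say which implementation produced the reported figure, and because Khmer is written largely without spaces between words, a whitespace-based WER depends heavily on how the reference transcripts were segmented.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Assumption: words are separated by whitespace.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # At the start of iteration i, prev[j] is the edit distance between the
    # first i-1 reference words and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


# One substitution (sat -> sit) and one deletion (the) over 6 reference words
print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))
```

WER is not capped at 1.0: a hypothesis with many inserted words can score above 1.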

Model description

This model is fine-tuned on the Google FLEURS and OpenSLR (SLR42) Khmer datasets. It can be used with the Transformers pipeline API:

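from transformers import pipeline

# Load the fine-tuned Khmer checkpoint from the Hugging Face Hub
pipe = pipeline(
    task="automatic-speech-recognition",
    model="seanghay/whisper-small-khmer",
)

# Force Khmer transcription instead of automatic language detection;
# "audio.wav" is a path to a local audio file
result = pipe(
    "audio.wav",
    generate_kwargs={"language": "<|km|>", "task": "transcribe"},
    batch_size=16,
)

print(result["text"])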

whisper.cpp

1. Convert the input audio to 16 kHz, mono, 16-bit PCM WAV (the input format whisper.cpp expects)

ffmpeg -i audio.ogg -ar 16000 -ac 1 -c:a pcm_s16le output.wav

2. Transcribe with whisper.cpp, where ggml-model.bin is this model converted to the ggml format

./main -m ggml-model.bin -f output.wav --print-colors --language km
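The `main` example in whisper.cpp reads 16-bit WAV input, which is why step 1 converts to 16 kHz mono `pcm_s16le`. As a sketch, the function below uses only Python's standard `wave` module to confirm that a file such as `output.wav` has that format before handing it to whisper.cpp. The name `is_whisper_cpp_ready` is hypothetical and not part of whisper.cpp.

```python
import wave


def is_whisper_cpp_ready(path: str) -> bool:
    """Return True if `path` is a 16 kHz, mono, 16-bit PCM WAV file.

    These match the ffmpeg flags in step 1: -ar 16000 -ac 1 -c:a pcm_s16le.
    wave.open raises wave.Error if the file is not a PCM WAV it can parse.
    """
    with wave.open(path, "rb") as wav:
        return (
            wav.getframerate() == 16000      # 16 kHz sample rate
            and wav.getnchannels() == 1      # mono
            and wav.getsampwidth() == 2      # 2 bytes per sample = 16-bit
        )
```

For example, `is_whisper_cpp_ready("output.wav")` should return `True` for the file produced by the ffmpeg command above.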

Training and evaluation data

Training: google/fleurs (Khmer) train + validation splits, combined with the openslr SLR42 train split
Evaluation: google/fleurs (Khmer) test split
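Combining the two sources requires mapping them to a shared schema, because each dataset stores its transcript under a different column. The sketch below assumes the Hugging Face Hub schemas of `google/fleurs` (text in `transcription` and `raw_transcription`) and `openslr` (text in `sentence`). Which FLEURS text column the training project actually used is not stated, so `raw_transcription` is an assumption, and `to_common_schema` is a hypothetical helper.

```python
# Hypothetical helper: maps one example from either source to a shared
# {"audio", "sentence"} schema so the two datasets can be concatenated.
# Assumption: FLEURS "raw_transcription" (not "transcription") is the target text.
TEXT_COLUMN = {
    "fleurs": "raw_transcription",
    "openslr": "sentence",
}


def to_common_schema(example: dict, source: str) -> dict:
    """Keep only the audio and the transcript, under a common column name."""
    if source not in TEXT_COLUMN:
        raise ValueError(f"unknown source: {source!r}")
    return {
        "audio": example["audio"],
        "sentence": example[TEXT_COLUMN[source]].strip(),
    }
```

With Hugging Face Datasets, the splits would be loaded along the lines of `load_dataset("google/fleurs", "km_kh", split="train+validation")` and `load_dataset("openslr", "SLR42", split="train")`, mapped with `ds.map(to_common_schema, fn_kwargs={"source": ...}, remove_columns=ds.column_names)`, cast to a common sampling rate with `cast_column("audio", Audio(sampling_rate=16000))`, and then joined with `concatenate_datasets`.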

Training procedure

This model was trained on a single NVIDIA A10 GPU (24 GB) using the project on GitHub.

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 6.25e-06
train_batch_size: 16
eval_batch_size: 8
seed: 42
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: linear
lr_scheduler_warmup_steps: 800
training_steps: 8000
mixed_precision_training: Native AMP
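With `lr_scheduler_type: linear` and 800 warmup steps, the learning rate rises linearly from 0 to the peak of 6.25e-06 over the first 800 steps, then decays linearly to 0 at step 8000. The function below is a standalone sketch of that schedule; it follows the same shape as `transformers.get_linear_schedule_with_warmup` but is not that function.

```python
PEAK_LR = 6.25e-06
WARMUP_STEPS = 800
TRAINING_STEPS = 8000


def linear_warmup_lr(step: int,
                     peak_lr: float = PEAK_LR,
                     warmup_steps: int = WARMUP_STEPS,
                     total_steps: int = TRAINING_STEPS) -> float:
    """Learning rate at `step`: linear warmup, then linear decay to 0."""
    if step < warmup_steps:
        # Warmup: ramp from 0 up to peak_lr
        return peak_lr * step / max(1, warmup_steps)
    # Decay: fall linearly from peak_lr to 0 at total_steps, never below 0
    remaining = max(0, total_steps - step)
    return peak_lr * remaining / max(1, total_steps - warmup_steps)


for step in (0, 400, 800, 4400, 8000):
    print(step, linear_warmup_lr(step))
```

Step 400 sits halfway through warmup and step 4400 halfway through the decay, so both receive half the peak rate.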

Training results

Training Loss	Epoch	Step	Validation Loss	Wer
0.2065	3.37	1000	0.3403	0.7929
0.0446	6.73	2000	0.2911	0.6961
0.008	10.1	3000	0.3578	0.6627
0.003	13.47	4000	0.3982	0.6564
0.0012	16.84	5000	0.4287	0.6512
0.0004	20.2	6000	0.4499	0.6419
0.0001	23.57	7000	0.4614	0.6469
0.0001	26.94	8000	0.4657	0.6464
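In the table, training loss approaches zero while validation loss rises after step 2000, yet WER keeps improving slightly until step 6000. The short script below uses the rows above to report which checkpoint minimises each metric, and the approximate number of optimizer steps per epoch implied by the Epoch and Step columns. Converting that to a number of training examples assumes a single GPU and no gradient accumulation, neither of which the card states.

```python
# Rows copied from the "Training results" table:
# (training loss, epoch, step, validation loss, WER)
RESULTS = [
    (0.2065, 3.37, 1000, 0.3403, 0.7929),
    (0.0446, 6.73, 2000, 0.2911, 0.6961),
    (0.008, 10.1, 3000, 0.3578, 0.6627),
    (0.003, 13.47, 4000, 0.3982, 0.6564),
    (0.0012, 16.84, 5000, 0.4287, 0.6512),
    (0.0004, 20.2, 6000, 0.4499, 0.6419),
    (0.0001, 23.57, 7000, 0.4614, 0.6469),
    (0.0001, 26.94, 8000, 0.4657, 0.6464),
]

best_loss_row = min(RESULTS, key=lambda row: row[3])
best_wer_row = min(RESULTS, key=lambda row: row[4])
print("lowest validation loss at step", best_loss_row[2])  # step 2000
print("lowest WER at step", best_wer_row[2])               # step 6000

# Epoch = step / steps_per_epoch, so every row yields one estimate
estimates = [step / epoch for _, epoch, step, _, _ in RESULTS]
steps_per_epoch = round(sum(estimates) / len(estimates))
print("approx. steps per epoch:", steps_per_epoch)

# Assumption: each step consumes train_batch_size = 16 examples
print("approx. training examples:", steps_per_epoch * 16)
```

The estimate comes out at about 297 steps per epoch, or roughly 4,750 training clips under the stated assumptions.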

Framework versions

Transformers 4.28.0.dev0
PyTorch 2.0.0+cu117
Datasets 2.11.1.dev0
Tokenizers 0.13.3