whisper-small-khmer

This model is a fine-tuned version of openai/whisper-small for Khmer speech recognition. At the last logged checkpoint (step 8000) it reaches a validation loss of 0.4657 and a WER of 0.6464 on the evaluation set.

Model description

This model was fine-tuned on the Google FLEURS and OpenSLR (SLR42) datasets.

from transformers import pipeline

pipe = pipeline(
    task="automatic-speech-recognition",
    model="seanghay/whisper-small-khmer",
)

# Force Khmer transcription instead of relying on language auto-detection.
result = pipe(
    "audio.wav",
    generate_kwargs={"language": "<|km|>", "task": "transcribe"},
    batch_size=16,
)

print(result["text"])

whisper.cpp

1. Transcode the input audio to 16 kHz mono 16-bit PCM WAV

ffmpeg -i audio.ogg -ar 16000 -ac 1 -c:a pcm_s16le output.wav

2. Transcribe with whisper.cpp

./main -m ggml-model.bin -f output.wav --print-colors --language km
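
When there are many recordings, the transcoding in step 1 can be scripted. The following is a minimal sketch, assuming `ffmpeg` is on the `PATH`; the helper names `build_ffmpeg_command` and `transcode_all` and the `*.ogg` default pattern are hypothetical and only for illustration. It reuses exactly the flags from the command above.

```python
import subprocess
from pathlib import Path


def build_ffmpeg_command(src: str, dst: str) -> list[str]:
    """Return the ffmpeg argument list for 16 kHz, mono, 16-bit PCM output."""
    return [
        "ffmpeg",
        "-i", src,            # input file in any format ffmpeg can decode
        "-ar", "16000",       # resample to 16 kHz
        "-ac", "1",           # downmix to a single channel
        "-c:a", "pcm_s16le",  # signed 16-bit little-endian PCM
        dst,
    ]


def transcode_all(input_dir: str, output_dir: str, pattern: str = "*.ogg") -> list[Path]:
    """Transcode every file matching `pattern` and return the written WAV paths."""
    out = Path(output_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    for src in sorted(Path(input_dir).glob(pattern)):
        dst = out / (src.stem + ".wav")
        # check=True raises CalledProcessError if ffmpeg exits with an error.
        subprocess.run(build_ffmpeg_command(str(src), str(dst)), check=True)
        written.append(dst)
    return written
```

Each resulting WAV file can then be passed to whisper.cpp with `-f` as in step 2.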

Training and evaluation data

Training procedure

This model was trained on a single NVIDIA A10 (24 GB) GPU, based on the project on GitHub.

Training hyperparameters

The following hyperparameters were used during training:

Training results

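| Training Loss | Epoch | Step | Validation Loss | Wer    |
|:-------------:|:-----:|:----:|:---------------:|:------:|
| 0.2065        | 3.37  | 1000 | 0.3403          | 0.7929 |
| 0.0446        | 6.73  | 2000 | 0.2911          | 0.6961 |
| 0.008         | 10.1  | 3000 | 0.3578          | 0.6627 |
| 0.003         | 13.47 | 4000 | 0.3982          | 0.6564 |
| 0.0012        | 16.84 | 5000 | 0.4287          | 0.6512 |
| 0.0004        | 20.2  | 6000 | 0.4499          | 0.6419 |
| 0.0001        | 23.57 | 7000 | 0.4614          | 0.6469 |
| 0.0001        | 26.94 | 8000 | 0.4657          | 0.6464 |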
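
The Wer column reports word error rate. As a reference for how that metric is defined, here is a minimal sketch of the standard formula: word-level edit distance (substitutions, deletions, insertions) divided by the number of reference words. It assumes whitespace tokenization, and the function name `word_error_rate` is hypothetical. The metric used during training may have applied different tokenization or text normalization, and this card does not state whether the values above are fractions or percentages.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)
```

For example, `word_error_rate("a b c d", "a x c")` counts one substitution and one deletion against four reference words and returns 0.5. Because insertions are counted, the value can exceed 1.0 when the hypothesis is much longer than the reference.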

Framework versions