
ESPnet2 ASR model

espnet/shihlun_asr_whisper_medium_finetuned_librispeech100

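This model was trained by Shih-Lun Wu (slseanwu) using the librispeech_100 recipe in ESPnet. It fine-tunes OpenAI Whisper medium on the LibriSpeech train-clean-100 subset. As encoded in the experiment tag, training used AdamW with a learning rate of 1e-5 and a weight decay of 1e-2 for 3 epochs.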

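Demo: How to use in ESPnet2

The script below runs data preparation, decoding and scoring for the librispeech_100 recipe. Training is skipped (--skip_train true), so the fine-tuned checkpoint valid.acc.ave.pth must already be present in the experiment directory.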

```bash
cd espnet
pip install -e .
cd egs2/librispeech_100/asr1

# Data splits, experiment tag, and training/decoding configs
train_set="train_clean_100"
valid_set="dev"
test_sets="test_clean test_other dev_clean dev_other"
asr_tag=whisper_medium_finetune_lr1e-5_adamw_wd1e-2_3epochs
asr_config=conf/tuning/train_asr_whisper_full.yaml
inference_config=conf/decode_asr_whisper_noctc_greedy.yaml

# Data preparation, decoding and scoring, with training skipped;
# "$@" forwards any extra command-line options to asr.sh
./asr.sh \
    --skip_data_prep false \
    --skip_train true \
    --skip_eval false \
    --lang en \
    --ngpu 1 \
    --nj 4 \
    --stage 1 \
    --stop_stage 13 \
    --gpu_inference true \
    --inference_nj 1 \
    --token_type whisper_multilingual \
    --feats_normalize '' \
    --max_wav_duration 30 \
    --speed_perturb_factors "0.9 1.0 1.1" \
    --audio_format "flac.ark" \
    --feats_type raw \
    --use_lm false \
    --cleaner whisper_en \
    --asr_tag "${asr_tag}" \
    --asr_config "${asr_config}" \
    --inference_config "${inference_config}" \
    --inference_asr_model valid.acc.ave.pth \
    --train_set "${train_set}" \
    --valid_set "${valid_set}" \
    --test_sets "${test_sets}" "$@"
```
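The dataset names in the result tables below follow the experiment and decoding directory layout that asr.sh creates. This sketch reproduces that naming from the variables in the demo script. The naming rules it assumes are our reading of asr.sh and may differ between ESPnet versions:

- an exp/asr_<asr_tag> experiment directory;
- a decode tag built from the inference config basename plus the checkpoint name without its extension;
- a RESULTS.md file in the experiment directory.

```bash
#!/usr/bin/env bash
# Sketch (assumption): mirrors how we believe asr.sh names its experiment and
# decoding directories; check asr.sh in your ESPnet checkout before relying on it.
set -euo pipefail

asr_tag=whisper_medium_finetune_lr1e-5_adamw_wd1e-2_3epochs
inference_config=conf/decode_asr_whisper_noctc_greedy.yaml
inference_asr_model=valid.acc.ave.pth

# Experiment directory: exp/asr_<asr_tag>
asr_exp="exp/asr_${asr_tag}"

# Decode tag: config basename without ".yaml", then "_asr_model_" and the
# checkpoint name with slashes replaced and its final extension stripped
inference_tag="$(basename "${inference_config}" .yaml)"
inference_tag+="_asr_model_$(echo "${inference_asr_model}" | sed -e "s/\//_/g" -e "s/\.[^.]*$//g")"

# One decoding directory per evaluation set, plus the summary file
for dset in dev_clean dev_other test_clean test_other; do
    echo "${asr_exp}/${inference_tag}/${dset}"
done
echo "${asr_exp}/RESULTS.md"
```

The first line printed is exp/asr_whisper_medium_finetune_lr1e-5_adamw_wd1e-2_3epochs/decode_asr_whisper_noctc_greedy_asr_model_valid.acc.ave/dev_clean. This matches the experiment heading and the dataset column of the tables below.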

<!-- Generated by scripts/utils/show_asr_result.sh -->

RESULTS

Environments

asr_whisper_medium_finetune_lr1e-5_adamw_wd1e-2_3epochs

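WER

Snt is the number of utterances and Wrd the number of reference words. Corr, Sub, Del, Ins and Err are percentages of reference words (Err = Sub + Del + Ins, up to rounding). S.Err is the percentage of utterances containing at least one error.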

|dataset|Snt|Wrd|Corr|Sub|Del|Ins|Err|S.Err|
|---|---|---|---|---|---|---|---|---|
|decode_asr_whisper_noctc_greedy_asr_model_valid.acc.ave/dev_clean|2703|54798|97.7|1.9|0.3|0.3|2.6|30.1|
|decode_asr_whisper_noctc_greedy_asr_model_valid.acc.ave/dev_other|2864|51528|95.3|4.3|0.4|0.6|5.3|45.4|
|decode_asr_whisper_noctc_greedy_asr_model_valid.acc.ave/test_clean|2620|53027|97.6|2.1|0.3|0.4|2.7|30.9|
|decode_asr_whisper_noctc_greedy_asr_model_valid.acc.ave/test_other|2939|52882|95.1|4.4|0.5|0.7|5.6|47.5|
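To make the percentages concrete, the sketch below converts each Err value back into an approximate number of word errors (Err × Wrd / 100). The percentages in the table are rounded to 0.1, so the counts are estimates, not the exact figures from the scoring logs.

```bash
#!/usr/bin/env bash
# Sketch: estimate the number of word errors (Sub + Del + Ins) behind each WER.
# Inputs are copied from the WER table; rounding of Err makes results approximate.
set -euo pipefail

# dataset, Wrd (reference words), Err (%)
wer_rows="dev_clean 54798 2.6
dev_other 51528 5.3
test_clean 53027 2.7
test_other 52882 5.6"

# Err% x Wrd / 100, rounded to the nearest whole word
approx_errors="$(printf '%s\n' "${wer_rows}" | awk '{ printf "%s %.0f\n", $1, $2 * $3 / 100 }')"
printf '%s\n' "${approx_errors}"
```

This gives roughly 1425 errors on dev_clean and 2961 on test_other.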

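CER

The columns are the same as in the WER table but are computed over characters instead of words, so the Wrd column counts reference characters.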

|dataset|Snt|Wrd|Corr|Sub|Del|Ins|Err|S.Err|
|---|---|---|---|---|---|---|---|---|
|decode_asr_whisper_noctc_greedy_asr_model_valid.acc.ave/dev_clean|2703|287287|99.3|0.3|0.4|0.3|1.0|30.1|
|decode_asr_whisper_noctc_greedy_asr_model_valid.acc.ave/dev_other|2864|265648|98.3|1.0|0.7|0.6|2.3|45.4|
|decode_asr_whisper_noctc_greedy_asr_model_valid.acc.ave/test_clean|2620|280691|99.3|0.3|0.3|0.3|1.0|30.9|
|decode_asr_whisper_noctc_greedy_asr_model_valid.acc.ave/test_other|2939|271738|98.3|1.0|0.7|0.7|2.4|47.5|
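Dividing the CER table's Wrd column by the word counts from the WER table confirms that it counts characters, at a little over five scored characters per reference word. The sketch below only performs that division. Which symbols (for example, word boundaries) the scorer counts as characters is not stated here and is left open.

```bash
#!/usr/bin/env bash
# Sketch: relate the CER table's "Wrd" column (reference characters) to the
# WER table's reference word counts for the same evaluation sets.
set -euo pipefail

# dataset, reference words (WER table), reference characters (CER table)
counts="dev_clean 54798 287287
dev_other 51528 265648
test_clean 53027 280691
test_other 52882 271738"

# Average number of scored characters per reference word, to two decimals
chars_per_word="$(printf '%s\n' "${counts}" | awk '{ printf "%s %.2f\n", $1, $3 / $2 }')"
printf '%s\n' "${chars_per_word}"
```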