espnet audio automatic-speech-recognition spoken-language-understanding

ESPnet2: MELD Recipe

Demo: How to use in ESPnet2

cd espnet
pip install -e .
cd egs2/meld/asr1/
./run.sh

Environments

asr_train_asr_hubert_transformer_adam_specaug_meld_raw_en_bpe850

|dataset|Snt|Emotion Classification (%)|
|---|---|---|
|decoder_asr_asr_model_valid.acc.ave_5best/test|2608|39.22|
|decoder_asr_asr_model_valid.acc.ave_5best/valid|1104|42.64|

ASR results

WER

|dataset|Snt|Wrd|Corr|Sub|Del|Ins|Err|S.Err|
|---|---|---|---|---|---|---|---|---|
|decoder_asr_asr_model_valid.acc.ave_5best/test|2608|24809|55.5|28.0|16.5|8.4|52.9|96.5|
|decoder_asr_asr_model_valid.acc.ave_5best/valid|1104|10171|55.3|29.4|15.3|7.0|51.7|96.2|

CER

|dataset|Snt|Wrd|Corr|Sub|Del|Ins|Err|S.Err|
|---|---|---|---|---|---|---|---|---|
|decoder_asr_asr_model_valid.acc.ave_5best/test|2608|120780|71.1|10.7|18.2|10.6|39.5|96.5|
|decoder_asr_asr_model_valid.acc.ave_5best/valid|1104|49323|71.3|11.1|17.6|9.4|38.1|96.2|

TER

|dataset|Snt|Wrd|Corr|Sub|Del|Ins|Err|S.Err|
|---|---|---|---|---|---|---|---|---|
|decoder_asr_asr_model_valid.acc.ave_5best/test|2608|35287|57.6|21.8|20.5|7.8|50.2|96.5|
|decoder_asr_asr_model_valid.acc.ave_5best/valid|1104|14430|57.4|23.2|19.4|6.1|48.6|96.2|
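
Reading the error columns

The short sketch below illustrates how the columns in the WER, CER, and TER tables relate to one another. It assumes the usual sclite-style scoring convention, which this card does not state explicitly: Corr, Sub, and Del together account for all reference tokens, and Err is the sum of Sub, Del, and Ins. The `check_row` helper and the `tol` parameter are hypothetical names introduced for illustration; the tolerance only absorbs rounding to one decimal place.

```python
# Rows copied from the WER table above: (Corr, Sub, Del, Ins, Err), in percent.
rows = {
    "test": (55.5, 28.0, 16.5, 8.4, 52.9),
    "valid": (55.3, 29.4, 15.3, 7.0, 51.7),
}


def check_row(corr, sub, dele, ins, err, tol=0.15):
    """Return True if a row satisfies the assumed sclite-style identities.

    `tol` is a hypothetical tolerance that allows for one-decimal rounding.
    """
    # Assumption: every reference token is correct, substituted, or deleted.
    covers_reference = abs(corr + sub + dele - 100.0) <= tol
    # Assumption: the error rate counts substitutions, deletions, insertions.
    matches_err = abs(sub + dele + ins - err) <= tol
    return covers_reference and matches_err


results = {name: check_row(*values) for name, values in rows.items()}
print(results)  # {'test': True, 'valid': True}
```

Because insertions are counted against the number of reference tokens, Err can exceed 100 - Corr, as it does in every row reported here.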