Efficient Conformer v1 for non-streaming ASR
Specification: https://github.com/wenet-e2e/wenet/pull/1636
Results
- Feature info:
    - using fbank feature, cmvn, speed perturb, dither
- Training info:
    - train_u2++_efficonformer_v1.yaml
    - 8 gpu, batch size 16, acc_grad 1, 120 epochs
    - lr 0.001, warmup_steps 35000
- Model info:
    - Model Params: 49,474,974
    - Downsample rate: 1/4 (conv2d) * 1/2 (efficonformer block)
    - encoder_dim 256, output_size 256, head 8, linear_units 2048
    - num_blocks 12, cnn_module_kernel 15, group_size 3
- Decoding info:
    - ctc_weight 0.5, reverse_weight 0.3, average_num 20
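
The decoding weights above enter the attention rescoring result as a weighted sum of three scores per hypothesis. The sketch below assumes the usual U2++ combination: the left-to-right and right-to-left attention decoder scores are mixed by `reverse_weight`, and the CTC score is added after scaling by `ctc_weight`. The function names and the log-probabilities are hypothetical illustrations, not code or numbers taken from WeNet.

```python
def rescoring_score(ctc_score, l2r_score, r2l_score,
                    ctc_weight=0.5, reverse_weight=0.3):
    """Combine the log-probability scores of one n-best hypothesis.

    Assumed combination (hedged, see the text above):
        (1 - reverse_weight) * l2r + reverse_weight * r2l + ctc_weight * ctc
    """
    attention = (1.0 - reverse_weight) * l2r_score + reverse_weight * r2l_score
    return attention + ctc_weight * ctc_score


def pick_best(hypotheses, ctc_weight=0.5, reverse_weight=0.3):
    """Return the text of the hypothesis with the highest combined score."""
    best = max(
        hypotheses,
        key=lambda h: rescoring_score(
            h["ctc"], h["l2r"], h["r2l"], ctc_weight, reverse_weight
        ),
    )
    return best["text"]


# Made-up log-probabilities for two candidate transcripts.
# CTC alone slightly prefers the second one; both decoders prefer the first.
nbest = [
    {"text": "the cat sat", "ctc": -4.0, "l2r": -3.0, "r2l": -3.5},
    {"text": "the cat sad", "ctc": -3.8, "l2r": -4.5, "r2l": -5.0},
]
best = pick_best(nbest)
print(best)
```

With these weights the attention decoders can overturn the CTC ranking, which is the point of rescoring; setting `reverse_weight` to 0 in the sketch ignores the right-to-left decoder entirely.
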
test clean (WER %)
decoding mode / chunk size | full | 18 | 16 |
---|---|---|---|
attention decoder | 3.65 | 3.88 | 3.87 |
ctc greedy search | 3.46 | 3.79 | 3.77 |
ctc prefix beam search | 3.44 | 3.75 | 3.74 |
attention rescoring | 3.17 | 3.44 | 3.41 |
test other (WER %)
decoding mode / chunk size | full | 18 | 16 |
---|---|---|---|
attention decoder | 8.51 | 9.24 | 9.25 |
ctc greedy search | 8.94 | 10.04 | 10.06 |
ctc prefix beam search | 8.91 | 10.00 | 10.01 |
attention rescoring | 8.21 | 9.25 | 9.25 |
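
To read the two tables together, it can help to express the cost of chunked decoding as a relative WER increase over the `full` column. The sketch below copies the numbers from the tables and assumes that the columns labeled 18 and 16 are `decoding_chunk_size` values; the helper name `relative_increase` is hypothetical.

```python
# WER (%) copied from the tables above, as (full, 18, 16).
WER = {
    "test clean": {
        "attention decoder": (3.65, 3.88, 3.87),
        "ctc greedy search": (3.46, 3.79, 3.77),
        "ctc prefix beam search": (3.44, 3.75, 3.74),
        "attention rescoring": (3.17, 3.44, 3.41),
    },
    "test other": {
        "attention decoder": (8.51, 9.24, 9.25),
        "ctc greedy search": (8.94, 10.04, 10.06),
        "ctc prefix beam search": (8.91, 10.00, 10.01),
        "attention rescoring": (8.21, 9.25, 9.25),
    },
}


def relative_increase(full, chunked):
    """Relative WER increase of chunked over full decoding, in percent."""
    return 100.0 * (chunked - full) / full


for test_set, rows in WER.items():
    for mode, (full, c18, c16) in rows.items():
        print(
            f"{test_set:10s} {mode:22s} "
            f"18: {relative_increase(full, c18):+5.1f}%  "
            f"16: {relative_increase(full, c16):+5.1f}%"
        )
```

For attention rescoring, for example, going from full decoding to the 16 column raises WER by about 7.6% relative on test clean (3.17 to 3.41) and about 12.7% relative on test other (8.21 to 9.25).
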
Start to Use
Install WeNet by following https://wenet.org.cn/wenet/install.html#install-for-training
Decode
cd examples/librispeech/s0
cp exp/wenet_efficient_conformer_librispeech_v1/decode.sh ./
cp exp/wenet_efficient_conformer_librispeech_v1/wer.sh ./
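dir=exp/wenet_efficient_conformer_librispeech_v1
# -1 decodes each utterance with full attention (the "full" column in Results)
decoding_chunk_size=-1
# 20 presumably is the number of checkpoints to average (average_num in Decoding info)
. ./decode.sh ${dir} 20 ${decoding_chunk_size}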
# WER
. ./wer.sh test_clean wenet_efficient_conformer_librispeech_v1 ${decoding_chunk_size}
. ./wer.sh test_other wenet_efficient_conformer_librispeech_v1 ${decoding_chunk_size}