Efficient Conformer v1 for non-streaming ASR

Specification: https://github.com/wenet-e2e/wenet/pull/1636

Aishell-1 Results

Feature info:
- using fbank features, CMVN, speed perturbation and dither
Training info:
- train_u2++_efficonformer_v1.yaml
- 8 GPUs, batch size 16, acc_grad 1, 200 epochs
- lr 0.001, warmup_steps 25000
Model info:
- Model Params: 48,488,347
- Downsample rate: 1/4 (conv2d) * 1/2 (efficonformer block) = 1/8 overall
- encoder_dim 256, output_size 256, head 8, linear_units 2048
- num_blocks 12, cnn_module_kernel 15, group_size 3
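As a rough check of the downsampling rate above: the conv2d front end reduces the frame rate by 4 and the efficonformer blocks by a further 2, so each encoder output frame covers about 8 input frames. The sketch below assumes a 10 ms fbank frame shift (not stated in this README) and ignores the few frames lost at convolution edges, so the encoder length is only approximate.

```shell
#!/usr/bin/env bash
# Approximate downsampling arithmetic (a sketch; convolution edge effects ignored).
conv2d_factor=4       # 1/4 from the conv2d front end
effi_block_factor=2   # further 1/2 inside the efficonformer blocks
total_factor=$((conv2d_factor * effi_block_factor))   # overall 1/8

frame_shift_ms=10     # assumption: 10 ms fbank frame shift
encoder_frame_ms=$((frame_shift_ms * total_factor))   # time covered by one encoder frame

# Approximate encoder length for a 5 s utterance (500 input frames)
input_frames=500
encoder_frames=$((input_frames / total_factor))       # integer division

echo "downsampling 1/${total_factor}, ${encoder_frame_ms} ms per encoder frame, ~${encoder_frames} encoder frames"
# prints: downsampling 1/8, 80 ms per encoder frame, ~62 encoder frames
```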
Decoding info:
- ctc_weight 0.5, reverse_weight 0.3, average_num 20
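For attention rescoring, ctc_weight and reverse_weight combine the per-hypothesis scores roughly as follows in WeNet's U2++ rescoring, as we understand it: the left-to-right and right-to-left decoder scores are mixed by reverse_weight, then the CTC prefix score scaled by ctc_weight is added. The sketch below uses made-up log-probabilities for two hypothetical hypotheses (A and B) to show that rescoring can prefer a hypothesis whose CTC score alone is worse; the `rescore` helper is illustrative, not part of WeNet.

```shell
#!/usr/bin/env bash
# Hypothetical illustration of attention-rescoring score combination:
#   decoder = (1 - reverse_weight) * l2r + reverse_weight * r2l
#   final   = decoder + ctc_weight * ctc
ctc_weight=0.5
reverse_weight=0.3

rescore() {
  # $1 = CTC prefix score, $2 = left-to-right decoder score, $3 = right-to-left decoder score
  awk -v ctc="$1" -v l2r="$2" -v r2l="$3" -v cw="$ctc_weight" -v rw="$reverse_weight" 'BEGIN {
    dec = (1 - rw) * l2r + rw * r2l   # mix the two decoder directions
    printf "%.4f\n", dec + cw * ctc   # add the weighted CTC score
  }'
}

# Made-up scores: B has the better CTC score (-1.8 > -2.0) ...
score_a=$(rescore -2.0 -1.0 -1.5)
score_b=$(rescore -1.8 -1.4 -1.2)

# ... but A wins once the decoder scores are included (higher is better)
best=$(awk -v a="$score_a" -v b="$score_b" 'BEGIN { if (a + 0 >= b + 0) print "A"; else print "B" }')
echo "A: $score_a  B: $score_b  best: $best"
# prints: A: -2.1500  B: -2.2400  best: A
```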

CER (%) by decoding chunk size (full = full context):

decoding mode            full   18     16
attention decoder        4.99   5.13   5.16
ctc prefix beam search   4.98   5.23   5.23
attention rescoring      4.64   4.86   4.85

Start to Use

Install WeNet following: https://wenet.org.cn/wenet/install.html#install-for-training

Decode

cd wenet/examples/aishell/s0
dir=exp/wenet_efficient_conformer_aishell_v1/

ctc_weight=0.5
reverse_weight=0.3
decoding_chunk_size=-1
mode="attention_rescoring"

test_dir=$dir/test_${mode}
# logs/ must exist for the decoding log redirect below
mkdir -p $test_dir logs

# Decode
nohup python wenet/bin/recognize.py --gpu 0 \
    --mode $mode \
    --config $dir/train.yaml \
    --data_type "raw" \
    --test_data data/test/data.list \
    --checkpoint $dir/final.pt \
    --beam_size 10 \
    --batch_size 1 \
    --penalty 0.0 \
    --dict $dir/words.txt \
    --ctc_weight $ctc_weight \
    --reverse_weight $reverse_weight \
    --result_file $test_dir/text \
    ${decoding_chunk_size:+--decoding_chunk_size $decoding_chunk_size} > logs/decode_aishell.log 2>&1 &

# Wait for the background decoding job to finish before computing CER
wait

# CER
python tools/compute-cer.py --char=1 --v=1 \
      data/test/text $test_dir/text > $test_dir/cer.txt
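The commands above decode a single configuration. To fill in every cell of the results table, the same recognize.py invocation can be looped over the three decoding modes and the three chunk sizes. This is a minimal sketch under assumptions: it reuses only the flags shown above, writes one hypothetical `test_${mode}_chunk${chunk}` directory per run, and defaults to a dry run (`DRY_RUN`, a variable introduced here) that only prints the commands. The `${chunk:+...}` expansion mirrors the `${decoding_chunk_size:+...}` pattern above: it produces the flag and its value only when the variable is non-empty.

```shell
#!/usr/bin/env bash
# Hypothetical sweep over the decoding modes and chunk sizes in the results table.
# DRY_RUN=1 (the default here) only prints the commands; set DRY_RUN=0 to run them.
dir=exp/wenet_efficient_conformer_aishell_v1
ctc_weight=0.5
reverse_weight=0.3
modes="attention ctc_prefix_beam_search attention_rescoring"
chunk_sizes="-1 18 16"   # -1 = full context
DRY_RUN=${DRY_RUN:-1}

# Run "$@" with stdout and stderr sent to the file given as $1,
# or just print the command line when DRY_RUN=1.
run_to() {
  local out=$1
  shift
  if [ "$DRY_RUN" = "1" ]; then
    echo "$* > $out"
  else
    "$@" > "$out" 2>&1
  fi
}

n_runs=0
for mode in $modes; do
  for chunk in $chunk_sizes; do
    test_dir=$dir/test_${mode}_chunk${chunk}
    [ "$DRY_RUN" = "1" ] || mkdir -p "$test_dir"
    # Runs in the foreground, so the CER step below always sees finished output
    run_to "$test_dir/decode.log" \
      python wenet/bin/recognize.py --gpu 0 \
      --mode "$mode" \
      --config "$dir/train.yaml" \
      --data_type raw \
      --test_data data/test/data.list \
      --checkpoint "$dir/final.pt" \
      --beam_size 10 --batch_size 1 --penalty 0.0 \
      --dict "$dir/words.txt" \
      --ctc_weight "$ctc_weight" \
      --reverse_weight "$reverse_weight" \
      --result_file "$test_dir/text" \
      ${chunk:+--decoding_chunk_size $chunk}
    run_to "$test_dir/cer.txt" \
      python tools/compute-cer.py --char=1 --v=1 data/test/text "$test_dir/text"
    n_runs=$((n_runs + 1))
  done
done
```

Running the decodes sequentially on one GPU is slower than launching them in parallel, but it keeps each CER computation tied to a completed result file without extra job control.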