FunASR FSMN-VAD

Introduction

Voice activity detection (VAD) plays an important role in speech recognition systems by detecting where valid speech begins and ends. FunASR provides an efficient VAD model based on the FSMN structure. To improve the model's discrimination, and because speech carries relatively rich information, we use monophones as modeling units. During inference, the VAD system applies post-processing, such as thresholding and sliding windows, to improve robustness.
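The post-processing idea mentioned above (a threshold followed by a sliding window) can be illustrated with a minimal sketch. This is not FunASR's actual implementation: the function name `smooth_vad`, its parameters, the default values, and the 10 ms frame length are all assumptions chosen for illustration. It takes per-frame speech probabilities, thresholds them, smooths the decisions with a sliding-window vote, and merges consecutive speech frames into (start_ms, end_ms) segments.

```python
from typing import List, Tuple


def smooth_vad(
    probs: List[float],
    threshold: float = 0.5,   # assumed value, not FunASR's setting
    window: int = 5,          # assumed sliding-window size in frames
    min_ratio: float = 0.6,   # assumed share of speech frames needed in a window
    frame_ms: int = 10,       # assumed frame shift in milliseconds
) -> List[Tuple[int, int]]:
    """Hypothetical helper: turn frame probabilities into speech segments."""
    # Step 1: threshold each frame probability into a raw speech/non-speech flag.
    flags = [p >= threshold for p in probs]

    # Step 2: sliding-window vote, so isolated flips do not split a segment.
    half = window // 2
    smoothed = []
    for i in range(len(flags)):
        lo = max(0, i - half)
        hi = min(len(flags), i + half + 1)
        ratio = sum(flags[lo:hi]) / (hi - lo)
        smoothed.append(ratio >= min_ratio)

    # Step 3: merge consecutive speech frames into (start_ms, end_ms) segments.
    segments = []
    start = None
    for i, is_speech in enumerate(smoothed):
        if is_speech and start is None:
            start = i
        elif not is_speech and start is not None:
            segments.append((start * frame_ms, i * frame_ms))
            start = None
    if start is not None:
        segments.append((start * frame_ms, len(smoothed) * frame_ms))
    return segments


if __name__ == "__main__":
    frame_probs = [0.1, 0.1, 0.9, 0.9, 0.2, 0.9, 0.9, 0.9, 0.1, 0.1]
    print(smooth_vad(frame_probs, window=3, min_ratio=0.5))
```

In this toy input, the single low-probability frame in the middle is absorbed by the window vote, so the speech region stays one segment instead of being split in two.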

This repository demonstrates how to use FSMN-VAD with the funasr_onnx runtime. The underlying model comes from FunASR and was trained on a 5,000-hour dataset.

We have released numerous industrial-grade models covering speech recognition, voice activity detection, punctuation restoration, speaker verification, speaker diarization, and timestamp prediction (forced alignment). To learn more about these models, refer to the FunASR documentation. If you want to apply these models in your own speech-related projects, we invite you to explore what FunASR offers.

Install funasr_onnx

pip install -U funasr_onnx
# Users in China can install from a mirror instead:
# pip install -U funasr_onnx -i https://mirror.sjtu.edu.cn/pypi/web/simple

Download the model

git lfs install
git clone https://huggingface.co/funasr/FSMN-VAD

Inference with runtime

Voice Activity Detection

FSMN-VAD

from funasr_onnx import Fsmn_vad

model_dir = "./FSMN-VAD"
model = Fsmn_vad(model_dir, quantize=True)

wav_path = "./FSMN-VAD/asr_example.wav"

result = model(wav_path)
print(result)

Input: audio in WAV format. Supported input types: str (file path), np.ndarray, List[str]

Output: List[str]: VAD result (the detected speech segments)
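Once the speech segments are available, a common next step is to map them onto the audio samples. The sketch below assumes each segment is a [start_ms, end_ms] pair in milliseconds and that the audio is sampled at 16 kHz. Both are assumptions, not something this README confirms, so check the printed `result` for the actual structure before relying on it. The helper names `segments_to_samples` and `total_speech_ms` are hypothetical and not part of funasr_onnx.

```python
from typing import List, Sequence, Tuple


def segments_to_samples(
    segments: Sequence[Sequence[int]],
    sample_rate: int = 16000,  # assumed sampling rate
) -> List[Tuple[int, int]]:
    """Hypothetical helper: convert [start_ms, end_ms] pairs to sample index ranges."""
    return [
        (start_ms * sample_rate // 1000, end_ms * sample_rate // 1000)
        for start_ms, end_ms in segments
    ]


def total_speech_ms(segments: Sequence[Sequence[int]]) -> int:
    """Hypothetical helper: sum the duration of all segments in milliseconds."""
    return sum(end_ms - start_ms for start_ms, end_ms in segments)


if __name__ == "__main__":
    # Made-up segments for illustration only, not real model output.
    example_segments = [[70, 2340], [2620, 6200]]
    print(segments_to_samples(example_segments))
    print(total_speech_ms(example_segments))
```

The sample ranges can then be used to slice a waveform array, for example `waveform[start:end]`, to keep only the speech portions.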

Citations

@inproceedings{gao2022paraformer,
  title={Paraformer: Fast and Accurate Parallel Transformer for Non-autoregressive End-to-End Speech Recognition},
  author={Gao, Zhifu and Zhang, Shiliang and McLoughlin, Ian and Yan, Zhijie},
  booktitle={INTERSPEECH},
  year={2022}
}