
Versions:

Model Benchmarks:

Model Error Benchmarks:

Hindi to Hindi (test.tsv) Common Voice 14.0

Tested on an RTX 3060 with 1000 samples.

Model                    WER    MER    WIL    WIP    CER
Original_Model (30 min)  43.99  41.65  59.47  40.52  16.23
This_Model (20 min)      44.64  41.69  59.53  40.46  16.80

Hindi to English (test.csv) Custom Dataset

Tested on an RTX 3060 with 1000 samples.

Model                    WER    MER    WIL    WIP    CER
Original_Model (30 min)  -      -      -      -      -
This_Model (20 min)      -      -      -      -      -

English (LibriSpeech -> test-clean)

Tested on an RTX 3060 with ___ samples.

Model           WER    MER    WIL    WIP    CER
Original_Model  -      -      -      -      -
This_Model      -      -      -      -      -

English (LibriSpeech -> test-other)

Tested on an RTX 3060 with ___ samples.

Model           WER    MER    WIL    WIP    CER
Original_Model  -      -      -      -      -
This_Model      -      -      -      -      -
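
For readers unfamiliar with the column headers in the benchmark tables, the sketch below shows one way the five metrics (WER, MER, WIL, WIP, CER) can be computed for a single reference/hypothesis pair. It is a plain-Python illustration, not the evaluation script used for this model: the function names `align_counts` and `error_metrics` are hypothetical, text normalisation is omitted, and it assumes a non-empty reference. It returns fractions, whereas the tables appear to report percentages. Where several minimum-edit alignments tie, MER, WIL and WIP can differ slightly from what a library such as jiwer reports.

```python
def align_counts(ref, hyp):
    """Count (hits, substitutions, deletions, insertions) along one
    minimum-edit-distance alignment of two token sequences."""
    n, m = len(ref), len(hyp)
    # dist[i][j] = edit distance between ref[:i] and hyp[:j]
    dist = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dist[i][0] = i
    for j in range(1, m + 1):
        dist[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dist[i][j] = min(
                dist[i - 1][j - 1] + cost,  # hit or substitution
                dist[i - 1][j] + 1,         # deletion
                dist[i][j - 1] + 1,         # insertion
            )
    # Walk back from the bottom-right corner to classify each step.
    hits = subs = dels = ins = 0
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and ref[i - 1] == hyp[j - 1] and dist[i][j] == dist[i - 1][j - 1]:
            hits += 1
            i, j = i - 1, j - 1
        elif i > 0 and j > 0 and dist[i][j] == dist[i - 1][j - 1] + 1:
            subs += 1
            i, j = i - 1, j - 1
        elif i > 0 and dist[i][j] == dist[i - 1][j] + 1:
            dels += 1
            i -= 1
        else:
            ins += 1
            j -= 1
    return hits, subs, dels, ins


def error_metrics(reference, hypothesis):
    """Return WER, MER, WIL, WIP and CER as fractions for one utterance.
    Assumes `reference` contains at least one word."""
    h, s, d, i = align_counts(reference.split(), hypothesis.split())
    n_ref = h + s + d  # words in the reference
    n_hyp = h + s + i  # words in the hypothesis
    wer = (s + d + i) / n_ref
    mer = (s + d + i) / (h + s + d + i)
    wip = (h / n_ref) * (h / n_hyp) if n_hyp else 0.0
    wil = 1.0 - wip
    # CER is the same edit-distance ratio computed over characters.
    ch, cs, cd, ci = align_counts(list(reference), list(hypothesis))
    cer = (cs + cd + ci) / (ch + cs + cd)
    return {"wer": wer, "mer": mer, "wil": wil, "wip": wip, "cer": cer}


metrics = error_metrics("the cat sat on the mat", "the cat sit on mat")
print({name: round(value, 4) for name, value in metrics.items()})
```

Benchmark figures such as those in the tables are normally aggregated over all test samples (total errors divided by total reference length) rather than averaged per utterance, so this per-pair version is only meant to show what each column measures.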

Code for conversion:

Usage

The __init__.py file in this repo contains all the code needed to use this model.

First, clone this repo so that all of its files sit in a single folder.

Make sure you have git-lfs installed (https://git-lfs.com)

git lfs install
git clone https://huggingface.co/devasheeshG/whisper_large_v2_fp16_transformers

Then run the following, preferably in a Jupyter notebook:

# Import the model wrapper and the audio helpers
from whisper_large_v2_fp16_transformers import Model, load_audio, pad_or_trim

# Initialise the model
model = Model(
    model_name_or_path='whisper_large_v2_fp16_transformers',
    cuda_visible_device="0",
    device='cuda',
)

# Load the audio, then pad or trim it
audio = load_audio('whisper_large_v2_fp16_transformers/test.wav')
audio = pad_or_trim(audio)

# Transcribe (the first transcription takes longer)
model.transcribe(audio)
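
The pad_or_trim step above is easy to overlook, so here is a rough idea of what it does. This is a minimal sketch that assumes this repo's pad_or_trim mirrors the helper of the same name in openai/whisper, which fixes every input to 30 seconds of 16 kHz audio; the function name `pad_or_trim_sketch` is hypothetical and it works on plain Python sequences rather than arrays or tensors.

```python
SAMPLE_RATE = 16_000                      # Whisper works on 16 kHz mono audio
CHUNK_SECONDS = 30                        # length of one Whisper input window
N_SAMPLES = SAMPLE_RATE * CHUNK_SECONDS   # 480000 samples


def pad_or_trim_sketch(samples, length=N_SAMPLES):
    """Return exactly `length` samples: cut off the excess,
    or append zeros (silence) if the clip is too short."""
    samples = list(samples)
    if len(samples) >= length:
        return samples[:length]
    return samples + [0.0] * (length - len(samples))


short_clip = [0.1] * SAMPLE_RATE          # one second of dummy audio
padded = pad_or_trim_sketch(short_clip)
print(len(padded))
```

If the helper does behave this way, clips longer than 30 seconds are cut off at that point, so longer recordings would need to be split into chunks before transcription.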

Credits

This is an fp16 version of openai/whisper-large-v2.