Versions:
- CUDA: 12.1
- cuDNN Version: 8.9.2.26_1.0-1_amd64
- tensorflow Version: 2.12.0
- torch Version: 2.1.0.dev20230606+cu12135
- transformers Version: 4.30.2
- accelerate Version: 0.20.3
Model Benchmarks:
- RAM: 2.8 GB (Original_Model: 5.5 GB)
- VRAM: 1812 MB (Original_Model: 6 GB)
- test.wav: 23 s (Multilingual Speech i.e. English+Hindi)
- Time in seconds for processing test.wav on each device
Device Name | float32 (Original) | float16 | CudaCores | TensorCores |
---|---|---|---|---|
3060 | 1.7 | 1.1 | 3,584 | 112 |
1660 Super | OOM | 3.3 | 1,408 | N/A |
Colab (Tesla T4) | 2.8 | 2.2 | 2,560 | 320 |
Colab (CPU) | 35 | N/A | N/A | N/A |
M1 (CPU) | - | - | - | - |
M1 (GPU -> 'mps') | - | - | - | - |
- NOTE: TensorCores are efficient in mixed-precision calculations
- CPU -> torch.float16 is not supported on CPU (AMD Ryzen 5 3600 or Colab CPU)
- Punctuation: True
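The source does not say how the per-device timings above were measured. As a minimal sketch of one common approach, the helper below runs a few discarded warm-up calls (the first transcription is slower than later ones) and then averages the wall-clock time of the remaining calls. The name `time_call` and its `warmup` and `repeats` parameters are hypothetical and are not part of this repo.

```python
import time


def time_call(fn, *args, warmup=1, repeats=3, **kwargs):
    """Return the mean wall-clock seconds per call of fn, measured after warm-up calls."""
    for _ in range(warmup):
        # Discarded runs: the first calls usually include one-off setup cost.
        fn(*args, **kwargs)
    start = time.perf_counter()
    for _ in range(repeats):
        fn(*args, **kwargs)
    return (time.perf_counter() - start) / repeats


if __name__ == "__main__":
    # Stand-in workload; with the model loaded you would pass model.transcribe and audio.
    print(f"{time_call(sum, range(1_000_000)):.4f} s per call")
```

If the timed function launches GPU work without waiting for it to finish, the measurement may be too low; in that case a synchronisation step before reading the clock would be needed.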
Model Error Benchmarks:
- WER: Word Error Rate
- MER: Match Error Rate
- WIL: Word Information Lost
- WIP: Word Information Preserved
- CER: Character Error Rate
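To make the five metrics concrete, here is a small plain-Python sketch that computes them for one reference/hypothesis pair using the standard definitions: WER = (S+D+I)/(H+S+D), MER = (S+D+I)/(H+S+D+I), WIP = (H/N_ref)·(H/N_hyp), and WIL = 1 − WIP, where H, S, D, I are the hits, substitutions, deletions, and insertions of a minimum-edit alignment. This is an illustration only, not the code used for the tables in this section (those were computed with jiwer); the function names `align_counts`, `word_metrics`, and `cer` are hypothetical, and a non-empty reference is assumed.

```python
def align_counts(ref, hyp):
    """Return (hits, substitutions, deletions, insertions) for one minimum-edit alignment."""
    n, m = len(ref), len(hyp)
    # dp[i][j] = (cost, hits, subs, dels, ins) for ref[:i] versus hyp[:j]
    dp = [[None] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        dp[i][0] = (i, 0, 0, i, 0)  # delete every reference token
    for j in range(m + 1):
        dp[0][j] = (j, 0, 0, 0, j)  # insert every hypothesis token
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            c, h, s, d, k = dp[i - 1][j - 1]
            if ref[i - 1] == hyp[j - 1]:
                best = (c, h + 1, s, d, k)      # hit
            else:
                best = (c + 1, h, s + 1, d, k)  # substitution
            c, h, s, d, k = dp[i - 1][j]
            if c + 1 < best[0]:
                best = (c + 1, h, s, d + 1, k)  # deletion
            c, h, s, d, k = dp[i][j - 1]
            if c + 1 < best[0]:
                best = (c + 1, h, s, d, k + 1)  # insertion
            dp[i][j] = best
    return dp[n][m][1:]


def word_metrics(reference, hypothesis):
    """Word-level WER, MER, WIP and WIL as fractions (multiply by 100 for percent)."""
    ref, hyp = reference.split(), hypothesis.split()
    h, s, d, i = align_counts(ref, hyp)
    wip = (h / len(ref)) * (h / len(hyp)) if hyp else 0.0
    return {
        "wer": (s + d + i) / (h + s + d),
        "mer": (s + d + i) / (h + s + d + i),
        "wip": wip,
        "wil": 1.0 - wip,
    }


def cer(reference, hypothesis):
    """Character Error Rate: the WER formula applied to characters instead of words."""
    h, s, d, i = align_counts(list(reference), list(hypothesis))
    return (s + d + i) / (h + s + d)


if __name__ == "__main__":
    print(word_metrics("the cat sat on the mat", "the cat sit on mat"))
    print(cer("abc", "abd"))
```

When several alignments share the minimum cost, the hit count (and therefore MER, WIP, and WIL) can depend on tie-breaking, so values may differ slightly from a library's output in such cases; WER and CER do not.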
Hindi to Hindi (test.tsv) Common Voice 14.0
Test done on RTX 3060 on 2557 Samples
Model | WER | MER | WIL | WIP | CER |
---|---|---|---|---|---|
Original_Model (54 min) | 52.02 | 47.86 | 66.82 | 33.17 | 23.76 |
This_Model (38 min) | 54.97 | 47.86 | 66.83 | 33.16 | 30.23 |
Hindi to English (test.csv) Custom Dataset
Test done on RTX 3060 on 1000 Samples
Model | WER | MER | WIL | WIP | CER |
---|---|---|---|---|---|
Original_Model (30 min) | - | - | - | - | - |
This_Model (20 min) | - | - | - | - | - |
English (LibriSpeech -> test-clean)
Test done on RTX 3060 on __ Samples
Model | WER | MER | WIL | WIP | CER |
---|---|---|---|---|---|
Original_Model | - | - | - | - | - |
This_Model | - | - | - | - | - |
English (LibriSpeech -> test-other)
Test done on RTX 3060 on __ Samples
Model | WER | MER | WIL | WIP | CER |
---|---|---|---|---|---|
Original_Model | - | - | - | - | - |
This_Model | - | - | - | - | - |
- The 'jiwer' library is used for these calculations
Code for conversion:
Usage
This repo contains a file, __init__.py, with all the code needed to use this model.
Firstly, clone this repo and place all the files inside a folder.
Make sure you have git-lfs installed (https://git-lfs.com)
git lfs install
git clone https://huggingface.co/devasheeshG/whisper_medium_fp16_transformers
Please try the following in a Jupyter notebook:
# Import the Model
from whisper_medium_fp16_transformers import Model, load_audio, pad_or_trim
# Initialise the model
model = Model(
    model_name_or_path='whisper_medium_fp16_transformers',
    cuda_visible_device="0",
    device='cuda',
)
# Load Audio
audio = load_audio('whisper_medium_fp16_transformers/test.wav')
audio = pad_or_trim(audio)
# Transcribe (First transcription takes time)
model.transcribe(audio)
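The pad_or_trim step above is not explained in this README. Assuming it behaves like the helper of the same name in OpenAI's Whisper code (fixing the audio to a 30-second window at 16 kHz, i.e. 480,000 samples), the idea can be sketched in plain Python as follows. The name `pad_or_trim_sketch` and the list-based input are hypothetical; the repo's function presumably works on arrays, and its exact behaviour may differ.

```python
SAMPLE_RATE = 16000      # assumption: Whisper's standard input sample rate
CHUNK_SECONDS = 30       # assumption: Whisper's standard window length
N_SAMPLES = SAMPLE_RATE * CHUNK_SECONDS  # 480000 samples


def pad_or_trim_sketch(samples, length=N_SAMPLES):
    """Return exactly `length` samples: cut long audio, zero-pad short audio."""
    samples = list(samples)
    if len(samples) >= length:
        return samples[:length]                          # trim the tail
    return samples + [0.0] * (length - len(samples))     # pad with silence


if __name__ == "__main__":
    clip = [0.1] * (23 * SAMPLE_RATE)   # a 23 s clip, like test.wav
    print(len(pad_or_trim_sketch(clip)))
```

Under this assumption, a clip shorter than 30 seconds such as test.wav is padded with silence, and anything beyond the first 30 seconds of a longer clip is dropped.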
Credits
This is an fp16 version of openai/whisper-medium.