Whisper Kannada Base

This model is a fine-tuned version of openai/whisper-base, trained on Kannada data drawn from multiple publicly available ASR corpora. It was fine-tuned as part of the Whisper fine-tuning sprint.

NOTE: The code used to train this model is available for re-use in the whisper-finetune repository.

Usage

To evaluate this model on an entire dataset, use the evaluation scripts available in the whisper-finetune repository.

The same repository also provides the scripts for faster inference using whisper-jax.

To transcribe a single audio file with this model, use the following code snippet:

>>> import torch
>>> from transformers import pipeline

>>> # path to the audio file to be transcribed
>>> audio = "/path/to/audio.format"
>>> device = "cuda:0" if torch.cuda.is_available() else "cpu"

>>> transcribe = pipeline(task="automatic-speech-recognition", model="vasista22/whisper-kannada-base", chunk_length_s=30, device=device)
>>> transcribe.model.config.forced_decoder_ids = transcribe.tokenizer.get_decoder_prompt_ids(language="kn", task="transcribe")

>>> print('Transcription: ', transcribe(audio)["text"])

For faster inference with Whisper models, the whisper-jax library can be used. Follow the installation steps described in the whisper-jax repository before running the following code snippet:

>>> import jax.numpy as jnp
>>> from whisper_jax import FlaxWhisperForConditionalGeneration, FlaxWhisperPipline

>>> # path to the audio file to be transcribed
>>> audio = "/path/to/audio.format"

>>> transcribe = FlaxWhisperPipline("vasista22/whisper-kannada-base", batch_size=16)
>>> transcribe.model.config.forced_decoder_ids = transcribe.tokenizer.get_decoder_prompt_ids(language="kn", task="transcribe")

>>> print('Transcription: ', transcribe(audio)["text"])
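
For a quick sanity check on a handful of files, the transcriptions returned by either snippet above can be compared against reference text. The sketch below assumes word error rate (WER) is the metric of interest and computes it with a plain word-level edit distance; the function name `word_error_rate` is hypothetical, and this is not the evaluation code from the whisper-finetune repository, which may normalize text differently.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)
```

Splitting on whitespace is a simplification; punctuation and script-specific normalization are left out, so scores from this sketch are not comparable with published results.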

Training and evaluation data

Training Data:

Evaluation Data:

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 3.3e-05
train_batch_size: 80
eval_batch_size: 88
seed: 22
optimizer: adamw_bnb_8bit
lr_scheduler_type: linear
lr_scheduler_warmup_steps: 10000
training_steps: 10320 (terminated upon convergence. Initially set to 51570 steps)
mixed_precision_training: True
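
To make the interaction between the warmup and the early stop concrete, the following sketch evaluates a linear schedule with warmup at an arbitrary step. It assumes the schedule was configured against the initially planned 51570 steps and follows the usual rule of a linear ramp from zero to the peak rate followed by a linear decay to zero; the function name `linear_schedule_lr` is hypothetical, and the actual training code may differ.

```python
def linear_schedule_lr(
    step: int,
    peak_lr: float = 3.3e-05,   # learning_rate from the list above
    warmup_steps: int = 10000,  # lr_scheduler_warmup_steps from the list above
    total_steps: int = 51570,   # assumption: the initially planned step count
) -> float:
    """Learning rate at `step` under a linear warmup followed by linear decay."""
    if step < warmup_steps:
        # Ramp linearly from 0 up to the peak learning rate.
        return peak_lr * step / max(1, warmup_steps)
    # Decay linearly from the peak down to 0 at total_steps.
    remaining = max(0, total_steps - step)
    return peak_lr * remaining / max(1, total_steps - warmup_steps)


if __name__ == "__main__":
    for step in (0, 5000, 10000, 10320):
        print(step, linear_schedule_lr(step))
```

Under these assumptions, training stopped at step 10320, only 320 steps after the warmup ended, so the learning rate was still close to its peak value.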

Acknowledgement

This work was done at Speech Lab, IIT Madras.

The compute resources for this work were funded by the "Bhashini: National Language translation Mission" project of the Ministry of Electronics and Information Technology (MeitY), Government of India.
