I use PEFT (Parameter-Efficient Fine-Tuning) to fine-tune the Whisper large-v2 model on the google/fleurs speech dataset for the transcription task.
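
Below is a minimal sketch of how such a PEFT setup is commonly wired together with the transformers, peft, and datasets libraries. The LoRA hyperparameters (r, lora_alpha, target_modules, dropout) and the FLEURS language config are illustrative assumptions, not values taken from this card.

```python
from datasets import load_dataset
from transformers import WhisperForConditionalGeneration, WhisperProcessor
from peft import LoraConfig, get_peft_model

model_name = "openai/whisper-large-v2"
processor = WhisperProcessor.from_pretrained(model_name)
model = WhisperForConditionalGeneration.from_pretrained(model_name)

# FLEURS requires a language config; "all" here is a placeholder, not this card's choice.
fleurs = load_dataset("google/fleurs", "all", split="train")

# Hypothetical LoRA hyperparameters -- not listed in this card.
lora_config = LoraConfig(
    r=32,
    lora_alpha=64,
    target_modules=["q_proj", "v_proj"],  # attention projections in Whisper
    lora_dropout=0.05,
    bias="none",
)

# Wrap the base model so only the small adapter weights are trained.
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
```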

Training procedure

The following bitsandbytes quantization config was used during training:
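
The exact quantization values are not listed here; as a sketch, an 8-bit bitsandbytes config is typically passed when loading the base model before attaching the PEFT adapter. The settings below are illustrative defaults, not this card's recorded configuration.

```python
from transformers import BitsAndBytesConfig, WhisperForConditionalGeneration
from peft import prepare_model_for_kbit_training

# Assumption: 8-bit loading with the library's default int8 threshold.
bnb_config = BitsAndBytesConfig(
    load_in_8bit=True,
    llm_int8_threshold=6.0,
)

model = WhisperForConditionalGeneration.from_pretrained(
    "openai/whisper-large-v2",
    quantization_config=bnb_config,
    device_map="auto",
)

# Prepares the quantized model for PEFT training (casts norms to fp32, enables input grads).
model = prepare_model_for_kbit_training(model)
```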

Framework versions