This Model is 8bit Version of EleutherAI/gpt-j-6B. It is converted by Facebook's bitsandbytes library. The original GPT-J takes 22+ GB memory for float32 parameters alone, and that's before you account for gradients & optimizer. So for finetuning on single GPU This model is converted into 8bit.

Here's how to run it: Open In Colab The original GPT-J takes 22+ GB memory for float32 parameters alone, and that's before you account for gradients & optimizer. Even if you cast everything to 16-bit, it will still not fit onto most single-GPU setups short of A6000 and A100. You can inference it on TPU or CPUs, but fine-tuning is way more expensive. Here, we apply several techniques to make GPT-J usable and fine-tunable on a single GPU with ~11 GB memory:

How should I fine-tune the model?

We recommend starting with the original hyperparameters from the LoRA paper. On top of that, there is one more trick to consider: the overhead from de-quantizing weights does not depend on batch size. As a result, the larger batch size you can fit, the more efficient you will train.

Can I use this technique with other models?

The model was converted using this notebook. It can be adapted to work with other model types. However, please bear in mind that some models replace Linear and Embedding with custom alternatives that require their own BNBWhateverWithAdapters.