<img src="https://huggingface.co/deinon-daemon/axolotl-13b-chat-qlora-dev/resolve/main/axolotl_img/a14a0db4-9c8c-4283-8d24-21eb8f6210b1.png" alt="ยก Say hi to Axolotl !" style="float: left; margin-right: 10px;" /> Say hello to axolotl: a small-is-powerful instruct-tuned chat model! This is my second build ever in the fine tuning world. It was hacked in about 48hrs, and was executed entirely on one colab kernel for ~8-9hrs last night (07/29/23) ... enjoy! Test run of Llama-2-13b-chat-hf fine tuned using recently popularized quantized PEFT approach: used Bitsandbytes, --bf16, QLORA, Flash Attn w/ einops and ninja Ampere optimizations, 1 Nvidia A100 GPU for ~9hrs. Fine tuned for 3 epochs on a 40k slice of the Open-Orca dataset, which I postprocessed, added some self-collected contextual qa chat data to, and templated to yield a standard chat instruct prompt format for all examples. Benchmarks at least as good (if not slightly better) than other fine tuned llama/alpaca/guanaco/vicuna models of this scale. The real evaulation/benchmarking is still to come, however, specifically against stabilityai/StableBeluga13B, which seems to be the most popular example of Llama-2 + Open-Orca to date. This is simply a proof of concept (hence the dev tag) -- come back later once we've realeased a model for production.