# Llama model for causal language modeling

## Training procedure

The following `bitsandbytes` quantization config was used during training:
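The config values themselves did not survive in this card. As an illustrative sketch only, a 4-bit `bitsandbytes` setup built with `transformers.BitsAndBytesConfig` typically looks like the following; every parameter value below is an assumption, not the value actually used for this model:

```python
import torch
from transformers import BitsAndBytesConfig

# Hypothetical 4-bit NF4 quantization config; the actual values used
# during training are not recorded in this card.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                      # quantize weights to 4 bits
    bnb_4bit_quant_type="nf4",              # NormalFloat4 quantization type
    bnb_4bit_use_double_quant=True,         # also quantize the quantization constants
    bnb_4bit_compute_dtype=torch.bfloat16,  # dtype used for matmuls at runtime
)
```

Such a config is passed as `quantization_config=bnb_config` to `AutoModelForCausalLM.from_pretrained` when loading the base Llama checkpoint.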

### Framework versions