This model is a merge of LLaMA-13B and the SuperCOT LoRA:

huggyllama/llama-13b + kaiokendev/SuperCOT-LoRA/13b/gpu/cutoff-2048
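
For anyone who wants to reproduce the merge, here is a minimal sketch using Hugging Face PEFT. This is an assumption about how such a merge can be done, not necessarily how this checkpoint was produced. The repo IDs and subfolder come from the line above; the function name `merge_lora` and the output directory are hypothetical, and passing `subfolder` to `PeftModel.from_pretrained` may depend on your PEFT version.

```python
# Sketch only: one plausible way to merge the LoRA into the base model.
# The procedure actually used for this checkpoint is not stated in this card.

BASE_MODEL = "huggyllama/llama-13b"
LORA_REPO = "kaiokendev/SuperCOT-LoRA"
LORA_SUBFOLDER = "13b/gpu/cutoff-2048"


def merge_lora(output_dir: str) -> None:
    """Load the base model, apply the LoRA, fold it into the weights, and save."""
    # Heavy imports are kept local so this file can be imported without them.
    import torch
    from peft import PeftModel
    from transformers import AutoModelForCausalLM, AutoTokenizer

    # Load the fp16 base weights (needs roughly 26 GB of RAM or VRAM for 13B).
    base = AutoModelForCausalLM.from_pretrained(BASE_MODEL, torch_dtype=torch.float16)

    # Attach the adapter stored in a subfolder of the LoRA repo.
    lora = PeftModel.from_pretrained(base, LORA_REPO, subfolder=LORA_SUBFOLDER)

    # Fold the adapter deltas into the base weights and drop the PEFT wrappers.
    merged = lora.merge_and_unload()

    # Save merged weights plus the base tokenizer so the folder is self-contained.
    merged.save_pretrained(output_dir)
    AutoTokenizer.from_pretrained(BASE_MODEL).save_pretrained(output_dir)
```

Calling `merge_lora("llama-13b-supercot")` (directory name is just an example) should leave a full-precision merged model that can then be quantized.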

CUDA_VISIBLE_DEVICES=0 python llama.py c4 --wbits 4 --true-sequential --act-order --groupsize 128

In ooba (oobabooga's text-generation-webui), make sure to use --groupsize 128 --wbits 4