This model was quantized with the CUDA branch of GPTQ-for-LLaMa (https://github.com/oobabooga/GPTQ-for-LLaMa) using the following command:

CUDA_VISIBLE_DEVICES=0 python llama.py /root/llava-13b-v1-1 c4 --wbits 4 --true-sequential --groupsize 128 --save_safetensors llava-13b-v1-1-4bit-128g.safetensors
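As a quick sanity check that the export worked, the resulting `.safetensors` file can be opened with the `safetensors` library. The sketch below is illustrative and not part of the original card; the file name matches the `--save_safetensors` argument above, and the comment about packed weights and per-group scales reflects how GPTQ-for-LLaMa checkpoints are generally laid out, not anything stated here.

```python
# Minimal sketch: inspect the quantized checkpoint written by the command above.
from safetensors.torch import load_file

state_dict = load_file("llava-13b-v1-1-4bit-128g.safetensors")

# GPTQ-for-LLaMa typically stores packed low-bit weights along with per-group
# scales and zero points, so listing a few entries confirms the file is intact.
for name, tensor in list(state_dict.items())[:10]:
    print(f"{name}: shape={tuple(tensor.shape)}, dtype={tensor.dtype}")
```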


license: other